Re: setuptools 78.0.0 does not work with pyspark 3.x releases

2025-03-24 Thread James Willis
Perhaps it is sufficient to wait for setuptools to revert the change: https://github.com/pypa/setuptools/pull/4911 On Mon, Mar 24, 2025 at 11:38 AM Holden Karau wrote: > I think given the lack of 4.0 release and the amount of folks using > PySpark this is enough to trigger a 3.5 branch release.
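
A common stopgap while the revert is pending (an assumption on my part, not something proposed in the thread): pin the build tool below the breaking release, e.g. `pip install "setuptools<78.0.0"`.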

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

2017-08-31 Thread James Baker
n would be appreciated by plenty I'm sure :) (and would make my implementation more straightforward - the state management is painful atm). James On Wed, 30 Aug 2017 at 14:56 Reynold Xin <r...@databricks.com> wrote: Sure that's good to do (and as discussed earlier a good co

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

2017-08-30 Thread James Baker
ge, like, who is the target consumer here? My personal slant is that it's more important to improve support for other datastores than it is to lower the barrier of entry - this is why I've been pushing here. James On Wed, 30 Aug 2017 at 09:37 Ryan Blue <rb...@netflix.com>

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

2017-08-29 Thread James Baker
o include some kind of layering here. I could probably sketch out something here if that'd be useful? James On Tue, 29 Aug 2017 at 18:59 Wenchen Fan <cloud0...@gmail.com> wrote: Hi James, Thanks for your feedback! I think your concerns are all valid, but we need to make a trad

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

2017-08-29 Thread James Baker
deally this contract could be implied by the way the Java class structure works, but otherwise I can just throw). James On Tue, 29 Aug 2017 at 02:56 Reynold Xin <r...@databricks.com> wrote: James, Thanks for the comment. I think you just pointed out a trade-off between expressiveness

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

2017-08-28 Thread James Baker
s our supported pushdown stuff, and then the user can transform and return it. I think this ends up being a more elegant API for consumers, and also far more intuitive. James On Mon, 28 Aug 2017 at 18:00 蒋星博 <jiangxb1...@gmail.com> wrote: +1 (Non-binding) Xiao Li <gatorsm...@g

RE: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-15 Thread james
-1 This bug, SPARK-16515, in Spark 2.0 breaks cases of ours that run on 1.6. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-2-0-0-RC4-tp18317p18341.html Sent from the Apache Spark Developers List mailing list archive at Nabbl

How Spark SQL correctly connect hive metastore database with Spark 2.0 ?

2016-05-12 Thread james
Hi Spark guys, I am trying to run Spark SQL using bin/spark-sql with Spark 2.0 master code (commit ba181c0c7a32b0e81bbcdbe5eed94fc97b58c83e) but ran across an issue: it always connects to a local Derby database and can't connect to my existing Hive metastore database. Could you help me check what's the r
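
The usual fix here (a sketch, not confirmed in the thread): Spark 2.0 picks up hive-site.xml from $SPARK_HOME/conf, and without hive.metastore.uris it silently falls back to a local Derby metastore. A minimal check in code, assuming a Thrift metastore at a hypothetical host:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical metastore URI; normally this lives in conf/hive-site.xml
// (hive.metastore.uris) rather than being set in code.
val spark = SparkSession.builder()
  .appName("metastore-check")
  .config("hive.metastore.uris", "thrift://metastore-host:9083")
  .enableHiveSupport()
  .getOrCreate()

// Should list the databases of the existing metastore, not a fresh Derby one.
spark.sql("SHOW DATABASES").show()
```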

Re: dataframe udf functioin will be executed twice when filter on new column created by withColumn

2016-05-11 Thread James Hammerton
This may be related to: https://issues.apache.org/jira/browse/SPARK-13773 Regards, James On 11 May 2016 at 15:49, Ted Yu wrote: > In master branch, behavior is the same. > > Suggest opening a JIRA if you haven't done so. > > On Wed, May 11, 2016 at 6:55 AM, Tony Jin

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-03-22 Thread james
I guess a different workload causes a different result? -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-OutOfMemoryError-Unable-to-acquire-bytes-of-memory-tp16773p16789.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-03-22 Thread james
Hi, I also hit the 'Unable to acquire memory' issue using Spark 1.6.1 with dynamic allocation on YARN. My case happened when setting spark.sql.shuffle.partitions larger than 200. From the error stack, it differs from the issue Nezih reported, and I'm not sure whether they share the same root cause. Tha
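
For reference, the setting in question is raised like this (a sketch against the 1.6-era API; the value is illustrative):

```scala
// Raise the SQL shuffle partition count above its default of 200,
// the condition under which the failure reportedly appears.
sqlContext.setConf("spark.sql.shuffle.partitions", "400")
```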

Re: ORC file writing hangs in pyspark

2016-02-24 Thread James Barney
es quickly. Thank you again for the suggestions On Tue, Feb 23, 2016 at 9:28 PM, Zhan Zhang wrote: > Hi James, > > You can try to write with other format, e.g., parquet to see whether it is > a orc specific issue or more generic issue. > > Thanks. > > Zhan Zhang > > O
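
A quick way to act on Zhan's suggestion (a sketch in Scala against the 1.5-era DataFrame API; `df` stands for the frequent-itemset DataFrame from the report, and the paths are illustrative):

```scala
// If the parquet write succeeds where the ORC write hangs,
// the problem is ORC-specific rather than a general write issue.
df.write.format("parquet").save("/tmp/freq_itemsets_parquet")
df.write.format("orc").save("/tmp/freq_itemsets_orc")
```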

ORC file writing hangs in pyspark

2016-02-23 Thread James Barney
I'm trying to write an ORC file after running the FPGrowth algorithm on a dataset of just around 2GB in size. The algorithm performs well and can display results if I take(n) the freqItemSets() of the result after converting that to a DF. I'm using Spark 1.5.2 on HDP 2.3.4 and Python 3.4.2 on Yarn

Re: [VOTE] Release Apache Spark 1.5.1 (RC1)

2015-09-28 Thread james
+1 1) Build binary instruction: ./make-distribution.sh --tgz --skip-java-test -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests 2) Run Spark SQL in YARN client mode. This 1.5.1 RC1 package has better test results than the previous 1.5.0, except for SPARK-10484, SPARK-

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-07 Thread james
add a critical bug https://issues.apache.org/jira/browse/SPARK-10474 (Aggregation failed with unable to acquire memory) -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-0-RC3-tp13928p13987.html Sent from the Apache Spark De

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-06 Thread james
I saw a new "spark.shuffle.manager=tungsten-sort" implemented in https://issues.apache.org/jira/browse/SPARK-7081, but I can't find its corresponding description in http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html (currently there are only 'sort' and 'ha

Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature

2015-08-03 Thread james
Based on the latest Spark code (commit 608353c8e8e50461fafff91a2c885dca8af3aaa8), I used the same Spark SQL query to test two groups of combined configurations, and it seems that it currently doesn't work with the "tungsten-sort" shuffle manager, per the results below: *Test 1# (PASSED)* spark.shuffle.manager

Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature

2015-08-02 Thread james
Thank you for your reply! Do you mean that currently, if I want to use this Tungsten feature, I have to set the sort shuffle manager (spark.shuffle.manager=sort), right? However, I saw a slide "Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal" published at Spark Summit 2015, and it s

Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature

2015-07-31 Thread james
Another error: 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode1:40443 15/07/31 16:15:28 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 3 is 583 bytes 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterE

Came across Spark SQL hang issue with Spark 1.5 Tungsten feature

2015-07-31 Thread james
I tried to enable Tungsten with Spark SQL and set the 3 parameters below, but I found that Spark SQL always hangs at the point below. Could you please point me to the potential cause? I'd appreciate any input. spark.shuffle.manager=tungsten-sort spark.sql.codegen=true spark.sql.unsafe.enabled=true 15/07/
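
For reference, the same three properties can also be set programmatically; a minimal sketch against the 1.5-era API, using only the settings named above:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// The three Tungsten-related settings from the report above.
val conf = new SparkConf()
  .setAppName("tungsten-test")
  .set("spark.shuffle.manager", "tungsten-sort")
  .set("spark.sql.codegen", "true")
  .set("spark.sql.unsafe.enabled", "true")

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
```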

graph.mapVertices() function obtain edge triplets with null attribute

2015-02-26 Thread James
My code:

```
// Initialize the graph: assign each vertex a HyperLogLog counter seeded with its own id
var anfGraph = graph.mapVertices { case (vid, _) =>
  val counter = new HyperLogLog(5)
  counter.offer(vid)
  counter
}

// Some triplets then come back with a null source attribute
val nullVertex = anfGraph.triplets.filter(edge => edge.srcAttr == null).first
```

Re: Why a program would receive null from send message of mapReduceTriplets

2015-02-13 Thread James
essage) // <- NullPointerException ``` I found that some vertex attributes in some triplets are null, but not all. Alcaid 2015-02-13 14:50 GMT+08:00 Reynold Xin : > Then maybe you actually had a null in your vertex attribute? > > > On Thu, Feb 12, 2015 at 10:47 PM, James wrot

Re: Why a program would receive null from send message of mapReduceTriplets

2015-02-12 Thread James
Thu, Feb 12, 2015 at 10:47 PM, James wrote: > >> I changed the mapReduceTriplets() func to aggregateMessages(), but it >> still failed. >> >> >> 2015-02-13 6:52 GMT+08:00 Reynold Xin : >> >>> Can you use the new aggregateNeighbors method? I suspect the

Re: Why a program would receive null from send message of mapReduceTriplets

2015-02-12 Thread James
you need the src or dst vertex data. Occasionally it can fail to detect. In > the new aggregateNeighbors API, the caller needs to explicitly specify > that, making it more robust. > > > On Thu, Feb 12, 2015 at 6:26 AM, James wrote: > >> Hello, >> >> When I
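
For reference, a sketch of the replacement API James switched to (aggregateMessages, assuming the stream-lib HyperLogLog vertex attribute from the mapVertices thread above); passing TripletFields.Src declares that the send function reads only the source attribute, so GraphX always ships it:

```scala
import org.apache.spark.graphx.TripletFields
import com.clearspring.analytics.stream.cardinality.HyperLogLog

// anfGraph: Graph[HyperLogLog, _] as built in the mapVertices thread above.
val msgs = anfGraph.aggregateMessages[HyperLogLog](
  ctx => ctx.sendToDst(ctx.srcAttr),  // reads srcAttr only
  (a, b) => { a.addAll(b); a },       // merge two counters
  TripletFields.Src                   // declare the dependency explicitly
)
```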

Why a program would receive null from send message of mapReduceTriplets

2015-02-12 Thread James
appreciated. Alcaid 2015-02-11 19:30 GMT+08:00 James : > Hello, > > Recently I am trying to estimate the average distance of a big graph > using spark with the help of [HyperAnf]( > http://dl.acm.org/citation.cfm?id=1963493). > > It works like the Connected Components algorithm, whi

[GraphX] Estimating Average distance of a big graph using GraphX

2015-02-11 Thread James
Hello, recently I have been trying to estimate the average distance of a big graph using Spark with the help of [HyperAnf](http://dl.acm.org/citation.cfm?id=1963493). It works like the Connected Components algorithm, except that the attribute of a vertex is a HyperLogLog counter that at the k-th iteration estimate

Re: not found: type LocalSparkContext

2015-01-20 Thread James
;s declared here: > > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/LocalSparkContext.scala > > I assume you're already importing LocalSparkContext, but since the test > classes aren't included in Spark packages, you'll

not found: type LocalSparkContext

2015-01-20 Thread James
Hi all, When I was trying to write a test for my Spark application I met ``` Error:(14, 43) not found: type LocalSparkContext class HyperANFSuite extends FunSuite with LocalSparkContext { ``` In the source code of spark-core I could not find "LocalSparkContext", so I wonder how to write a test
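
Since Spark's LocalSparkContext lives in its test sources and isn't published, one option is to define your own; a minimal stand-in, assuming ScalaTest:

```scala
import org.apache.spark.SparkContext
import org.scalatest.{BeforeAndAfterEach, Suite}

// A minimal stand-in for Spark's unpublished test helper: stop the
// SparkContext after each test so contexts don't leak between tests.
trait LocalSparkContext extends BeforeAndAfterEach { self: Suite =>
  @transient var sc: SparkContext = _

  override def afterEach(): Unit = {
    if (sc != null) {
      sc.stop()
      sc = null
    }
    super.afterEach()
  }
}
```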

Using graphx to calculate average distance of a big graph

2015-01-04 Thread James
Recently we have wanted to use Spark to calculate the average shortest-path distance between each reachable pair of nodes in a very big graph. Has anyone ever tried this? We hope to discuss the problem.

Re: will/when Spark/SparkSQL will support ORCFile format

2014-10-09 Thread James Yu
e will be the same > as that for datasources included in the core spark sql library. > > Michael > > On Thu, Oct 9, 2014 at 2:18 PM, James Yu wrote: > >> For performance, will foreign data format support, same as native ones? >> >> Thanks, >> James &

Re: will/when Spark/SparkSQL will support ORCFile format

2014-10-09 Thread James Yu
For performance, will foreign data format support be the same as for native ones? Thanks, James On Wed, Oct 8, 2014 at 11:03 PM, Cheng Lian wrote: > The foreign data source API PR also matters here > https://www.github.com/apache/spark/pull/2475 > > Foreign data source like ORC can be added

Re: will/when Spark/SparkSQL will support ORCFile format

2014-10-08 Thread James Yu
Thanks Mark! I will keep an eye on it. @Evan, I saw people use both formats, so I really want to have Spark support ORCFile. On Wed, Oct 8, 2014 at 11:12 AM, Mark Hamstra wrote: > https://github.com/apache/spark/pull/2576 > > > > On Wed, Oct 8, 2014 at 11:01 AM, Evan Chan >

will/when Spark/SparkSQL will support ORCFile format

2014-10-08 Thread James Yu
Didn't see anyone ask this question before, but I was wondering if anyone knows whether Spark/SparkSQL will support the ORCFile format soon? ORCFile is getting more and more popular in the Hive world. Thanks, James