Perhaps it is sufficient to wait for setuptools to revert the change:
https://github.com/pypa/setuptools/pull/4911
On Mon, Mar 24, 2025 at 11:38 AM Holden Karau
wrote:
> I think, given the lack of a 4.0 release and the number of folks using
> PySpark, this is enough to trigger a 3.5 branch release.
n would be appreciated by
plenty I'm sure :) (and would make my implementation more straightforward - the
state management is painful atm).
James
On Wed, 30 Aug 2017 at 14:56 Reynold Xin
<r...@databricks.com> wrote:
Sure that's good to do (and as discussed earlier a good co
ge, like, who is the target consumer here? My personal
slant is that it's more important to improve support for other datastores than
it is to lower the barrier of entry - this is why I've been pushing here.
James
On Wed, 30 Aug 2017 at 09:37 Ryan Blue
<rb...@netflix.com>
o
include some kind of layering here. I could probably sketch out something here
if that'd be useful?
James
On Tue, 29 Aug 2017 at 18:59 Wenchen Fan
<cloud0...@gmail.com> wrote:
Hi James,
Thanks for your feedback! I think your concerns are all valid, but we need to
make a trad
deally this contract could be implied by
the way the Java class structure works, but otherwise I can just throw).
James
On Tue, 29 Aug 2017 at 02:56 Reynold Xin
<r...@databricks.com> wrote:
James,
Thanks for the comment. I think you just pointed out a trade-off between
expressiveness
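To make the trade-off concrete, a rough sketch (names are hypothetical, not from the proposal) of the two styles being contrasted: a contract carried by the class structure via an optional mix-in, versus a single wide interface whose default implementation just throws:
```
// Style 1: the contract is implied by the class structure. Only readers that
// actually support column pruning mix in the capability trait, so the engine
// can discover support with a type check instead of a runtime failure.
trait Reader
trait SupportsColumnPruning extends Reader {
  def pruneColumns(required: Seq[String]): Unit
}

// Style 2: one wide interface; unsupported operations simply throw at runtime.
trait WideReader {
  def pruneColumns(required: Seq[String]): Unit =
    throw new UnsupportedOperationException("column pruning not supported")
}
```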
s our supported pushdown stuff, and then the user can
transform and return it.
I think this ends up being a more elegant API for consumers, and also far more
intuitive.
James
On Mon, 28 Aug 2017 at 18:00 蒋星博
<jiangxb1...@gmail.com> wrote:
+1 (Non-binding)
Xiao Li <gatorsm...@g
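A rough sketch of the shape being suggested here (all names are hypothetical, not part of the actual proposal): the engine offers the pushdowns it supports, and the source transforms and returns the subset it will honour:
```
// Hypothetical "transform and return" pushdown handshake.
import org.apache.spark.sql.sources.Filter

case class Pushdowns(filters: Seq[Filter], requiredColumns: Seq[String])

trait ReaderSketch {
  // The source inspects the offered pushdowns and returns the subset it
  // accepts; anything it drops is re-evaluated by Spark after the scan.
  def pushdown(offered: Pushdowns): Pushdowns
}
```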
-1
This bug (SPARK-16515) in Spark 2.0 breaks cases of ours that run fine on 1.6.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-2-0-0-RC4-tp18317p18341.html
Sent from the Apache Spark Developers List mailing list archive at Nabbl
Hi Spark guys,
I am trying to run Spark SQL using bin/spark-sql with Spark 2.0 master
code (commit ba181c0c7a32b0e81bbcdbe5eed94fc97b58c83e) but ran across an
issue: it always connects to a local Derby database and can't connect to my
existing Hive metastore database. Could you help me to check what's the r
This may be related to: https://issues.apache.org/jira/browse/SPARK-13773
Regards,
James
On 11 May 2016 at 15:49, Ted Yu wrote:
> In master branch, behavior is the same.
>
> Suggest opening a JIRA if you haven't done so.
>
> On Wed, May 11, 2016 at 6:55 AM, Tony Jin
I guess a different workload causes a different result?
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-OutOfMemoryError-Unable-to-acquire-bytes-of-memory-tp16773p16789.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com
Hi,
I also hit the 'Unable to acquire memory' issue using Spark 1.6.1 with dynamic
allocation on YARN. My case happened when setting
spark.sql.shuffle.partitions larger than 200. From the error stack, it differs
from the issue reported by Nezih, and I'm not sure whether they share the same root cause.
Tha
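For context, a minimal sketch (not from the report) of where that setting is applied in a Spark 1.6-era job, assuming a plain SQLContext; 200 is the default being raised here:
```
// Hypothetical reproduction setup: raise spark.sql.shuffle.partitions past
// the default of 200 before running the aggregation that triggered the error.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("shuffle-partitions-test"))
val sqlContext = new SQLContext(sc)
sqlContext.setConf("spark.sql.shuffle.partitions", "400")  // larger than 200, as described above
```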
es quickly.
Thank you again for the suggestions
On Tue, Feb 23, 2016 at 9:28 PM, Zhan Zhang wrote:
> Hi James,
>
> You can try to write with another format, e.g., Parquet, to see whether it is
> an ORC-specific issue or a more generic issue.
>
> Thanks.
>
> Zhan Zhang
>
> O
I'm trying to write an ORC file after running the FPGrowth algorithm on a
dataset of only around 2 GB. The algorithm performs well and can
display results if I take(n) the freqItemSets() of the result after
converting it to a DataFrame.
I'm using Spark 1.5.2 on HDP 2.3.4 with Python 3.4.2 on YARN.
+1
1) Build binary instruction: ./make-distribution.sh --tgz --skip-java-test
-Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver
-DskipTests
2) Run Spark SQL with YARN client mode
This 1.5.1 RC1 package has better test results than the previous 1.5.0, except
for SPARK-10484, SPARK-
add a critical bug https://issues.apache.org/jira/browse/SPARK-10474
(Aggregation failed with unable to acquire memory)
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-0-RC3-tp13928p13987.html
Sent from the Apache Spark De
I saw a new "spark.shuffle.manager=tungsten-sort" implemented in
https://issues.apache.org/jira/browse/SPARK-7081, but I can't find its
corresponding description in
http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html (currently
there are only 'sort' and 'ha
Based on the latest Spark code (commit
608353c8e8e50461fafff91a2c885dca8af3aaa8), I used the same Spark SQL query
to test two groups of combined configurations, and it seems that it currently
doesn't work well with the "tungsten-sort" shuffle manager, based on the results below:
*Test 1# (PASSED)*
spark.shuffle.manager
Thank you for your reply!
Do you mean that currently, if I want to use this Tungsten feature, we have to
set the sort shuffle manager (spark.shuffle.manager=sort), right? However, I
saw a slide, "Deep Dive into Project Tungsten: Bringing Spark Closer to Bare
Metal", presented at Spark Summit 2015, and it s
Another error:
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode1:40443
15/07/31 16:15:28 INFO spark.MapOutputTrackerMaster: Size of output statuses
for shuffle 3 is 583 bytes
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterE
I tried to enable Tungsten with Spark SQL and set the 3 parameters below, but I
found that Spark SQL always hangs at the point below. Could you please point me
to the potential cause? I'd appreciate any input.
spark.shuffle.manager=tungsten-sort
spark.sql.codegen=true
spark.sql.unsafe.enabled=true
15/07/
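For reference, a hedged sketch (not from the original mail) of how the three properties quoted above could be set programmatically on a Spark 1.5-era SparkConf; whether this combination actually works is what the thread is discussing:
```
// Hypothetical setup applying the three settings quoted above.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("tungsten-sort-test")
  .set("spark.shuffle.manager", "tungsten-sort")
  .set("spark.sql.codegen", "true")
  .set("spark.sql.unsafe.enabled", "true")

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
```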
My code
```
// Assuming the stream-lib HyperLogLog implementation.
import com.clearspring.analytics.stream.cardinality.HyperLogLog

// Initialize the graph: assign each vertex a counter that contains the
// vertex id only.
var anfGraph = graph.mapVertices { case (vid, _) =>
  val counter = new HyperLogLog(5)
  counter.offer(vid)
  counter
}

val nullVertex = anfGraph.triplets.filter(edge => edge.srcAttr == null).first

// (next call truncated in the archive; per the follow-up mail it was a
// mapReduceTriplets(...) invocation)
// ...essage) // <- NullPointerException
```
I found that some vertex attributes in some triplets are null, but
not all.
Alcaid
2015-02-13 14:50 GMT+08:00 Reynold Xin:
> Then maybe you actually had a null in your vertex attribute?
>
>
> On Thu, Feb 12, 2015 at 10:47 PM, James wrot
Thu, Feb 12, 2015 at 10:47 PM, James wrote:
>
>> I changed the mapReduceTriplets() func to aggregateMessages(), but it
>> still failed.
>>
>>
>> 2015-02-13 6:52 GMT+08:00 Reynold Xin:
>>
>>> Can you use the new aggregateNeighbors method? I suspect the
you need the src or dst vertex data. Occasionally it can fail to detect. In
> the new aggregateNeighbors API, the caller needs to explicitly specify
> that, making it more robust.
>
>
> On Thu, Feb 12, 2015 at 6:26 AM, James wrote:
>
>> Hello,
>>
>> When I
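As a hedged illustration of the point above (a sketch, not code from the thread): aggregateMessages lets the caller declare which triplet fields the send function reads, so GraphX does not have to guess whether src or dst vertex attributes must be shipped:
```
// Sketch: counting in-degrees while declaring that the send function reads no
// vertex attributes at all (TripletFields.None), so neither srcAttr nor
// dstAttr has to be materialised in the triplet view.
import org.apache.spark.graphx._

def inDegrees[VD, ED](graph: Graph[VD, ED]): VertexRDD[Int] =
  graph.aggregateMessages[Int](
    sendMsg = ctx => ctx.sendToDst(1),
    mergeMsg = _ + _,
    tripletFields = TripletFields.None)
```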
appreciated.
Alcaid
2015-02-11 19:30 GMT+08:00 James:
> Hello,
>
> Recently I have been trying to estimate the average distance of a big graph
> using Spark with the help of [HyperANF](
> http://dl.acm.org/citation.cfm?id=1963493).
>
> It works like the Connected Components algorithm, whi
Hello,
Recently I have been trying to estimate the average distance of a big graph using
Spark with the help of [HyperANF](http://dl.acm.org/citation.cfm?id=1963493).
It works like the Connected Components algorithm, while the attribute of a vertex
is a HyperLogLog counter that at the k-th iteration estimate
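For readers unfamiliar with the approach, a minimal sketch (not from the original mail) of one such iteration in GraphX, assuming the stream-lib HyperLogLog used above; after k applications each counter estimates the number of vertices within distance k along edge direction:
```
// One HyperANF-style superstep: every vertex folds its neighbours' counters
// into its own, so the counter's cardinality grows toward |ball(v, k)|.
import com.clearspring.analytics.stream.cardinality.HyperLogLog
import org.apache.spark.graphx._

def hyperAnfStep(g: Graph[HyperLogLog, Int]): Graph[HyperLogLog, Int] = {
  val incoming: VertexRDD[HyperLogLog] = g.aggregateMessages[HyperLogLog](
    ctx => ctx.sendToDst(ctx.srcAttr),                // push counters along edges
    (a, b) => a.merge(b).asInstanceOf[HyperLogLog])   // combine without mutating
  g.joinVertices(incoming) { (_, own, fromNeighbours) =>
    own.merge(fromNeighbours).asInstanceOf[HyperLogLog]
  }
}
```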
's declared here:
>
>
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/LocalSparkContext.scala
>
> I assume you're already importing LocalSparkContext, but since the test
> classes aren't included in Spark packages, you'll
Hi all,
When I was trying to write a test for my Spark application I got
```
Error:(14, 43) not found: type LocalSparkContext
class HyperANFSuite extends FunSuite with LocalSparkContext {
```
In the source code of spark-core I could not find "LocalSparkContext",
thus I wonder how to write a test
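Since the test helpers are not published in the Spark artifacts, one workaround (a sketch, not the official trait) is to define a small equivalent in your own test sources and mix it in exactly as above:
```
// Minimal stand-in for Spark's LocalSparkContext test trait: creates a local
// SparkContext before each test and stops it afterwards.
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterEach, Suite}

trait LocalSparkContext extends BeforeAndAfterEach { self: Suite =>
  @transient var sc: SparkContext = _

  override def beforeEach(): Unit = {
    super.beforeEach()
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("test"))
  }

  override def afterEach(): Unit = {
    if (sc != null) { sc.stop(); sc = null }
    super.afterEach()
  }
}
```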
Recently we have wanted to use Spark to calculate the average shortest-path
distance between each reachable pair of nodes in a very big graph.
Has anyone ever tried this? We hope to discuss the problem.
e will be the same
> as that for datasources included in the core spark sql library.
>
> Michael
>
> On Thu, Oct 9, 2014 at 2:18 PM, James Yu wrote:
>
>> For performance, will foreign data formats be supported the same as native ones?
>>
>> Thanks,
>> James
For performance, will foreign data formats be supported the same as native ones?
Thanks,
James
On Wed, Oct 8, 2014 at 11:03 PM, Cheng Lian wrote:
> The foreign data source API PR also matters here
> https://www.github.com/apache/spark/pull/2475
>
> Foreign data sources like ORC can be added
Thanks Mark! I will keep an eye on it.
@Evan, I saw people use both formats, so I really want Spark to support
ORCFile.
On Wed, Oct 8, 2014 at 11:12 AM, Mark Hamstra
wrote:
> https://github.com/apache/spark/pull/2576
>
>
>
> On Wed, Oct 8, 2014 at 11:01 AM, Evan Chan
>
I didn't see anyone ask this question before, but I was wondering if anyone
knows whether Spark/Spark SQL will support the ORCFile format soon? ORCFile is
getting more and more popular in the Hive world.
Thanks,
James