On 08/29/2014 06:05 PM, Nick Chammas wrote:
Here’s a repro for PySpark:
    a = sc.parallelize(["Nick", "John", "Bob"])
    a = a.repartition(24000)
    a.keyBy(lambda x: len(x)).reduceByKey(lambda x, y: x + y).take(1)
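For reference, the same pipeline without the 24,000-way repartition completes normally for me, which is why I suspect the repartition() call. Here's a minimal standalone sketch of that baseline (the explicit SparkContext and the appName are only needed outside the pyspark shell, and the exact pair returned may vary):

    from pyspark import SparkContext

    sc = SparkContext(appName="len-key-repro")  # not needed inside the pyspark shell

    # Same pipeline with the default partition count: this completes fine,
    # so the trouble seems tied to repartition(24000) specifically.
    a = sc.parallelize(["Nick", "John", "Bob"])
    result = (a.keyBy(lambda x: len(x))           # key each name by its length
                .reduceByKey(lambda x, y: x + y)  # concatenate names sharing a length
                .take(1))
    print(result)  # one pair, e.g. [(4, 'NickJohn')]; combine order may vary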
When I try this on an EC2 cluster with Spark 1.1.0-rc2 and Python 2.7, the job
dies with an OutOfMemoryError; here is the tail of the stack trace:
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Is this a bug? What’s going on here?
Nick