Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-18 Thread jerryye
I've been trying different approaches to this: populating the trie on the driver and serializing the instance to the executors, broadcasting the strings in an array and populating the trie on the executors, and variants of what I'm broadcasting or serializing. All approaches seem to have a memory is

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-18 Thread Zhan Zhang
Not sure exactly how you use it. My understanding is that in Spark it is better to keep the overhead on the driver as low as possible. Is it possible to broadcast the trie to the executors, do the computation there, and then aggregate the counters (??) in the reduce phase? Thanks. Zhan Zhang On Aug 18, 201
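Zhan's suggestion can be sketched roughly as follows. Everything concrete here is a hypothetical stand-in, not code from the thread: a plain `Set` of strings plays the role of the trie, and the input path is made up. The key point is that the lookup structure is broadcast once per executor and the aggregation happens in the shuffle, not on the driver:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-count"))

    val keywords = Set("foo", "bar")          // placeholder for the trie's contents
    val bcast = sc.broadcast(keywords)        // shipped once per executor, not per task

    val lines = sc.textFile("hdfs:///input")  // hypothetical path
    val counts = lines
      .flatMap(_.split(" "))
      .filter(w => bcast.value.contains(w))   // lookup runs on the executors
      .map(w => (w, 1L))
      .reduceByKey(_ + _)                     // counters aggregated in the reduce phase

    counts.collect().foreach(println)         // only (word, total) pairs reach the driver
    sc.stop()
  }
}
```

With this shape, only the broadcast payload and the final per-key totals cross the driver/executor boundary, which keeps both the driver's memory footprint and the task-serialization size small.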

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-18 Thread Jerry Ye
Hi Zhan, Thanks for looking into this. I'm actually using the hash map as an example of the simplest snippet of code that is failing for me. I know that this is just word count. In my actual problem I'm using a Trie data structure to find substring matches. On Sun, Aug 17, 2014 at 11:35 PM, Z

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-17 Thread Zhan Zhang
Is it because countByValue or toArray puts too much stress on the driver if there are many unique words? To me it is a typical word count problem, and you can solve it as follows (correct me if I am wrong): val textFile = sc.textFile("file") val counts = textFile.flatMap(line => line.split(" "))
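Zhan's snippet is cut off by the archive. The canonical Spark word-count pattern it appears to be heading toward looks like this (the completion below is the standard example, not Zhan's actual text):

```scala
val textFile = sc.textFile("file")
val counts = textFile
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)   // aggregation happens on the executors

// Only one (word, total) pair per unique word comes back to the driver:
val result = counts.collect()
```

Because `reduceByKey` combines partial counts on the executors before anything is fetched, the driver never holds the raw token stream, only the final totals.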

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-17 Thread Jerry Ye
The job ended up running overnight with no progress. :-( On Sat, Aug 16, 2014 at 12:16 AM, Jerry Ye wrote: > Hi Xiangrui, > I actually tried branch-1.1 and master and it resulted in the job being > stuck at the TaskSetManager: > 14/08/16 06:55:48 INFO scheduler.TaskSchedulerImpl: Adding task se

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-16 Thread Jerry Ye
Hi Xiangrui, I actually tried branch-1.1 and master and it resulted in the job being stuck at the TaskSetManager: 14/08/16 06:55:48 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 2 on executor 8: ip-10-2

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-15 Thread Xiangrui Meng
Just saw you used toArray on an RDD. That copies all data to the driver and it is deprecated. countByValue is what you need: val samples = sc.textFile("s3n://geonames") val counts = samples.countByValue() val result = samples.map(l => (l, counts.getOrElse(l, 0L))) Could you also try to use the lat

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-15 Thread jerryye
Hi Xiangrui, You were right, I had to use --driver_memory instead of setting it in spark-defaults.conf. However, now my job just hangs with the following message: 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 29433434 bytes in 202 ms 14/08/15 23:54:46 INFO scheduler.TaskSetM
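The serialized task size in that log line (29433434 bytes, roughly 28 MB) is well above the Spark 1.x default of 10 MB for spark.akka.frameSize, which is consistent with the thread's subject. A hedged configuration sketch (the chosen value is illustrative, not from the thread; in Spark 1.x the setting is in MB):

```properties
# spark-defaults.conf sketch: raise the Akka frame size above the
# ~28 MB serialized task observed in the log (default is 10 MB).
spark.akka.frameSize  64
```

A larger frame size only treats the symptom, though; shrinking the task closure (e.g. via a broadcast variable) is usually the better fix.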

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-15 Thread Xiangrui Meng
Did you verify the driver memory in the Executor tab of the WebUI? I think you need `--driver-memory 8g` with spark-shell or spark-submit instead of setting it in spark-defaults.conf. On Fri, Aug 15, 2014 at 12:41 PM, jerryye wrote: > Setting spark.driver.memory has no effect. It's still hanging
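Xiangrui's point is that in client mode the driver JVM is already running by the time spark-defaults.conf is read, so its heap must be set on the command line. A sketch of the invocations he describes (the jar, class, and paths are placeholders):

```shell
# Pass driver memory on the command line so it takes effect before the
# driver JVM starts:
./bin/spark-shell --driver-memory 8g

# Same idea with spark-submit (class and jar are hypothetical):
./bin/spark-submit --driver-memory 8g --class com.example.MyJob myjob.jar
```

The Executors tab of the WebUI then shows the driver's actual memory, which is the verification step Xiangrui asks for.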

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-15 Thread jerryye
Setting spark.driver.memory has no effect. It's still hanging trying to compute result.count when I'm sampling more than 35%, regardless of what value of spark.driver.memory I'm setting. Here are my settings: export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g" export SPARK_MEM=10g in conf

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-15 Thread jerryye
Hi Xiangrui, I wasn't setting spark.driver.memory. I'll try that and report back. In terms of this running on the cluster, my assumption was that calling foreach on an array (I converted samples using toArray) would mean counts is populated locally. The object would then be serialized to executo

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-15 Thread Xiangrui Meng
Did you set driver memory? You can confirm it in the Executors tab of the WebUI. Btw, the code may only work in local mode. In a cluster mode, counts will be serialized to remote workers and the result is not fetched by the driver after foreach. You can use RDD.countByValue instead. -Xiangrui On F
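Xiangrui's cluster-mode caveat can be illustrated with a sketch. The mutable-map version in the comment is a hypothetical reconstruction of the failing pattern, not Jerry's actual code; only the `countByValue` call and the s3n path come from the thread:

```scala
// Hypothetical reconstruction of the pattern that only works in local mode:
// `counts` is serialized into each task's closure, mutated on the executors,
// and those mutations are never shipped back to the driver.
//
//   val counts = scala.collection.mutable.Map.empty[String, Long]
//   samples.foreach(l => counts(l) = counts.getOrElse(l, 0L) + 1)
//   // in cluster mode, counts is still empty on the driver here
//
// countByValue instead runs the aggregation as a distributed job and
// returns the totals to the driver as a local Map:
val samples = sc.textFile("s3n://geonames")
val counts: scala.collection.Map[String, Long] = samples.countByValue()
```

This is the same closure-capture pitfall that accumulators and `countByValue`-style actions exist to avoid: side effects inside `foreach` stay on the executors.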