spark.akka.frameSize stalls job in 1.1.0

2014-08-15 Thread jerryye
Hi All, I'm not sure if I should file a JIRA or if I'm missing something obvious, since the test code I'm trying is so simple. I've isolated the problem I'm seeing to a memory issue, but I don't know what parameter I need to tweak; it does seem related to spark.akka.frameSize. If I sample my RDD with

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-15 Thread jerryye
tched by the driver after foreach. You can use RDD.countByValue > instead. -Xiangrui > > On Fri, Aug 15, 2014 at 8:18 AM, jerryye <[hidden email]> wrote: > > > Hi All, > > I'm not sure if I should file a JIRA or if I'm missing something obvious > > since the te
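For readers unfamiliar with the call suggested above, here is a minimal sketch of RDD.countByValue. The excerpt is truncated and does not show the original code, so this only illustrates the API itself; the object name, the local master, and the sample data are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: names and data are hypothetical, not from the thread.
object CountByValueSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two threads.
    val conf = new SparkConf().setAppName("countByValue-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val words = sc.parallelize(Seq("a", "b", "a", "c", "a"))
      // countByValue returns a local Map on the driver, keyed by distinct value.
      val counts: scala.collection.Map[String, Long] = words.countByValue()
      counts.foreach { case (word, n) => println(s"$word -> $n") }
    } finally {
      sc.stop()
    }
  }
}
```

Because the result is materialized on the driver, this approach is probably only appropriate when the number of distinct values is small.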

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-15 Thread jerryye
Setting spark.driver.memory has no effect. It still hangs trying to compute result.count when I sample more than 35%, regardless of the value I set for spark.driver.memory. Here are my settings: export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g" export SPARK_MEM=10g in conf

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-15 Thread jerryye
Apache Spark Developers List] wrote: > Did you verify the driver memory in the Executor tab of the WebUI? I > think you need `--driver-memory 8g` with spark-shell or spark-submit > instead of setting it in spark-defaults.conf. > > On Fri, Aug 15, 2014 at 12:41 PM, jerryye <[hid
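Besides the WebUI check mentioned above, one way to confirm how much heap the driver JVM actually received is to ask the JVM directly. The sketch below is plain Scala and uses only the standard Runtime API; the object name is hypothetical. In a spark-shell session started with the `--driver-memory 8g` option quoted above, evaluating the same expression at the prompt should report the heap of the driver process.

```scala
// Illustrative only: reports the maximum heap available to the current JVM.
object DriverHeapCheck {
  // Maximum heap in megabytes, as reported by the JVM itself.
  def maxHeapMb: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit = {
    println(s"Max JVM heap: $maxHeapMb MB")
  }
}
```

The reported figure is typically somewhat lower than the -Xmx value, since the JVM excludes part of the heap from this number.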

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-18 Thread jerryye
I've been trying different approaches to this: populating the trie on the driver and serializing the instance to executors, broadcasting the strings in an array and populating the trie on the executors, and variants of what I'm broadcasting or serializing. All approaches seem to have a memory is

saveAsTextFile makes no progress without caching RDD

2014-08-21 Thread jerryye
Hi, Cross-posting this from the users list. I'm running on branch-1.1 and trying to do a simple transformation on a relatively small dataset of 64GB. saveAsTextFile essentially hangs and tasks are stuck in running mode with the following code: // Stalls with tasks running for over an hour with n

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-21 Thread jerryye
Bump. I'm seeing the same issue with branch-1.1. Caching the RDD before running saveAsTextFile gets things running, but the job stalls two-thirds of the way through by using too much memory.

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread jerryye
).saveAsTextFile(...)? > > Matei > > On August 25, 2014 at 12:09:25 PM, amnonkhen ([hidden email]) wrote: > > Hi jerryye, > Maybe if you voted up my question on Stack Overflow it would get some > traction a

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread jerryye
> sc.parallelize(1 to 100*1000*1000, 20).saveAsTextFile(...)? > > > > Matei > > > > On August 25, 2014 at 12:09:25 PM, amnonkhen ([hidden email]) wrote: > > > > Hi jerryye, > > Maybe if
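The one-liner quoted above can be wrapped into a small standalone job, which may be useful for checking whether a plain write works at all. This is a sketch under stated assumptions: the object name and app name are hypothetical, the master is expected to be supplied by spark-submit, and the output location is taken from the first command-line argument because the thread elides the real path.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: a standalone wrapper around the quoted one-liner.
object SaveSanityCheck {
  def main(args: Array[String]): Unit = {
    // Hypothetical: the output path is passed in; the original message elides it.
    val outputPath = args(0)
    // Assumption: the master URL is provided by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("save-sanity-check"))
    try {
      // 100 million integers in 20 partitions, written as text, as in the quoted message.
      sc.parallelize(1 to 100 * 1000 * 1000, 20).saveAsTextFile(outputPath)
    } finally {
      sc.stop()
    }
  }
}
```

Note that saveAsTextFile fails if the output directory already exists, so each run needs a fresh path.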