Hi Sandy,
We are, yes. I strongly suspect we're not partitioning our data properly,
but maybe 1.5G is simply too small for our workload. I'll bump the executor
memory and see if we get better results.
It seems we should be setting it to (SPARK_WORKER_MEMORY + pyspark memory)
/ # of concurrent apps.
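For instance, with made-up numbers rather than our real config: SPARK_WORKER_MEMORY=32g, roughly 4g of pyspark worker processes, and 2 concurrent apps would give (32g + 4g) / 2 = 18g, i.e. something like:

export SPARK_JAVA_OPTS="-Dspark.executor.memory=18g"  # other -D flags left as they are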
Are you aware that you get an executor (and the 1.5GB) per machine, not per
core?
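To put rough numbers on it (illustrative, not your exact figures): with one 1500m executor per node and 24 cores each running a task, those 24 concurrent tasks share the single 1.5G heap, i.e. on the order of 1500m / 24 ≈ 62m per task, not 1.5G per task.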
On Tue, Mar 11, 2014 at 12:52 PM, Aaron Olson wrote:
> Hi Sandy,
>
> We're configuring that with the JAVA_OPTS environment variable in
> $SPARK_HOME/spark-worker-env.sh like this:
>
> # JAVA OPTS
> export SPARK_JA
Hi Sandy,
We're configuring that with the JAVA_OPTS environment variable in
$SPARK_HOME/spark-worker-env.sh like this:
# JAVA OPTS
export SPARK_JAVA_OPTS="-Dspark.ui.port=0 -Dspark.default.parallelism=1024
-Dspark.cores.max=256 -Dspark.executor.memory=1500m
-Dspark.worker.timeout=500 -Dspark.akka
Hi Aaron,
When you say "Java heap space is 1.5G per worker, 24 or 32 cores across 46
nodes. It seems like we should have more than enough to do this
comfortably.", how are you configuring this?
-Sandy
On Tue, Mar 11, 2014 at 10:11 AM, Aaron Olson wrote:
> Dear Sparkians,
>
> We are working on