I've set up a YARN (Hadoop 2.4.1) cluster with Spark 1.0.1, and I've been seeing inconsistent behavior, including out-of-memory errors (java.lang.OutOfMemoryError: unable to create new native thread), when increasing the number of executors for a simple job (wordcount).
The general format of my submission is:

    spark-submit \
      --master yarn-client \
      --num-executors=$EXECUTORS \
      --executor-cores 1 \
      --executor-memory 2G \
      --driver-memory 3G \
      count.py input output

If I run without specifying the number of executors, it defaults to two (3 containers: 2 executors, 1 driver). Is there any mechanism to let a Spark application scale automatically to the capacity of the YARN cluster?

Similarly, for low numbers of executors I get what I ask for (e.g., 10 executors results in 11 running containers, 20 executors in 21 containers, and so on) up to a particular threshold. When I specify 50 executors, Spark seems to start asking for more and more containers until all the memory in the cluster is allocated and the job gets killed. I don't understand that particular behavior; if anyone has thoughts or experiences to share, that would be great. Wouldn't it be preferable for Spark to stop requesting containers once the cluster is at capacity, rather than error out and have the job killed?

Does anyone have any recommendations on how to tweak the number of executors in an automated manner? (A rough sketch of the kind of thing I mean is below.)

Thanks,
Calvin
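
P.S. To make "automated" more concrete, here's a rough, untested sketch of the kind of wrapper I have in mind: ask the ResourceManager's REST API how much memory is free in the cluster and derive --num-executors from that before calling spark-submit. The ResourceManager host name, the ~384 MB per-container overhead, and my reading of the availableMB field are all assumptions on my part, so treat this as illustrative only:

    #!/usr/bin/env python
    # Sketch: size --num-executors from whatever memory YARN reports as free.
    import json
    import subprocess
    import urllib2

    # Hypothetical ResourceManager host; the cluster metrics endpoint is part
    # of the RM REST API.
    RM_METRICS_URL = "http://resourcemanager:8088/ws/v1/cluster/metrics"
    EXECUTOR_MEM_MB = 2048   # matches --executor-memory 2G
    OVERHEAD_MB = 384        # rough per-container overhead YARN adds on top
    AM_CONTAINERS = 1        # leave room for the application master container

    # Compute how many 2G-ish executors should fit in the currently free memory.
    metrics = json.load(urllib2.urlopen(RM_METRICS_URL))["clusterMetrics"]
    per_executor_mb = EXECUTOR_MEM_MB + OVERHEAD_MB
    executors = max(1, metrics["availableMB"] // per_executor_mb - AM_CONTAINERS)

    # Submit the job with the derived executor count.
    subprocess.check_call([
        "spark-submit",
        "--master", "yarn-client",
        "--num-executors", str(executors),
        "--executor-cores", "1",
        "--executor-memory", "2G",
        "--driver-memory", "3G",
        "count.py", "input", "output",
    ])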