Hello

We're running some experiments with Spark (v1.4) and are hoping someone can
answer the following questions about its scheduling behavior.


What is a task set?  The term appears in the Spark logs from our runs, but we
can't find a definition in the online documentation, or an explanation of how
it relates to the Spark concepts of Jobs, Stages, and Tasks.  This makes it
hard to reason about the scheduling behavior.


What heuristic is used to kill executors when running Spark on YARN with
dynamic allocation enabled?  In the logs we observe that executors with work
(task sets) queued to them are being killed, and that work is then reassigned
to other executors.  This seems inconsistent with the online documentation,
which says that executors aren't killed until they've been idle for a
user-configurable number of seconds.
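For concreteness, the documented setting we have in mind is
spark.dynamicAllocation.executorIdleTimeout.  A minimal sketch of the kind of
configuration we mean (the app name and timeout value below are placeholders,
not our actual settings):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative only: the dynamic-allocation settings referenced above.
    val conf = new SparkConf()
      .setAppName("scheduling-experiments")                      // placeholder name
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")              // external shuffle service required on YARN
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s") // the idle timeout the docs describe
    val sc = new SparkContext(conf)

Our reading of the docs is that an executor should only be released after it
has been idle for that timeout, which is what seems at odds with the log
behavior above.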


We're using fair scheduler pools, with multiple pools that have different
weights.  Is it correct that there are queues both in the pools and in the
executors?
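To make the setup concrete, here is a rough sketch of how we use pools (the
pool name below is illustrative, not our actual configuration).  The pools and
their weights are declared in a fairscheduler.xml file referenced by
spark.scheduler.allocation.file, with spark.scheduler.mode set to FAIR, and
jobs are routed to a pool per thread:

    // Illustrative only: assigning jobs submitted from this thread to a named pool.
    sc.setLocalProperty("spark.scheduler.pool", "highPriority")
    // ... submit jobs that should run under that pool's weight ...
    sc.setLocalProperty("spark.scheduler.pool", null)  // revert to the default pool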


We can provide more details on our setup if desired.


Regards,

Rob Saccone

IBM T. J. Watson Center
