Hi Spark devs,

I want to give you guys a heads up that I'm working on a small (but major)
change to how task dispatching works. Currently (as of Spark 1.0.1), Spark
uses Akka to send the RDD object and closures to the executors along with
the task itself. This is inefficient because all tasks in the same stage
use the same RDD and closures, yet we send those RDDs and closures once per
task rather than once per stage. It is especially bad when a closure
references a very large variable, which is why the current design forces
users to explicitly broadcast large variables themselves.
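For context, this is the kind of workaround the current design pushes users
toward: a minimal sketch (not taken from the patch; the lookup table and app
name are made-up placeholders) of explicitly broadcasting a large variable so
it is shipped to each executor once instead of being captured by the closure
and re-sent with every task.

    import org.apache.spark.{SparkConf, SparkContext}

    object BroadcastWorkaround {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("broadcast-workaround"))

        // A large variable; if captured directly by the closure below, it
        // would be serialized into every single task in the stage.
        val lookup: Map[Int, String] = (1 to 1000000).map(i => i -> i.toString).toMap

        // The explicit workaround users need today: broadcast it so each
        // executor fetches it once.
        val lookupBc = sc.broadcast(lookup)

        val result = sc.parallelize(1 to 100)
          .map(i => lookupBc.value.getOrElse(i, "missing"))
          .collect()

        sc.stop()
      }
    }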

The patch instead uses broadcast to send the RDD objects and closures to
executors, and uses Akka only to send a reference to the broadcast
RDD/closure along with the partition-specific information for each task. For
those of you who know more about the internals, Spark already relies on
broadcast to ship the Hadoop JobConf every time it reads Hadoop input,
because the JobConf is large.
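To make the idea concrete, here is a rough sketch of what dispatching looks
like after the change. This is simplified and uses hypothetical names
(TaskDescription, taskBinary, submitStage); it is not the actual code in the
PR. The serialized RDD + closure is broadcast once per stage, and each task
carries only the broadcast handle plus its partition id.

    import org.apache.spark.SparkContext
    import org.apache.spark.broadcast.Broadcast

    // Hypothetical shape of what gets sent per task after the change.
    case class TaskDescription(
        stageId: Int,
        partitionId: Int,                    // partition-specific info, still sent per task
        taskBinary: Broadcast[Array[Byte]])  // reference to the broadcast RDD + closure

    def submitStage(
        sc: SparkContext,
        stageId: Int,
        numPartitions: Int,
        serializedRddAndClosure: Array[Byte]): Seq[TaskDescription] = {
      // Serialize the RDD object and closure once per stage and broadcast it...
      val taskBinary = sc.broadcast(serializedRddAndClosure)
      // ...then the per-task message sent over Akka is just the broadcast
      // reference plus the partition id, so it stays small.
      (0 until numPartitions).map(p => TaskDescription(stageId, p, taskBinary))
    }

On the executor side, a task in this sketch would call taskBinary.value to
fetch the bytes (cached per executor by the broadcast machinery) and
deserialize the RDD and closure locally.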

The user-facing impact of the change includes:

1. Users won't need to decide what to broadcast anymore.
2. Tasks will get smaller, resulting in faster scheduling and higher task
dispatch throughput.

In addition, the change will simplify some of Spark's internals, removing
the need to maintain task caches and the complex logic to broadcast the
JobConf (which also led to a deadlock recently).


Pull request attached: https://github.com/apache/spark/pull/1450
