A couple questions about shared variables

Sandy Ryza Sat, 20 Sep 2014 08:51:12 -0700

Hey All,

A couple questions came up about shared variables recently, and I wanted to
confirm my understanding and update the doc to be a little more clear.


*Broadcast variables*
Now that task data is automatically broadcast, the only occasions where it
makes sense to explicitly broadcast are:
* You want to use a variable from tasks in multiple stages.
* You want to have the variable stored on the executors in deserialized
form.
* You want tasks to be able to modify the variable and have those
modifications take effect for other tasks running on the same executor
(usually a very bad idea).

Is that right?
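
To make the second and third cases concrete, here is a toy model in plain
Python. It is not Spark code: the Executor class and run_task method are
hypothetical stand-ins, and the sketch assumes that closure data is
deserialized once per task while a broadcast value is deserialized once per
executor and then shared.

```python
import pickle


class Executor:
    """Hypothetical stand-in for an executor process (not a Spark class)."""

    def __init__(self, broadcast_bytes):
        # Assumption: a broadcast value is deserialized once per executor
        # and the same object is handed to every task that runs there.
        self.broadcast_value = pickle.loads(broadcast_bytes)

    def run_task(self, closure_bytes):
        # Assumption: data captured in the task closure is deserialized
        # afresh for each task, so each task gets a private copy.
        closure_value = pickle.loads(closure_bytes)
        closure_value.append("seen")
        self.broadcast_value.append("seen")
        return len(closure_value), len(self.broadcast_value)


payload = pickle.dumps([])
executor = Executor(payload)
first = executor.run_task(payload)
second = executor.run_task(payload)
print(first, second)
```

Under these assumptions the second task sees the first task's change to the
shared value but not its change to the closure copy, which is why mutating a
broadcast variable is risky.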

*Accumulators*
Values are only counted for successful tasks.  Is that right?  KMeans seems
to use them in this way.  What happens if a node goes away and successful
tasks need to be resubmitted, or if the stage runs again because a different
job needs it?
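
The difference these questions turn on can be sketched with a toy model in
plain Python. It is not Spark code and makes no claim about which behavior
Spark implements: run_stage and its dedupe flag are hypothetical, and the
attempt list is made-up illustration data.

```python
def run_stage(attempts, dedupe):
    """Sum per-task counts over (partition_id, succeeded, local_count) tuples.

    Hypothetical model: failed attempts never contribute. With dedupe=True,
    only the first successful attempt for each partition is counted.
    """
    total = 0
    counted = set()
    for partition_id, succeeded, local_count in attempts:
        if not succeeded:
            continue  # partial updates from a failed attempt are dropped
        if dedupe and partition_id in counted:
            continue  # a resubmitted task that already succeeded once
        counted.add(partition_id)
        total += local_count
    return total


attempts = [
    (0, True, 10),
    (1, False, 7),   # failed attempt
    (1, True, 20),   # retry succeeds
    (0, True, 10),   # partition 0 resubmitted after its output was lost
]
naive_total = run_stage(attempts, dedupe=False)
deduped_total = run_stage(attempts, dedupe=True)
print(naive_total, deduped_total)
```

If successful tasks that get resubmitted are counted again, the total is
inflated (naive_total); counting each partition once gives deduped_total.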

thanks,
Sandy
