Hey All,

A couple of questions came up about shared variables recently, and I wanted to
confirm my understanding and update the doc to make it a little clearer.

*Broadcast variables*
Now that task data is automatically broadcast, the only occasions where it
makes sense to broadcast explicitly are:
* You want to use the same variable in tasks across multiple stages.
* You want to have the variable stored on the executors in deserialized
form.
* You want tasks to be able to modify the variable and have those
modifications take effect for other tasks running on the same executor
(usually a very bad idea).

Is that right?

*Accumulators*
Values are only counted for successful tasks.  Is that right?  KMeans seems
to use accumulators this way.  What happens if a node goes away and successful
tasks need to be resubmitted?  Or if the stage runs again because a different
job needed it?

thanks,
Sandy
