Hi Josh, I mean on the driver side. OutputCommitCoordinator.stageStart is called in DAGScheduler#submitMissingTasks for every stage, which costs some memory. That said, as long as the executor side doesn't make the RPC, there isn't much of a performance penalty.
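To make the distinction concrete, here is a toy Python model of the behaviour described above. It is only a sketch under stated assumptions, not Spark's code: `ToyCommitCoordinator`, `run_task`, and the `writes_output` flag are hypothetical names invented for illustration. The driver-side bookkeeping is created for every stage, while the commit-permission request (standing in for the RPC) is only reached by tasks that actually commit output.

```python
class ToyCommitCoordinator:
    """Toy stand-in for a driver-side commit coordinator (not Spark code)."""

    def __init__(self):
        # stage id -> {partition id -> task attempt authorized to commit}
        self.authorized = {}
        # Number of commit-permission requests received from "executors".
        self.rpc_count = 0

    def stage_start(self, stage):
        # Called for every stage, so each stage costs one small map entry.
        self.authorized[stage] = {}

    def stage_end(self, stage):
        # Drop the per-stage state once the stage is finished.
        self.authorized.pop(stage, None)

    def can_commit(self, stage, partition, attempt):
        # Stands in for the RPC; only reached from the commit path.
        self.rpc_count += 1
        committers = self.authorized.get(stage)
        if committers is None:
            return False
        # The first attempt to ask for a partition is the one authorized.
        winner = committers.setdefault(partition, attempt)
        return winner == attempt


def run_task(coordinator, stage, partition, attempt, writes_output):
    """Hypothetical task body: only output-writing tasks ask for permission."""
    if not writes_output:
        return None
    return coordinator.can_commit(stage, partition, attempt)


coordinator = ToyCommitCoordinator()
coordinator.stage_start(0)  # e.g. a shuffle map stage
coordinator.stage_start(1)  # e.g. a result stage that writes output

# Tasks of stage 0 never commit output, so they never contact the coordinator.
for p in range(3):
    run_task(coordinator, 0, p, attempt=0, writes_output=False)

# Two attempts of the same partition in stage 1 both ask for permission.
first = run_task(coordinator, 1, 0, attempt=0, writes_output=True)
second = run_task(coordinator, 1, 0, attempt=1, writes_output=True)

print(len(coordinator.authorized), coordinator.rpc_count, first, second)
# prints: 2 2 True False
```

Both stages hold an entry on the driver (the memory cost), but only the two committing attempts generate requests, and only the first of them is allowed to commit.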
On Wed, Aug 12, 2015 at 12:17 AM, Josh Rosen <rosenvi...@gmail.com> wrote:

> Can you clarify what you mean by "used for all stages"?
> OutputCommitCoordinator RPCs should only be initiated through
> SparkHadoopMapRedUtil.commitTask(), so while the OutputCommitCoordinator
> doesn't make a distinction between ShuffleMapStages and ResultStages there
> still should not be a performance penalty for this because the extra rounds
> of RPCs should only be performed when necessary.
>
>
> On 8/11/15 2:25 AM, Jeff Zhang wrote:
>
>> As my understanding, OutputCommitCoordinator should only be necessary for
>> ResultStage (especially for ResultStage with hdfs write), but currently it
>> is used for all the stages. Is there any reason for that ?
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>

--
Best Regards

Jeff Zhang