Right, there are only 2 kinds of stage: ResultStage and ShuffleMapStage. A ShuffleMapStage shuffles its output for downstream consumption, whereas a ResultStage doesn't need to do that.
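
To make that concrete, here is a rough toy model in Python. It is not Spark's actual code, only a sketch under my own assumptions: the `RDD` and `Stage` classes and the `parent_stages` / `new_result_stage` helpers are hypothetical names, loosely mirroring the `getParentStagesAndId` / `newResultStage` pair from the question. The idea is that narrow dependencies stay inside the current stage, every shuffle dependency starts a new parent ShuffleMapStage, and only the final stage of the job is a ResultStage. Unlike the real scheduler, the toy does not de-duplicate stages that share a shuffle.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RDD:
    """Toy stand-in for an RDD: a name plus its two kinds of dependency."""
    name: str
    narrow_parents: List["RDD"] = field(default_factory=list)
    shuffle_parents: List["RDD"] = field(default_factory=list)


@dataclass
class Stage:
    kind: str              # "ResultStage" or "ShuffleMapStage"
    rdd: RDD               # last RDD computed inside this stage
    parents: List["Stage"]


def parent_stages(rdd: RDD) -> List[Stage]:
    """Collect the parent stages of the stage that ends at `rdd`.

    Narrow dependencies are followed within the same stage; each shuffle
    dependency becomes a parent ShuffleMapStage (hypothetical helper).
    """
    parents: List[Stage] = []
    visited = set()
    stack = [rdd]
    while stack:
        current = stack.pop()
        if id(current) in visited:
            continue
        visited.add(id(current))
        for upstream in current.shuffle_parents:
            # A shuffle boundary: the upstream side must write shuffle output.
            parents.append(
                Stage("ShuffleMapStage", upstream, parent_stages(upstream))
            )
        # Narrow dependencies belong to the current stage, keep walking.
        stack.extend(current.narrow_parents)
    return parents


def new_result_stage(final_rdd: RDD) -> Stage:
    """The final stage of a job is the only ResultStage (hypothetical helper)."""
    return Stage("ResultStage", final_rdd, parent_stages(final_rdd))


# Toy lineage with two shuffles:
# textFile -> map -> reduceByKey -> map -> groupByKey -> (action)
source = RDD("textFile")
pairs = RDD("map", narrow_parents=[source])
counts = RDD("reduceByKey", shuffle_parents=[pairs])
swapped = RDD("map", narrow_parents=[counts])
grouped = RDD("groupByKey", shuffle_parents=[swapped])

result = new_result_stage(grouped)
```

In this toy lineage, `result` is the single ResultStage, and every stage reachable through `parents` is a ShuffleMapStage. The middle stage (ending at the second `map`) both reads the output of one shuffle and writes the input of the next one.
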
I guess you may have confused these concepts with Map/Reduce. Actually, a ShuffleMapStage can play the role of either Map or Reduce, as long as it produces intermediate data for downstream consumption.

On Fri, Nov 6, 2015 at 4:15 PM, Jacek Laskowski <[email protected]> wrote:

> Hi,
>
> Just to make sure that what I see in the code and think I understand
> is indeed correct...
>
> When a job is submitted to DAGScheduler, it creates a new ResultStage
> that in turn queries for the parent stages of itself given the RDD
> (using `getParentStagesAndId` in `newResultStage`).
>
> Are a ResultStage's parent stages only ShuffleMapStages?
>
> Regards,
> Jacek
>
> --
> Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski

--
Best Regards

Jeff Zhang
