attilapiros commented on PR #50757:
URL: https://github.com/apache/spark/pull/50757#issuecomment-2848147158
> I thought there would be a result stage after every map stage implying that there can be no two consequent Map stages.

No, a job always has exactly one result stage and can have zero or more shuffle map stages. Whenever you have a question like this, you can run a simple Spark job and check the logging, or in this case even the UI:

```
$ ./bin/spark-shell
scala> sc.setLogLevel("DEBUG")
scala> sc.parallelize(1 to 100, 10).groupBy(i => i).map(_._1).groupBy(i => i).map(_._1).collect
...
25/05/02 14:13:32 DEBUG DAGScheduler: submitStage(ResultStage 2 (name=collect at <console>:24;jobs=0))
25/05/02 14:13:32 DEBUG DAGScheduler: missing: List(ShuffleMapStage 1)
25/05/02 14:13:32 DEBUG DAGScheduler: submitStage(ShuffleMapStage 1 (name=groupBy at <console>:24;jobs=0))
25/05/02 14:13:32 DEBUG DAGScheduler: missing: List(ShuffleMapStage 0)
25/05/02 14:13:32 DEBUG DAGScheduler: submitStage(ShuffleMapStage 0 (name=groupBy at <console>:24;jobs=0))
25/05/02 14:13:32 DEBUG DAGScheduler: missing: List()
...
```

Here each `groupBy` introduces a shuffle boundary, so this single job has two consecutive shuffle map stages (0 and 1) feeding one result stage (2).
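For the zero-shuffle end of that "zero or more" range, a minimal sketch (the specific pipeline here is just an illustrative example): build a job from narrow transformations only, such as `map` and `filter`:

```
$ ./bin/spark-shell
scala> sc.setLogLevel("DEBUG")
scala> sc.parallelize(1 to 100, 10).map(_ + 1).filter(_ % 2 == 0).collect
```

Because narrow transformations introduce no shuffle boundary, the DEBUG log should show a single `submitStage(ResultStage 0 ...)` with `missing: List()` and no `ShuffleMapStage` at all.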