attilapiros commented on PR #50757:
URL: https://github.com/apache/spark/pull/50757#issuecomment-2848147158
> I thought there would be a result stage after every map stage implying that there can be no two consequent Map stages.

No, a job always has exactly one result stage and can have zero or more shuffle map stages. Whenever you have a question like this, you can run a simple Spark job and check the logging, or in this case even the UI:

```
$ ./bin/spark-shell
scala> sc.setLogLevel("DEBUG")
scala> sc.parallelize(1 to 100, 10).groupBy(i => i).map(_._1).groupBy(i => i).map(_._1).collect
...
25/05/02 14:13:32 DEBUG DAGScheduler: submitStage(ResultStage 2 (name=collect at <console>:24;jobs=0))
25/05/02 14:13:32 DEBUG DAGScheduler: missing: List(ShuffleMapStage 1)
25/05/02 14:13:32 DEBUG DAGScheduler: submitStage(ShuffleMapStage 1 (name=groupBy at <console>:24;jobs=0))
25/05/02 14:13:32 DEBUG DAGScheduler: missing: List(ShuffleMapStage 0)
25/05/02 14:13:32 DEBUG DAGScheduler: submitStage(ShuffleMapStage 0 (name=groupBy at <console>:24;jobs=0))
25/05/02 14:13:32 DEBUG DAGScheduler: missing: List()
...
```

Here each `groupBy` introduces a shuffle boundary, so this single job has two consecutive shuffle map stages (0 and 1) feeding one result stage (2).
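For the zero-shuffle end of that "zero or more" range, a minimal sketch (the specific pipeline here is just an illustrative example): build a job from narrow transformations only, such as `map` and `filter`:

```
$ ./bin/spark-shell
scala> sc.setLogLevel("DEBUG")
scala> sc.parallelize(1 to 100, 10).map(_ + 1).filter(_ % 2 == 0).collect
```

Because narrow transformations introduce no shuffle boundary, the DEBUG log should show a single `submitStage(ResultStage 0 ...)` with `missing: List()` and no `ShuffleMapStage` at all.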