liuzqt opened a new pull request, #50273:
URL: https://github.com/apache/spark/pull/50273

   ### What changes were proposed in this pull request?
   
   Log the number of empty partitions as a metric when AQE coalesces shuffle partitions.
   
   
   ### Why are the changes needed?
   
   There are cases where a shuffle is highly skewed: many partitions are empty 
(probably due to a small NDV) and only a few partitions contain data, and those 
are very large. In such cases AQE coalescing does little more than eliminate the 
empty partitions. The coalesce metrics can then look confusing, and users might 
think AQE wrongly coalesced the data into large partitions.
   
   We should report the number of empty partitions in the metrics.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Added a `numEmptyPartitions` metric to `AQEShuffleReadExec` when partitions 
are coalesced.
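   For illustration only, the following is a simplified Python sketch of greedy 
adjacent-partition coalescing. It is not Spark's Scala implementation: the 
function `coalesce_partitions`, its rule that empty partitions always join the 
current group, and the sample partition sizes are all hypothetical. It shows why 
an empty-partition count helps: 16 shuffle partitions collapse into 2, yet 14 of 
the 16 were empty, so coalescing only absorbed the empty ones and never merged 
the two large partitions.

```python
from typing import List, Tuple


def coalesce_partitions(sizes: List[int], target_size: int) -> Tuple[List[List[int]], int]:
    """Hypothetical, simplified coalescing of adjacent shuffle partitions.

    Returns the coalesced groups (lists of original partition indices) and the
    number of zero-byte input partitions, i.e. what a numEmptyPartitions-style
    metric would report. This is NOT Spark's actual algorithm.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for idx, size in enumerate(sizes):
        # Close the current group only when a non-empty partition would push an
        # already non-empty group past the target. Empty partitions always join
        # the current group, so they never form groups of their own.
        if current_size > 0 and size > 0 and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    if current:
        groups.append(current)
    num_empty = sum(1 for s in sizes if s == 0)
    return groups, num_empty


# Skewed shuffle (sizes in arbitrary units): two large partitions, the rest empty.
skewed = [0] * 6 + [500] + [0] * 5 + [400] + [0] * 3
groups, num_empty = coalesce_partitions(skewed, target_size=64)
print(len(skewed), len(groups), num_empty)  # prints: 16 2 14
```

   Without the empty-partition count, "16 partitions coalesced into 2 large 
partitions" reads like over-aggressive merging; with it, the reader can see that 
the large partitions were already large before coalescing.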
   
   
   ### How was this patch tested?
   
   New test case.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   NO
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

