liuzqt opened a new pull request, #50273: URL: https://github.com/apache/spark/pull/50273
### What changes were proposed in this pull request?

Log the number of empty partitions as a metric in AQE coalesce.

### Why are the changes needed?

There are cases where a shuffle is highly skewed: many partitions are empty (probably due to a small NDV), and only a few partitions contain data and are very large. In such cases AQE coalesce does essentially nothing except eliminate the empty partitions. The AQE coalesce metrics can then look confusing, and users might think it wrongly coalesced data into large partitions. We should log the number of empty partitions in the metrics.

### Does this PR introduce _any_ user-facing change?

Added a `numEmptyPartitions` metric to AQEShuffleReadExec when there is coalescing.

### How was this patch tested?

New test case.

### Was this patch authored or co-authored using generative AI tooling?

No.
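
To illustrate the scenario described under "Why are the changes needed?", here is a minimal standalone sketch in plain Python. It is not Spark's actual coalescing code: the function name `coalesce_partitions`, the greedy merge rule, and the sample sizes are all assumptions made for illustration. It shows how a skewed shuffle with mostly empty partitions can go from 8 partitions to 2 without any real merging taking place, which is the situation an empty-partition count makes visible.

```python
from typing import List, Tuple


def coalesce_partitions(sizes: List[int], target_size: int) -> Tuple[List[List[int]], int]:
    """Hypothetical, simplified model of partition coalescing.

    Greedily merges adjacent non-empty partitions until adding the next one
    would exceed target_size. Empty partitions are dropped and counted.
    Returns (groups of original partition indices, number of empty partitions).
    """
    num_empty = sum(1 for s in sizes if s == 0)
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for idx, size in enumerate(sizes):
        if size == 0:
            continue  # empty partitions contribute nothing and are skipped
        if current and current_size + size > target_size:
            # Close the current group before it grows past the target.
            groups.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    if current:
        groups.append(current)
    return groups, num_empty


# Illustrative sizes (in MB): 8 shuffle partitions, only 2 of which hold data.
sizes = [0, 0, 500, 0, 0, 0, 800, 0]
groups, num_empty = coalesce_partitions(sizes, target_size=64)
print(f"{len(sizes)} -> {len(groups)} partitions, {num_empty} empty")
```

In this made-up input, the partition count drops from 8 to 2 even though no two non-empty partitions were merged; the reduction comes entirely from the 6 empty partitions. Without a count of empty partitions, the same numbers could be misread as over-aggressive coalescing.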