[ 
https://issues.apache.org/jira/browse/SPARK-51505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziqi Liu updated SPARK-51505:
-----------------------------
    Description: 
In some cases a shuffle is highly skewed and many partitions are empty 
(probably due to a small NDV, i.e. number of distinct values). The AQE 
coalesce metrics can then look confusing: a user might think AQE wrongly 
coalesced into large partitions, when in fact a few partitions are very 
large and the rest are empty. 

We should log the number of empty partitions in the metrics.

  was:
There're cases where shuffle is highly skewed and many partitions (probably due 
to small NDV), AQE coalesce metrics might look confusing and user might think 
it wrongly coalesce to large partitions, while the actual situation is that a 
few partitions are super large while others are empty. 

We'd better log empty partition number in the metrics.


> Log empty partition number metrics in AQE coalesce
> --------------------------------------------------
>
>                 Key: SPARK-51505
>                 URL: https://issues.apache.org/jira/browse/SPARK-51505
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Ziqi Liu
>            Priority: Major
>
> In some cases a shuffle is highly skewed and many partitions are empty 
> (probably due to a small NDV, i.e. number of distinct values). The AQE 
> coalesce metrics can then look confusing: a user might think AQE wrongly 
> coalesced into large partitions, when in fact a few partitions are very 
> large and the rest are empty. 
> We should log the number of empty partitions in the metrics.
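
As a rough illustration of why an empty-partition count helps, here is a 
standalone Python sketch. It is not Spark code: the function name, the 
metric keys, and the partition sizes are all hypothetical and chosen only 
to mimic a skewed shuffle in which most partitions are empty.

```python
def summarize_partitions(sizes_bytes):
    """Summarize shuffle partition sizes, counting empty partitions separately.

    Hypothetical helper for illustration only; it does not mirror any
    Spark API or the actual AQE coalesce metrics.
    """
    non_empty = [s for s in sizes_bytes if s > 0]
    return {
        "num_partitions": len(sizes_bytes),
        "num_empty": len(sizes_bytes) - len(non_empty),
        "num_non_empty": len(non_empty),
        "max_bytes": max(non_empty, default=0),
        # Average over all partitions, empty ones included.
        "avg_bytes_all": sum(sizes_bytes) / len(sizes_bytes) if sizes_bytes else 0.0,
        # Average over partitions that actually hold data.
        "avg_bytes_non_empty": sum(non_empty) / len(non_empty) if non_empty else 0.0,
    }


MIB = 1024 * 1024

# Made-up skewed shuffle: 200 partitions, only 3 of them hold data.
sizes = [0] * 200
sizes[5] = 900 * MIB
sizes[42] = 700 * MIB
sizes[117] = 800 * MIB

summary = summarize_partitions(sizes)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these made-up numbers, the average over all 200 partitions is small 
while the three non-empty partitions are each hundreds of MiB. Reporting 
the empty-partition count next to the size metrics makes it clear that 
the large partitions come from skew in the input rather than from the 
coalescing step.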


