Yin Huai created SPARK-11840:
--------------------------------

Summary: Restore the 1.5's behavior of planning a single distinct aggregation.
Key: SPARK-11840
URL: https://issues.apache.org/jira/browse/SPARK-11840
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Yin Huai
Assignee: Yin Huai
This change affects queries that have a single distinct column and no grouping expressions, such as {{SELECT COUNT(DISTINCT a) FROM table}}. The plan will change from

{code}
AGG-2 (count distinct)
Shuffle to a single reducer
Partial-AGG-2 (count distinct)
AGG-1 (grouping on a)
Shuffle by a
Partial-AGG-1 (grouping on a)
{code}

to the following plan, which is the one 1.5 uses:

{code}
AGG-2
AGG-1 (grouping on a)
Shuffle to a single reducer
Partial-AGG-1 (grouping on a)
{code}

The first plan is more robust: because it shuffles by a, the de-duplication in AGG-1 is spread across many reducers, while the second plan sends every locally de-duplicated value to a single reducer. However, to better benchmark the impact of this change, we should restore 1.5's plan and use the {{spark.sql.specializeSingleDistinctAggPlanning}} conf to control which plan is chosen.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
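To make the difference between the two plans concrete, here is a minimal sketch that simulates their dataflow with plain Scala collections. It is not Spark code: each inner {{Seq}} stands in for one input partition, and the object and method names ({{SingleDistinctPlans}}, {{countDistinctTwoShuffles}}, {{countDistinctSingleReducer}}) are hypothetical. Both functions return the same count; what differs is where AGG-1 runs.

{code}
// Hypothetical simulation of the two plans for SELECT COUNT(DISTINCT a) FROM table.
// Each inner Seq models one input partition; this is NOT Spark's implementation.
object SingleDistinctPlans {

  type Partitions = Seq[Seq[Int]]

  // The "more robust" plan: Partial-AGG-1 groups on a per partition, rows are
  // shuffled by a across numReducers reducers, AGG-1 de-duplicates on each
  // reducer, Partial-AGG-2 counts per reducer, and AGG-2 sums on one reducer.
  def countDistinctTwoShuffles(input: Partitions, numReducers: Int): Long = {
    require(numReducers > 0, "numReducers must be positive")
    // Partial-AGG-1 (grouping on a): local de-duplication inside each partition.
    val partialAgg1: Seq[Set[Int]] = input.map(_.toSet)
    // Shuffle by a: each value is routed to exactly one reducer by its hash.
    val shuffled: Seq[Seq[Int]] = (0 until numReducers).map { r =>
      partialAgg1.flatMap(_.filter(v => Math.floorMod(v.hashCode, numReducers) == r))
    }
    // AGG-1 (grouping on a) + Partial-AGG-2: distinct count on each reducer.
    val partialCounts: Seq[Long] = shuffled.map(_.distinct.size.toLong)
    // Shuffle to a single reducer + AGG-2: sum the partial counts.
    partialCounts.sum
  }

  // The 1.5-style plan: Partial-AGG-1 groups on a per partition, then every
  // locally de-duplicated value goes to a single reducer, where AGG-1 groups
  // on a and AGG-2 counts the groups.
  def countDistinctSingleReducer(input: Partitions): Long = {
    // Partial-AGG-1 (grouping on a).
    val partialAgg1: Seq[Set[Int]] = input.map(_.toSet)
    // Shuffle to a single reducer: all values end up in one place.
    val singleReducer: Seq[Int] = partialAgg1.flatten
    // AGG-1 (grouping on a) followed by AGG-2 (count).
    singleReducer.distinct.size.toLong
  }

  def main(args: Array[String]): Unit = {
    val input: Partitions = Seq(Seq(1, 2, 2), Seq(2, 3), Seq(3, 3, 4))
    println(countDistinctTwoShuffles(input, numReducers = 2)) // 4
    println(countDistinctSingleReducer(input))                // 4
  }
}
{code}

Because the shuffle by a sends each distinct value to exactly one reducer, summing the per-reducer distinct counts gives the global distinct count. In the single-reducer variant, that one reducer must hold every locally de-duplicated value, which is why it can become a bottleneck when a has many distinct values.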