[ https://issues.apache.org/jira/browse/SPARK-11840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-11840:
------------------------------------

    Assignee: Apache Spark  (was: Yin Huai)

> Restore the 1.5's behavior of planning a single distinct aggregation.
> ---------------------------------------------------------------------
>
>                 Key: SPARK-11840
>                 URL: https://issues.apache.org/jira/browse/SPARK-11840
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Apache Spark
>
> The impact of this change is for a query that has a single distinct column
> and does not have any grouping expression like
> {{SELECT COUNT(DISTINCT a) FROM table}}
> The plan will be changed from
> {code}
> AGG-2 (count distinct)
>   Shuffle to a single reducer
>     Partial-AGG-2 (count distinct)
>       AGG-1 (grouping on a)
>         Shuffle by a
>           Partial-AGG-1 (grouping on 1)
> {code}
> to the following one (1.5 uses this)
> {code}
> AGG-2
>   AGG-1 (grouping on a)
>     Shuffle to a single reducer
>       Partial-AGG-1(grouping on a)
> {code}
> The first plan is more robust. However, to better benchmark the impact of
> this change, we should use 1.5's plan and use the conf of
> {{spark.sql.specializeSingleDistinctAggPlanning}} to control the plan.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
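
As a rough illustration of the two plans quoted above for {{SELECT COUNT(DISTINCT a) FROM table}}, the following is a minimal pure-Python sketch. It is not Spark code and does not use any Spark API. The function names, the list-of-lists model of input partitions, and the `num_reducers` parameter are assumptions made for the example. Each function returns the distinct count together with the number of rows that reach the single reducer, which is one plausible reading of why the issue calls the first plan more robust.

```python
from typing import Hashable, List, Optional, Sequence, Set, Tuple

# Hypothetical model: each inner sequence is one input partition of column `a`.
Partition = Sequence[Optional[Hashable]]


def _partial_agg_1(partitions: Sequence[Partition]) -> List[Set[Hashable]]:
    """Partial-AGG-1: deduplicate `a` inside each input partition.

    None stands in for SQL NULL, which COUNT(DISTINCT a) does not count.
    """
    return [{v for v in part if v is not None} for part in partitions]


def plan_two_shuffles(
    partitions: Sequence[Partition], num_reducers: int = 4
) -> Tuple[int, int]:
    """First plan in the issue: shuffle by `a`, then shuffle partial counts."""
    partial = _partial_agg_1(partitions)

    # Shuffle by a + AGG-1: route each value to a bucket by hash and
    # deduplicate there (the set does the grouping on a).
    buckets: List[Set[Hashable]] = [set() for _ in range(num_reducers)]
    for values in partial:
        for v in values:
            buckets[hash(v) % num_reducers].add(v)

    # Partial-AGG-2: one partial count per shuffle partition.
    partial_counts = [len(b) for b in buckets]

    # Shuffle to a single reducer + AGG-2: only the partial counts travel.
    return sum(partial_counts), len(partial_counts)


def plan_single_reducer(partitions: Sequence[Partition]) -> Tuple[int, int]:
    """Second plan in the issue (the 1.5 plan): one shuffle to one reducer."""
    partial = _partial_agg_1(partitions)

    # Shuffle to a single reducer: every partially deduplicated value travels.
    rows_to_reducer = sum(len(values) for values in partial)

    # AGG-1 (grouping on a) then AGG-2 (count), both on the single reducer.
    merged: Set[Hashable] = set().union(*partial)
    return len(merged), rows_to_reducer


if __name__ == "__main__":
    data = [[1, 2, 2, 3], [3, 4, 4, None], [5, 1]]
    print(plan_two_shuffles(data, num_reducers=4))
    print(plan_single_reducer(data))
```

Both functions return the same distinct count. In this model the single reducer receives one row per shuffle partition under the first plan, but every per-partition distinct value under the 1.5 plan, so the second number grows with the cardinality of `a`.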