Yin Huai created SPARK-11840:
--------------------------------

             Summary: Restore 1.5's behavior of planning a single distinct 
aggregation.
                 Key: SPARK-11840
                 URL: https://issues.apache.org/jira/browse/SPARK-11840
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Yin Huai
            Assignee: Yin Huai


This change affects queries that have a single distinct column and no grouping 
expressions, such as
{{SELECT COUNT(DISTINCT a) FROM table}}
The plan will be changed from

{code}
AGG-2 (count distinct)
  Shuffle to a single reducer
    Partial-AGG-2 (count distinct)
      AGG-1 (grouping on a)
        Shuffle by a
          Partial-AGG-1 (grouping on a)
{code}
to the following one (1.5 uses this)

{code}
AGG-2
  AGG-1 (grouping on a)
    Shuffle to a single reducer
      Partial-AGG-1 (grouping on a)
{code}
The first plan is more robust. However, to better benchmark the impact of this 
change, we should use 1.5's plan and let the conf 
{{spark.sql.specializeSingleDistinctAggPlanning}} control which plan is used.
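
To make the difference between the two plans concrete, here is a rough, Spark-free sketch in plain Python that mimics each operator of both plans on in-memory lists. It is only an illustration under stated assumptions: the function names, the choice of 4 reducers, and the "rows reaching the single reducer" figure are all hypothetical and are not taken from Spark's planner; NULL handling is ignored.

```python
def partial_dedupe(partition):
    """Partial-AGG-1: group on `a` inside one input partition (drops local duplicates)."""
    return set(partition)


def shuffle_by_key(partitions, num_reducers):
    """Shuffle by a: hash-partition values so equal values meet on the same reducer."""
    reducers = [[] for _ in range(num_reducers)]
    for part in partitions:
        for value in part:
            reducers[hash(value) % num_reducers].append(value)
    return reducers


def plan_two_shuffles(partitions, num_reducers=4):
    """First plan: dedupe after a shuffle by `a`, then send partial counts to one reducer.

    Returns (distinct_count, rows_reaching_single_reducer).
    """
    partials = [partial_dedupe(p) for p in partitions]   # Partial-AGG-1 (grouping on a)
    shuffled = shuffle_by_key(partials, num_reducers)    # Shuffle by a
    deduped = [set(r) for r in shuffled]                 # AGG-1 (grouping on a)
    partial_counts = [len(d) for d in deduped]           # Partial-AGG-2 (count distinct)
    # Shuffle to a single reducer: only one partial count per reducer moves.
    return sum(partial_counts), len(partial_counts)      # AGG-2 (count distinct)


def plan_single_reducer(partitions):
    """Second plan (the one 1.5 uses): send locally deduped values straight to one reducer.

    Returns (distinct_count, rows_reaching_single_reducer).
    """
    partials = [partial_dedupe(p) for p in partitions]   # Partial-AGG-1 (grouping on a)
    single = [v for p in partials for v in p]            # Shuffle to a single reducer
    deduped = set(single)                                # AGG-1 (grouping on a)
    return len(deduped), len(single)                     # AGG-2


if __name__ == "__main__":
    data = [[1, 2, 2, 3], [3, 4, 4], [5, 1]]
    print(plan_two_shuffles(data))    # (5, 4)
    print(plan_single_reducer(data))  # (5, 7)
```

Both functions return the same distinct count. In this toy model the second plan pushes every locally distinct value through the single reducer, while the first plan only sends one partial count per reducer, at the cost of an extra shuffle.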



