[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556608#comment-13556608
 ] 

Yin Huai commented on HIVE-2340:
--------------------------------

Let me explain why I introduced the fake RS operator instead of just removing 
the original RS. When I was developing the patch for 2206, I found that the 
aggregation operator (GBY) and the join operator (JOIN) use different logic to 
process the rows forwarded to them. Although both buffer rows, a GBY decides 
in processOp whether it needs to forward results to its children, while a JOIN 
relies on endGroup to know when it should forward results. For plans like 
GBY-GBY or JOIN-GBY, that difference in processing logic is fine. However, 
when we have plans like
{code}
GBY----                    GBY----
       \                          \
        ----JOIN    or             ----JOIN
       /                          /
GBY----                    JOIN---
{code}
We need operators between the child JOIN and its parent GBYs and JOINs to make 
sure the JOIN processes rows correctly. This is also why 
CorrelationLocalSimulativeReduceSinkOperator determines when to start a group 
for its children in processOp and leaves startGroup and endGroup empty.
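
As a rough illustration of the difference described above, here is a minimal, 
self-contained sketch. All class names (SketchOperator, SketchGroupBy, 
SketchJoin, SketchFakeReduceSink) are hypothetical and heavily simplified; they 
are not Hive's operator classes. The sketch only assumes what the comment 
states: a GBY-style operator forwards from processOp, a JOIN-style operator 
forwards from endGroup, and an intermediate operator can detect key changes in 
processOp and signal group boundaries to its child.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified operator base class; NOT Hive's Operator API.
abstract class SketchOperator {
    final List<String> forwarded = new ArrayList<>(); // results "sent" downstream
    void startGroup() {}
    abstract void processOp(String key, int value);
    void endGroup() {}
    void close() {}
}

// GBY-style: notices a key change inside processOp and forwards there.
class SketchGroupBy extends SketchOperator {
    private String currentKey = null;
    private int sum = 0;

    @Override
    void processOp(String key, int value) {
        if (currentKey != null && !currentKey.equals(key)) {
            forwarded.add(currentKey + "=" + sum); // flush the previous group
            sum = 0;
        }
        currentKey = key;
        sum += value;
    }

    @Override
    void close() {
        if (currentKey != null) {
            forwarded.add(currentKey + "=" + sum); // flush the last group
            currentKey = null;
        }
    }
}

// JOIN-style: buffers rows and forwards only when endGroup is called.
class SketchJoin extends SketchOperator {
    private final List<Integer> buffer = new ArrayList<>();
    private String currentKey = null;

    @Override
    void startGroup() { buffer.clear(); }

    @Override
    void processOp(String key, int value) {
        currentKey = key;
        buffer.add(value);
    }

    @Override
    void endGroup() {
        if (!buffer.isEmpty()) {
            forwarded.add(currentKey + "=" + buffer);
            buffer.clear();
        }
    }
}

// Fake-RS-style: its own startGroup/endGroup stay empty; it decides in
// processOp when to start and end the child's group.
class SketchFakeReduceSink extends SketchOperator {
    private final SketchOperator child;
    private String currentKey = null;

    SketchFakeReduceSink(SketchOperator child) { this.child = child; }

    @Override
    void processOp(String key, int value) {
        if (!key.equals(currentKey)) {
            if (currentKey != null) child.endGroup();
            child.startGroup();
            currentKey = key;
        }
        child.processOp(key, value);
    }

    @Override
    void close() {
        if (currentKey != null) { child.endGroup(); currentKey = null; }
        child.close();
    }
}

class GroupBoundaryDemo {
    public static void main(String[] args) {
        SketchJoin bare = new SketchJoin();    // fed directly, no group signals
        SketchJoin wrapped = new SketchJoin(); // fed through the fake RS
        SketchFakeReduceSink fake = new SketchFakeReduceSink(wrapped);
        String[] keys = {"a", "a", "b"};
        int[] vals = {1, 2, 5};
        for (int i = 0; i < keys.length; i++) {
            bare.processOp(keys[i], vals[i]);
            fake.processOp(keys[i], vals[i]);
        }
        bare.close();
        fake.close();
        System.out.println("bare JOIN forwarded:    " + bare.forwarded);
        System.out.println("wrapped JOIN forwarded: " + wrapped.forwarded);
    }
}
```

In this sketch the JOIN that receives rows only through processOp never 
forwards anything, because nothing calls its endGroup, while the JOIN placed 
behind the fake RS forwards one result per key.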

Also, by replacing RSs with those fake RSs, I do not need to touch the GBYs 
and JOINs that will be merged into the same Reduce phase. Since the input of 
the first operator on the Reduce side is in the format [key, value, tag], I 
use those fake RSs to generate rows in the same format.
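
The [key, value, tag] row shape can be sketched as follows. This is only an 
illustration under assumptions: TaggedRow and TaggingSink are hypothetical 
names, the key and value are plain strings, and the tag is assumed to identify 
which parent a row came from. None of this is taken from the patch itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row shape mirroring the [key, value, tag] layout; not a Hive class.
final class TaggedRow {
    final String key;
    final String value;
    final int tag;

    TaggedRow(String key, String value, int tag) {
        this.key = key;
        this.value = value;
        this.tag = tag;
    }

    @Override
    public String toString() {
        return "[" + key + ", " + value + ", " + tag + "]";
    }
}

// Hypothetical stand-in for a fake RS: wraps each row from one parent
// in the tagged format that the downstream operator already expects.
final class TaggingSink {
    private final int tag;
    private final List<TaggedRow> out;

    TaggingSink(int tag, List<TaggedRow> out) {
        this.tag = tag;
        this.out = out;
    }

    void process(String key, String value) {
        out.add(new TaggedRow(key, value, tag));
    }
}

class TaggedRowDemo {
    public static void main(String[] args) {
        List<TaggedRow> joinInput = new ArrayList<>();
        TaggingSink left = new TaggingSink(0, joinInput);  // rows from the first parent
        TaggingSink right = new TaggingSink(1, joinInput); // rows from the second parent
        left.process("k1", "x");
        right.process("k1", "y");
        System.out.println(joinInput);
    }
}
```

Because both parents emit rows in the same tagged shape, the downstream 
operator in this sketch does not need to know whether its input came from a 
real shuffle or from an in-process stand-in.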

But this part of the work was implemented almost 2 years ago. Definitely let 
me know if anything has changed and this fake RS is no longer needed.
                
> optimize orderby followed by a groupby
> --------------------------------------
>
>                 Key: HIVE-2340
>                 URL: https://issues.apache.org/jira/browse/HIVE-2340
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>              Labels: perfomance
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
>
> Before implementing an optimizer for JOIN-GBY, try to implement an RS-GBY 
> optimizer (cluster-by following group-by).
