[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556608#comment-13556608 ]
Yin Huai commented on HIVE-2340:
--------------------------------

Let me explain why I introduced the fake RS operator instead of just removing the original RS. When I was developing the patch for HIVE-2206, I found that the aggregation operator (GBY) and the join operator (JOIN) use different logic to process the rows forwarded to them. Both buffer rows, but a GBY determines in processOp whether it needs to forward results to its children, whereas a JOIN relies on endGroup to know when it should forward results.

When we have plans like GBY-GBY or JOIN-GBY, that difference in processing logic is fine. However, when we have plans like

{code}
GBY----                  GBY----
       \                        \
        ----JOIN   or            ----JOIN
       /                        /
GBY----                  JOIN---
{code}

we need operators between the child JOIN and its parent GBYs and JOINs to make sure the JOIN processes rows correctly. This is also the reason that CorrelationLocalSimulativeReduceSinkOperator determines when to start a group for its children in processOp and leaves startGroup and endGroup empty.

Also, by replacing RSs with those fake RSs, I do not need to touch the GBYs and JOINs that will be merged into the same Reduce phase. Since the input of the first operator on the Reduce side is in the format [key, value, tag], I use those fake RSs to generate rows in the same format.

This part of the work was implemented almost 2 years ago, though. Definitely let me know if anything has changed and this fake RS is no longer needed.
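To illustrate the point in the comment above, here is a minimal Java sketch of the group-boundary problem. It is not Hive code: the `Op` interface and the classes `SumByKey` (standing in for GBY), `BufferUntilEndGroup` (standing in for JOIN), and `FakeReduceSink` (standing in for the fake RS) are all hypothetical names, and the model assumes a single parent and ignores tags and the [key, value, tag] row format. It only shows that an operator which forwards on endGroup emits nothing when its parent never signals group boundaries, and that an intermediate operator can derive those boundaries from key changes inside its own process call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class FakeRsSketch {

    /** Minimal stand-in for an operator; hypothetical, not Hive's Operator API. */
    interface Op {
        void startGroup();
        void process(String key, int value);
        void endGroup();
        void close();
    }

    /** GBY-like: decides inside process() when to forward, by watching for a key change. */
    static class SumByKey implements Op {
        private final Op child;
        private String currentKey;
        private int sum;
        private boolean hasRows;

        SumByKey(Op child) { this.child = child; }

        public void startGroup() { }
        public void endGroup() { }

        public void process(String key, int value) {
            if (hasRows && !Objects.equals(key, currentKey)) {
                flush(); // key changed: forward the finished aggregate
            }
            currentKey = key;
            sum += value;
            hasRows = true;
        }

        public void close() {
            if (hasRows) flush();
            child.close();
        }

        private void flush() {
            child.process(currentKey, sum); // note: no startGroup/endGroup is sent
            sum = 0;
            hasRows = false;
        }
    }

    /** JOIN-like: buffers rows and forwards only when endGroup() is called. */
    static class BufferUntilEndGroup implements Op {
        private final List<String> buffer = new ArrayList<>();
        final List<String> output = new ArrayList<>();

        public void startGroup() { buffer.clear(); }
        public void process(String key, int value) { buffer.add(key + "=" + value); }
        public void endGroup() {
            if (!buffer.isEmpty()) {
                output.add(String.join(",", buffer));
                buffer.clear();
            }
        }
        public void close() { }
    }

    /** Fake-RS-like: empty startGroup/endGroup; derives group boundaries in process(). */
    static class FakeReduceSink implements Op {
        private final Op child;
        private String currentKey;
        private boolean groupOpen;

        FakeReduceSink(Op child) { this.child = child; }

        public void startGroup() { } // intentionally empty
        public void endGroup() { }   // intentionally empty

        public void process(String key, int value) {
            if (groupOpen && !Objects.equals(key, currentKey)) {
                child.endGroup(); // key changed: close the previous group
                groupOpen = false;
            }
            if (!groupOpen) {
                child.startGroup();
                groupOpen = true;
                currentKey = key;
            }
            child.process(key, value);
        }

        public void close() {
            if (groupOpen) {
                child.endGroup();
                groupOpen = false;
            }
            child.close();
        }
    }

    /** Feeds three key-sorted rows through GBY -> [fake RS] -> JOIN-like sink. */
    static List<String> run(boolean withFakeRs) {
        BufferUntilEndGroup joinLike = new BufferUntilEndGroup();
        Op gbyChild = withFakeRs ? new FakeReduceSink(joinLike) : joinLike;
        SumByKey gby = new SumByKey(gbyChild);
        gby.process("a", 1);
        gby.process("a", 2);
        gby.process("b", 5);
        gby.close();
        return joinLike.output;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // prints [a=3, b=5]
        System.out.println(run(false)); // prints []
    }
}
```

Without the intermediate operator, the JOIN-like sink receives both aggregated rows but never sees an endGroup call, so it forwards nothing. With it, each key change is turned into an endGroup/startGroup pair and the GBY and JOIN stand-ins stay unmodified, which mirrors the motivation given in the comment.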
> optimize orderby followed by a groupby
> --------------------------------------
>
>                 Key: HIVE-2340
>                 URL: https://issues.apache.org/jira/browse/HIVE-2340
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>              Labels: perfomance
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch,
>                      ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch,
>                      ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch,
>                      ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch,
>                      ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch,
>                      HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY optimizer (cluster-by following group-by).