[ https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542352#comment-16542352 ]
Hive QA commented on HIVE-17896:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12931291/HIVE-17896.11.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14656 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12567/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12567/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12567/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12931291 - PreCommit-HIVE-Build

> TopNKey: Create a standalone vectorizable TopNKey operator
> ----------------------------------------------------------
>
>                 Key: HIVE-17896
>                 URL: https://issues.apache.org/jira/browse/HIVE-17896
>             Project: Hive
>          Issue Type: New Feature
>          Components: Operators
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>            Assignee: Teddy Choi
>            Priority: Major
>         Attachments: HIVE-17896.1.patch, HIVE-17896.10.patch,
> HIVE-17896.11.patch, HIVE-17896.3.patch, HIVE-17896.4.patch,
> HIVE-17896.5.patch, HIVE-17896.6.patch, HIVE-17896.7.patch,
> HIVE-17896.8.patch, HIVE-17896.9.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the
> group-by operator buffers up all the rows before discarding the 99% of the
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the
> filtering on the shuffle keys, but it is better to do this before breaking
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following
> to happen:
> GBY->RS(Top=1)
> can become
> TNK(1)->GBY->RS(Top=1)
> so that the TopNKey can remove rows before they are buffered into the GBY
> and consume memory.
> Here's the equivalent implementation in Presto:
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the
> GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
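
The row pruning that TNK(1)->GBY->RS(Top=1) relies on can be sketched roughly as below. This is only an illustration of the idea in the description, not the code in the attached patches: the class name TopNKeySketch and the method canForward are hypothetical, it works row-at-a-time rather than on vectorized batches, and it assumes a single comparable key (for example just "a", even when the GBY is on "a,b,c"). The filter may forward rows whose key is later pushed out of the top N, so the downstream RS(Top=N) still makes the final cut.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of a top-N key pre-filter; not Hive's actual operator. */
public class TopNKeySketch<K> {
    private final int topN;
    private final Comparator<? super K> order;
    // The (at most) topN smallest distinct keys seen so far, under 'order'.
    private final TreeSet<K> keys;

    public TopNKeySketch(int topN, Comparator<? super K> order) {
        if (topN <= 0) {
            throw new IllegalArgumentException("topN must be positive");
        }
        this.topN = topN;
        this.order = order;
        this.keys = new TreeSet<>(order);
    }

    /** Returns true if a row with this key may still belong to the top N keys. */
    public boolean canForward(K key) {
        if (keys.contains(key)) {
            return true;                 // key is already one of the current top N
        }
        if (keys.size() < topN) {
            keys.add(key);               // still room: every new key qualifies
            return true;
        }
        if (order.compare(key, keys.last()) < 0) {
            keys.pollLast();             // evict the current worst key
            keys.add(key);
            return true;
        }
        return false;                    // cannot be in the top N: drop before GBY
    }

    public static void main(String[] args) {
        TopNKeySketch<Integer> tnk = new TopNKeySketch<>(1, Comparator.<Integer>naturalOrder());
        List<Integer> forwarded = new ArrayList<>();
        for (Integer key : Arrays.asList(5, 1, 3, 1, 4, 0, 2)) {
            if (tnk.canForward(key)) {
                forwarded.add(key);      // only these rows would reach the GBY buffer
            }
        }
        System.out.println(forwarded);   // prints [5, 1, 1, 0]
    }
}
```

In this run three of the seven rows are dropped before the group-by would buffer them. On input that is not already sorted against the requested order, the fraction dropped typically grows with input size.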