[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750398#comment-13750398 ]
Phabricator commented on HIVE-3562: ----------------------------------- ashutoshc has commented on the revision "HIVE-3562 [jira] Some limit can be pushed down to map stage". Looks pretty good. Just requesting to add few more comments. INLINE COMMENTS conf/hive-default.xml.template:1586-1590 We can remove this now. ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:1186 Just to add more clarity, say something like: we can push the limit above GBY (running in Reducer), since that will generate single row for each group. This doesn't necessarily hold for GBY (running in Mappers), so we don't push limit above it. ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:182 It will be good to add comment about what this field is holding. Add a comment saying: This two dimensional array holds key data and a corresponding Union object which contains the tag identifying the aggregate expression for distinct columns. Ideally, instead of this 2-D array, we should have probably enhanced HiveKey class for this logic. We should do that in a follow-up jira. ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:267 I didnt follow this logic completely. Seems like this is an optimization not to evaluate union object repeatedly and do system copy for it. Can you add a comment explaining this? ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:271 seems like it will be null only for all i = 0. If so, better do if (i==0) check ? Also add comment when this will be null and when it will be non-null? ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:260 You made changes to this section, because you found bug or are you purely refactoring this? If you hit upon the bug, can you explain what was it? ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:50-51 It will be good to add comment about what these 2D arrays are holding? ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:52 Also, add comment saying this array holds hashcode for keys. Also, add note that indices of all these arrays must line up. ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:82 Nice Comments! ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:34 It will be good to add a javadoc for this class. ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:36 Also javadoc for this interface. REVISION DETAIL https://reviews.facebook.net/D5967 To: JIRA, tarball, ashutoshc, navis Cc: njain > Some limit can be pushed down to map stage > ------------------------------------------ > > Key: HIVE-3562 > URL: https://issues.apache.org/jira/browse/HIVE-3562 > Project: Hive > Issue Type: Bug > Reporter: Navis > Assignee: Navis > Priority: Trivial > Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, > HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, > HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch > > > Queries with limit clause (with reasonable number), for example > {noformat} > select * from src order by key limit 10; > {noformat} > makes operator tree, > TS-SEL-RS-EXT-LIMIT-FS > But LIMIT can be partially calculated in RS, reducing size of shuffling. > TS-SEL-RS(TOP-N)-EXT-LIMIT-FS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira