[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

Phabricator (JIRA) Mon, 26 Aug 2013 11:52:42 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750398#comment-13750398
 ]


Phabricator commented on HIVE-3562:
-----------------------------------

ashutoshc has commented on the revision "HIVE-3562 [jira] Some limit can be 
pushed down to map stage".

  Looks pretty good. Just requesting to add few more comments.

INLINE COMMENTS
  conf/hive-default.xml.template:1586-1590 We can remove this now.
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:1186 Just to 
add more clarity, say something like: we can push the limit above GBY (running 
in Reducer), since that will generate single row for each group. This doesn't 
necessarily hold for GBY (running in Mappers), so we don't push limit above it.
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:182 It 
will be good to add comment about what this field is holding. Add a comment 
saying: This two dimensional array holds key data and a corresponding Union 
object which contains the tag identifying the aggregate expression for distinct 
columns.

  Ideally, instead of this 2-D array, we should have probably enhanced HiveKey 
class for this logic. We should do that in a follow-up jira.
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:267 I 
didnt follow this logic completely.
  Seems like this is an optimization not to evaluate union object repeatedly 
and do system copy for it. Can you add a comment explaining this?
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:271 seems 
like it will be null only for all i = 0. If so, better do if (i==0) check ? 
Also add comment when this will be null and when it will be non-null?
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:260 You 
made changes to this section, because you found bug or are you purely 
refactoring this? If you hit upon the bug, can you explain what was it?
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:50-51 It will be 
good to add comment about what these 2D arrays are holding?
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:52 Also, add comment 
saying this array holds hashcode for keys.

  Also, add note that indices of all these arrays must line up.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:82 
Nice Comments!
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:34 It will be good 
to add a javadoc for this class.
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:36 Also javadoc for 
this interface.

REVISION DETAIL
  https://reviews.facebook.net/D5967

To: JIRA, tarball, ashutoshc, navis
Cc: njain

                
> Some limit can be pushed down to map stage
> ------------------------------------------
>
>                 Key: HIVE-3562
>                 URL: https://issues.apache.org/jira/browse/HIVE-3562
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
> HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
> HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch
>
>
> Queries with limit clause (with reasonable number), for example
> {noformat}
> select * from src order by key limit 10;
> {noformat}
> makes operator tree, 
> TS-SEL-RS-EXT-LIMIT-FS
> But LIMIT can be partially calculated in RS, reducing size of shuffling.
> TS-SEL-RS(TOP-N)-EXT-LIMIT-FS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

Reply via email to