[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751330#comment-13751330
 ] 

Phabricator commented on HIVE-4002:
-----------------------------------

yhuai has commented on the revision "HIVE-4002 [jira] Fetch task aggregation 
for simple group by query".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java:493 I think 
that flush is only needed for blocking operators. With this optimization, the 
operator tree in the fetch task seems to have only a single blocking operator, 
which is GBY. Since GBY is the first operator in the fetch task (the operator 
shown in flush() in this class), I do not think we need to call flush on all 
operators in the operator tree. Is it possible that GBY is not the first 
operator?
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6985 There 
are other places where we use colInfo.getInternalName(). I think it would be 
better to change those places as well if we want to use field.
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java:582 Let's say we 
have a chain of operators OP1-OP2-OP3. With this change, when flush in OP1 is 
called, it will call its own flushOp and then call flushOp in OP2. It seems 
that neither flush nor flushOp in OP3 will ever be called. Also, when I 
introduced flush with the Correlation Optimizer, this method was not designed 
to propagate the signal to an operator's children.
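
The OP1-OP2-OP3 concern in the last comment can be reproduced with a small 
stand-alone sketch. Everything in it is hypothetical: SketchOperator, 
flushOneLevel and flushRecursive are illustrative names, not Hive's Operator 
API, and the recursive variant is shown only for contrast, not as the fix the 
patch should adopt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified operator; NOT Hive's actual Operator class.
class SketchOperator {
    final String name;
    final List<SketchOperator> children = new ArrayList<>();
    final List<String> log; // shared record of which operators were flushed

    SketchOperator(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    SketchOperator addChild(SketchOperator child) {
        children.add(child);
        return child;
    }

    // Operator-specific flush work; here it only records that it ran.
    void flushOp() {
        log.add(name);
    }

    // Behaviour described in the review comment: flush this operator, then
    // call flushOp (not flush) on the direct children only.
    void flushOneLevel() {
        flushOp();
        for (SketchOperator child : children) {
            child.flushOp();
        }
    }

    // Contrast case: recurse so that every descendant is reached.
    void flushRecursive() {
        flushOp();
        for (SketchOperator child : children) {
            child.flushRecursive();
        }
    }
}

class FlushSketch {
    // Builds the chain OP1-OP2-OP3 and returns the operators that were flushed.
    static List<String> run(boolean recursive) {
        List<String> log = new ArrayList<>();
        SketchOperator op1 = new SketchOperator("OP1", log);
        SketchOperator op2 = op1.addChild(new SketchOperator("OP2", log));
        op2.addChild(new SketchOperator("OP3", log));
        if (recursive) {
            op1.flushRecursive();
        } else {
            op1.flushOneLevel();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [OP1, OP2]
        System.out.println(run(true));  // [OP1, OP2, OP3]
    }
}
```

With the one-level variant, OP3 never appears in the log, which is the gap 
described above.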

REVISION DETAIL
  https://reviews.facebook.net/D8739

To: JIRA, navis
Cc: yhuai

                
> Fetch task aggregation for simple group by query
> ------------------------------------------------
>
>                 Key: HIVE-4002
>                 URL: https://issues.apache.org/jira/browse/HIVE-4002
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, 
> HIVE-4002.D8739.3.patch, HIVE-4002.D8739.4.patch
>
>
> Aggregation queries with no group-by clause (for example, select count(*) 
> from src) execute the final aggregation in a single reduce task. But the 
> input is too small even for a single reducer, because most UDAFs generate 
> just a single row for map aggregation. If the final fetch task can aggregate 
> the outputs from the map tasks, the shuffling time can be removed.
> This optimization transforms an operator tree such as
> TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
> into 
> TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
> With the patch, the time taken for the auto_join_filters.q test was reduced 
> to 6 min (from 10 min before).
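
As a rough illustration of the idea in the description, the final aggregation 
for select count(*) only has to combine one partial row per map task, which is 
cheap enough to do on the fetch side. The sketch below assumes each map task's 
output is a single partial count; the class name, method name and sample 
values are made up for illustration and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the fetch-side final aggregation (the GBY2 step).
class FetchSideCountSketch {
    // Merges the partial counts written by the map-side aggregation (GBY1)
    // into the final count, without a shuffle or a reduce task.
    static long mergePartialCounts(List<Long> partials) {
        long total = 0L;
        for (long partial : partials) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative values: one partial count per map task output.
        List<Long> partials = Arrays.asList(120L, 75L, 305L);
        System.out.println(mergePartialCounts(partials)); // 500
    }
}
```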

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
