[ https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751330#comment-13751330 ]
Phabricator commented on HIVE-4002:
-----------------------------------

yhuai has commented on the revision "HIVE-4002 [jira] Fetch task aggregation for simple group by query".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java:493
    I think that flush is only needed for blocking operators. With this optimization, the operator tree in the fetch task seems to have only a single blocking operator, which is GBY. Since GBY is the first operator in the fetch task (the operator shown in flush() in this class), I do not think we need to call flush on all operators in the operator tree. Is it possible that GBY is not the first operator?
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6985
    There are other places where we are using colInfo.getInternalName(). If we want to use field, I think it is better to also change those places.
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java:582
    Let's say we have a chain of operators OP1-OP2-OP3. With this change, when flush in OP1 is called, it will call its flushOp and then call flushOp in OP2. It seems that flush or flushOp in OP3 will never be called. Also, when I introduced flush with the Correlation Optimizer, this method was not designed to propagate the signal to its children.

REVISION DETAIL
  https://reviews.facebook.net/D8739

To: JIRA, navis
Cc: yhuai

> Fetch task aggregation for simple group by query
> ------------------------------------------------
>
>                 Key: HIVE-4002
>                 URL: https://issues.apache.org/jira/browse/HIVE-4002
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch,
> HIVE-4002.D8739.3.patch, HIVE-4002.D8739.4.patch
>
>
> Aggregation queries with no group-by clause (for example, select count(*)
> from src) execute the final aggregation in a single reduce task.
> But the work is too small to justify even a single reducer, because most
> UDAFs generate just a single row from map-side aggregation. If the final
> fetch task can aggregate the outputs of the map tasks, the shuffle time can
> be eliminated.
> This optimization transforms an operator tree like
>   TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
> into
>   TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
> With the patch, the time taken by the auto_join_filters.q test was reduced
> to 6 min (from 10 min before).
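The review comment on Operator.java:582 raises a concern about the chain OP1-OP2-OP3. If flush runs flushOp locally and then calls flushOp on direct children only, it never reaches OP3. The minimal sketch below illustrates that gap. The `Op` class and the methods `flushOneLevel` and `flushRecursive` are hypothetical stand-ins written for this illustration. They are not Hive's Operator API.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushPropagationSketch {

    // Hypothetical minimal operator; NOT org.apache.hadoop.hive.ql.exec.Operator.
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        final List<String> log; // records which operators had flushOp() called

        Op(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        // Adds a child and returns it, so chains like OP1-OP2-OP3 are easy to build.
        Op addChild(Op child) {
            children.add(child);
            return child;
        }

        // Operator-specific flush work (e.g. a GBY emitting buffered aggregates).
        void flushOp() {
            log.add(name);
        }

        // The pattern questioned in the review: flushOp here, then flushOp on
        // direct children only. Grandchildren (OP3) are never reached.
        void flushOneLevel() {
            flushOp();
            for (Op child : children) {
                child.flushOp();
            }
        }

        // For contrast: recursing through flush itself reaches every descendant.
        void flushRecursive() {
            flushOp();
            for (Op child : children) {
                child.flushRecursive();
            }
        }
    }

    public static void main(String[] args) {
        List<String> oneLevel = new ArrayList<>();
        Op op1 = new Op("OP1", oneLevel);
        op1.addChild(new Op("OP2", oneLevel)).addChild(new Op("OP3", oneLevel));
        op1.flushOneLevel();
        System.out.println("one-level: " + oneLevel); // one-level: [OP1, OP2]

        List<String> recursive = new ArrayList<>();
        Op r1 = new Op("OP1", recursive);
        r1.addChild(new Op("OP2", recursive)).addChild(new Op("OP3", recursive));
        r1.flushRecursive();
        System.out.println("recursive: " + recursive); // recursive: [OP1, OP2, OP3]
    }
}
```

The recursive variant is shown only to make the contrast visible. As the comment notes, flush was not originally designed to propagate to children. Also, in an operator DAG where a node has several parents, a naive recursion like this would flush that node more than once.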
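The issue describes rewriting TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS into TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS). That rewrite works because each map task's GBY1 emits one partial row, and GBY2 in the fetch task can merge those rows without a shuffle. The minimal sketch below makes two assumptions: it uses avg(x) as the UDAF, and it uses a hypothetical (sum, count) partial record. It does not use Hive's actual GenericUDAFEvaluator interfaces.

```java
import java.util.Arrays;
import java.util.List;

public class FetchTaskAggregationSketch {

    // Hypothetical partial aggregate for avg(x) emitted by one map task (GBY1).
    static final class PartialAvg {
        final double sum;
        final long count;

        PartialAvg(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Map side (GBY1): reduce one task's input split to a single partial row.
    static PartialAvg mapSideAggregate(double[] split) {
        double sum = 0;
        for (double v : split) {
            sum += v;
        }
        return new PartialAvg(sum, split.length);
    }

    // Fetch side (GBY2): merge the partial rows read back from the map outputs
    // and compute the final value, with no RS/reduce stage in between.
    // NaN stands in for SQL NULL when there are no input rows.
    static double fetchSideMerge(List<PartialAvg> partials) {
        double sum = 0;
        long count = 0;
        for (PartialAvg p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        // Three "map tasks", each aggregating its own split of the input.
        List<PartialAvg> partials = Arrays.asList(
                mapSideAggregate(new double[] {1, 2, 3}),
                mapSideAggregate(new double[] {4}),
                mapSideAggregate(new double[] {5, 6}));
        System.out.println(fetchSideMerge(partials)); // 3.5
    }
}
```

Merging the (sum, count) pairs gives 21 / 6 = 3.5. Averaging the three per-task averages instead would give about 3.83. That is why GBY1 has to emit partial state for GBY2 to merge, rather than finished values.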