[ https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091171#comment-15091171 ]
Pengcheng Xiong commented on HIVE-12788:
----------------------------------------

[~ctang.ma], I read your reply to my comments and quote it here:
{code}
Could you please be more specific on the reason why "if TS[0] branch for src1 is not optimized, then there is no need to continue processing TS[6] branch"?
Thanks.

The AST tree of query (select max(value) from src1 union all select max(value) from src2) passed to StatsOptimizer after UnionRemove optimization is:
TS[0]->SEL[1]->GBY[2]->RS[3]->GBY[4]->FS[17] --- for subquery src1
TS[6]->SEL[7]->GBY[8]->RS[9]->GBY[10]->FS[18] --- for subquery src2

It has two top Operators, TS[0] for table src1 and TS[6] for table src2. If the TS[0] branch (for subquery src1) is not optimized but the TS[6] branch (for subquery src2) is, then in the existing code the TS[6] branch result will be set to FetchTask in ParseContext and the entire query is not further compiled into MRTasks (in SemanticAnalyzer.analyzeInternal step 9). So the union query will return a result with only the row from TS[6] (the subquery src2). That is obviously not right. So for a union query, if any one of its subqueries cannot be stats-optimized, the whole query should not be optimized and should fall back to the regular plan.
I wonder if it is a little clearer. Thanks - Chaoyu
{code}

Did you investigate the possibility of merging the result from a FetchTask with that of an MR task? I would still prefer a solution that leverages the stats optimizer whenever possible (even for only one branch of a union). [~ashutoshc], could you comment on whether that is possible here? Thanks.
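
To make the all-or-nothing rule from the quoted reply concrete, here is a minimal Python sketch. It is not Hive code: `Branch`, `stats_answer`, and `plan_union` are hypothetical names, and the returned dictionaries only stand in for a FetchTask-style plan versus the regular MR plan. It models only the rule Chaoyu describes (fall back if any union branch cannot be answered from stats), not the per-branch merge asked about above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Branch:
    """One top-level branch of the union (hypothetical model, not a Hive class)."""
    table: str
    # Value the aggregate would return if it can be answered from
    # table statistics alone; None if the branch cannot be stats-optimized.
    stats_answer: Optional[int]


def plan_union(branches: List[Branch]) -> dict:
    """Pick a fetch-only plan only when every branch is answerable from stats.

    If even one branch is not, the whole query falls back to the regular
    plan, so no branch's row can be silently dropped.
    """
    if all(b.stats_answer is not None for b in branches):
        return {"plan": "fetch", "rows": [b.stats_answer for b in branches]}
    return {"plan": "regular", "rows": None}


if __name__ == "__main__":
    # src1 cannot be answered from stats, src2 can: fall back entirely.
    print(plan_union([Branch("src1", None), Branch("src2", 7)]))
    # Both branches answerable from stats: one fetch plan with both rows.
    print(plan_union([Branch("src1", 3), Branch("src2", 7)]))
```

In this model the decision is made once for the whole union, which is the behaviour the quoted reply argues for. A per-branch optimization would instead need some way to combine fetched rows with rows produced by the regular plan.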

> Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-12788
>                 URL: https://issues.apache.org/jira/browse/HIVE-12788
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.1
>            Reporter: Eric Lin
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-12788.1.patch, HIVE-12788.patch
>
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:10000/default> create table test (a int);
> 0: jdbc:hive2://localhost:10000/default> insert overwrite table test values (1);
> 0: jdbc:hive2://localhost:10000/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:10000/default> set hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:10000/default> SELECT COUNT(1) FROM test UNION ALL SELECT COUNT(1) FROM test;
> +----------+--+
> | _u1._c0  |
> +----------+--+
> +----------+--+
> {code}
> UNION ALL without the COUNT function works as expected:
> {code}
> 0: jdbc:hive2://localhost:10000/default> select * from test UNION ALL SELECT * FROM test;
> +--------+--+
> | _u1.a  |
> +--------+--+
> | 1      |
> | 1      |
> +--------+--+
> {code}
> Running the same query without setting hive.mapred.supports.subdirectories and
> hive.optimize.union.remove to true gives the correct result:
> {code}
> 0: jdbc:hive2://localhost:10000/default> set hive.optimize.union.remove;
> +-----------------------------------+--+
> |                set                |
> +-----------------------------------+--+
> | hive.optimize.union.remove=false  |
> +-----------------------------------+--+
> 0: jdbc:hive2://localhost:10000/default> SELECT COUNT(1) FROM test UNION ALL SELECT COUNT(1) FROM test;
> +----------+--+
> | _u1._c0  |
> +----------+--+
> | 1        |
> | 1        |
> +----------+--+
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)