[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

Pengcheng Xiong (JIRA) Sun, 10 Jan 2016 11:30:06 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091171#comment-15091171
 ]


Pengcheng Xiong commented on HIVE-12788:
----------------------------------------

[~ctang.ma], I read your reply to my comments; I quote it here:
{code}
Could you please be more specific on the reason why "if TS[0] branch for src1 
is not optimized, then there is no need to continue processing TS[6] branch"? 
Thanks.
The operator tree of the query (select max(value) from src1 union all select 
max(value) from src2) passed to StatsOptimizer after the UnionRemove 
optimization is:
      TS[0]->SEL[1]->GBY[2]->RS[3]->GBY[4]->FS[17]   --- for subquery src1
      TS[6]->SEL[7]->GBY[8]->RS[9]->GBY[10]->FS[18]  --- for subquery src2
It has two top operators, TS[0] for table src1 and TS[6] for table src2. If the 
TS[0] branch (for subquery src1) is not optimized but the TS[6] branch (for 
subquery src2) is, then in the existing code the TS[6] branch result is set as 
the FetchTask in ParseContext and the query is not compiled any further into 
MR tasks (in SemanticAnalyzer.analyzeInternal, step 9). So the union query 
returns a result containing only the row from TS[6] (the subquery src2), which 
is obviously wrong. So for a union query, if any one of its subqueries cannot 
be optimized by StatsOptimizer, the whole query should not be optimized and 
should fall back to the regular plan. I hope this is a little clearer. Thanks

- Chaoyu
{code}

Did you investigate the possibility of merging the results from a FetchTask and 
an MR task? I would still prefer a solution that makes the best use of the stats 
optimizer whenever possible (even for only one branch of a union). [~ashutoshc], 
could you comment on whether this is feasible? Thanks.
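
To make the two options concrete, here is a minimal sketch in plain Java. It is 
not Hive code: UnionStatsSketch, Branch, useStatsForWholeQuery and 
branchesNeedingMapReduce are hypothetical names made up for illustration, and 
the boolean flag stands in for whatever check the stats optimizer really does. 
The first method models the all-or-nothing rule from the quoted reply; the 
second models the per-branch alternative, where only the branches that cannot 
be answered from stats would still need an MR task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only; none of these names exist in Hive.
class UnionStatsSketch {

    /** One top-level branch of the union, e.g. the TS[0] or TS[6] pipeline. */
    static final class Branch {
        final String name;
        final boolean answerableFromStats; // assumed outcome of the stats check

        Branch(String name, boolean answerableFromStats) {
            this.name = name;
            this.answerableFromStats = answerableFromStats;
        }
    }

    /** All-or-nothing rule: use the stats shortcut only if every branch qualifies. */
    static boolean useStatsForWholeQuery(List<Branch> branches) {
        if (branches.isEmpty()) {
            return false;
        }
        for (Branch b : branches) {
            if (!b.answerableFromStats) {
                return false; // one failing branch sends the whole query to the regular plan
            }
        }
        return true;
    }

    /** Per-branch alternative: list the branches that would still need an MR task. */
    static List<String> branchesNeedingMapReduce(List<Branch> branches) {
        List<String> result = new ArrayList<>();
        for (Branch b : branches) {
            if (!b.answerableFromStats) {
                result.add(b.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The case from the quoted reply: src1 is not optimized, src2 is.
        List<Branch> branches = Arrays.asList(
                new Branch("TS[0] (src1)", false),
                new Branch("TS[6] (src2)", true));
        System.out.println(useStatsForWholeQuery(branches));    // false
        System.out.println(branchesNeedingMapReduce(branches)); // [TS[0] (src1)]
    }
}
```

The per-branch variant only identifies which branches need an MR task; merging 
their output with the FetchTask rows is the open question above.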

> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-12788
>                 URL: https://issues.apache.org/jira/browse/HIVE-12788
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.1
>            Reporter: Eric Lin
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-12788.1.patch, HIVE-12788.patch
>
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:10000/default> create table test (a int);
> 0: jdbc:hive2://localhost:10000/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:10000/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:10000/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:10000/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +----------+--+
> | _u1._c0  |
> +----------+--+
> +----------+--+
> {code}
> UNION ALL without the COUNT function works as expected:
> {code}
> 0: jdbc:hive2://localhost:10000/default> select * from test UNION ALL SELECT 
> * FROM test;
> +--------+--+
> | _u1.a  |
> +--------+--+
> | 1      |
> | 1      |
> +--------+--+
> {code}
> Running the same query without setting hive.mapred.supports.subdirectories and 
> hive.optimize.union.remove to true gives the correct result:
> {code}
> 0: jdbc:hive2://localhost:10000/default> set hive.optimize.union.remove;
> +-----------------------------------+--+
> |                set                |
> +-----------------------------------+--+
> | hive.optimize.union.remove=false  |
> +-----------------------------------+--+
> 0: jdbc:hive2://localhost:10000/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +----------+--+
> | _u1._c0  |
> +----------+--+
> | 1        |
> | 1        |
> +----------+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
