[ https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086764#comment-15086764 ]
Chaoyu Tang commented on HIVE-12788:
------------------------------------

1. When hive.compute.query.using.stats is enabled, a UNION ALL of aggregate queries with the hive.optimize.union.remove optimization returns only one row. I think this is due to an issue in StatsOptimizer, which I am working on now.
{code}
set hive.compute.query.using.stats=true;
set hive.optimize.union.remove=true;
hive> Select count(*) as scount from default.sample02 union all Select count(*) as scount from default.sample01;
OK
723
{code}

2. When hive.compute.query.using.stats is disabled, you have to set mapred.input.dir.recursive=true in order to make hive.optimize.union.remove work.
{code}
set hive.compute.query.using.stats=false;
set hive.optimize.union.remove=true;
set mapred.input.dir.recursive=true;
hive> Select count(*) as scount from default.sample02 union all Select count(*) as scount from default.sample01;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = ctang_20160106151655_c0eb9943-2963-4162-b9f4-c964005bf1a3
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-01-06 22:47:52,677 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local51783692_0010
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-01-06 22:47:55,278 Stage-2 map = 100%, reduce = 100%
Ended Job = job_local1194656206_0011
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Stage-Stage-2: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
823
723
Time taken: 8.339 seconds, Fetched: 2 row(s)
{code}

3. With the union remove optimization disabled, UNION ALL with aggregate functions always works, regardless of whether the stats optimization is enabled, since the stats optimization is not applicable in that case.

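Until the stats optimization issue is fixed, the cases above suggest two possible session-level workarounds. This is only a sketch that restates the settings already described, using the same sample tables; the second option has not been run separately here, so no output is shown for it.
{code}
-- Option A (case 2): keep union remove, but bypass the stats optimization
-- and allow recursive input directories.
set hive.compute.query.using.stats=false;
set hive.optimize.union.remove=true;
set mapred.input.dir.recursive=true;
Select count(*) as scount from default.sample02 union all Select count(*) as scount from default.sample01;

-- Option B (case 3): disable union remove entirely; the stats setting
-- should then not matter for this query.
set hive.optimize.union.remove=false;
Select count(*) as scount from default.sample02 union all Select count(*) as scount from default.sample01;
{code}
Option A keeps the union remove optimization at the cost of running the MapReduce jobs; Option B is the simpler choice if the optimization is not needed for the session.
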
> Setting hive.optimize.union.remove to TRUE will break UNION ALL with
> aggregate functions
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-12788
> URL: https://issues.apache.org/jira/browse/HIVE-12788
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.1
> Reporter: Eric Lin
> Assignee: Chaoyu Tang
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:10000/default> create table test (a int);
> 0: jdbc:hive2://localhost:10000/default> insert overwrite table test values
> (1);
> 0: jdbc:hive2://localhost:10000/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:10000/default> set
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:10000/default> SELECT COUNT(1) FROM test UNION ALL
> SELECT COUNT(1) FROM test;
> +----------+--+
> | _u1._c0  |
> +----------+--+
> +----------+--+
> {code}
> UNION ALL without COUNT function will work as expected:
> {code}
> 0: jdbc:hive2://localhost:10000/default> select * from test UNION ALL SELECT
> * FROM test;
> +--------+--+
> | _u1.a  |
> +--------+--+
> | 1      |
> | 1      |
> +--------+--+
> {code}
> Run the same query without setting hive.mapred.supports.subdirectories and
> hive.optimize.union.remove to true will give correct result:
> {code}
> 0: jdbc:hive2://localhost:10000/default> set hive.optimize.union.remove;
> +-----------------------------------+--+
> |                set                |
> +-----------------------------------+--+
> | hive.optimize.union.remove=false  |
> +-----------------------------------+--+
> 0: jdbc:hive2://localhost:10000/default> SELECT COUNT(1) FROM test UNION ALL
> SELECT COUNT(1) FROM test;
> +----------+--+
> | _u1._c0  |
> +----------+--+
> | 1        |
> | 1        |
> +----------+--+
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)