[jira] [Commented] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions

Chaoyu Tang (JIRA) Wed, 06 Jan 2016 19:59:22 -0800

    [ https://issues.apache.org/jira/browse/HIVE-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086764#comment-15086764 ]


Chaoyu Tang commented on HIVE-12788:
------------------------------------

1. When hive.compute.query.using.stats is enabled, a UNION ALL of aggregate 
queries with the union.remove optimization returns only one row. I think this 
is due to an issue in StatsOptimizer, which I am working on now.
{code}
set hive.compute.query.using.stats=true;
set hive.optimize.union.remove=true;
hive> Select count(*) as scount from default.sample02 union all Select count(*) as scount from default.sample01;
OK
723
{code}
2. When hive.compute.query.using.stats is disabled, you have to set 
mapred.input.dir.recursive=true in order to make hive.optimize.union.remove 
work. 
{code}
set hive.compute.query.using.stats=false;
set hive.optimize.union.remove=true;
set mapred.input.dir.recursive=true;
hive> Select count(*) as scount from default.sample02 union all Select count(*) as scount from default.sample01;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. tez, spark) 
or using Hive 1.X releases.
Query ID = ctang_20160106151655_c0eb9943-2963-4162-b9f4-c964005bf1a3
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-01-06 22:47:52,677 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local51783692_0010
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-01-06 22:47:55,278 Stage-2 map = 100%,  reduce = 100%
Ended Job = job_local1194656206_0011
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Stage-Stage-2:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
823
723
Time taken: 8.339 seconds, Fetched: 2 row(s)
{code}
3. With the union remove optimization disabled, UNION ALL with aggregate 
functions always works, whether or not the stats optimization 
(hive.compute.query.using.stats) is enabled, since the stats optimization is 
not applicable to this query.
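The two configurations that return both rows can be sketched as a script. This 
is only a summary of the settings discussed above, using the same sample01 and 
sample02 tables; it has not been re-run, so no output is shown, and the 
"Option A"/"Option B" labels are mine, not part of Hive.
{code}
-- Option A (assumed workaround from point 2): keep union remove enabled,
-- bypass the stats-based answer, and let MapReduce read the subdirectories
-- that the union remove optimization writes into.
set hive.compute.query.using.stats=false;
set hive.optimize.union.remove=true;
set mapred.input.dir.recursive=true;
select count(*) as scount from default.sample02
union all
select count(*) as scount from default.sample01;

-- Option B (from point 3): disable union remove altogether. The value of
-- hive.compute.query.using.stats then does not matter for this query.
set hive.optimize.union.remove=false;
select count(*) as scount from default.sample02
union all
select count(*) as scount from default.sample01;
{code}
Option B is the simpler choice until the stats optimization issue in point 1 
is fixed, since it does not depend on any other setting.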


> Setting hive.optimize.union.remove to TRUE will break UNION ALL with 
> aggregate functions
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-12788
>                 URL: https://issues.apache.org/jira/browse/HIVE-12788
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.1
>            Reporter: Eric Lin
>            Assignee: Chaoyu Tang
>
> See the test case below:
> {code}
> 0: jdbc:hive2://localhost:10000/default> create table test (a int);
> 0: jdbc:hive2://localhost:10000/default> insert overwrite table test values 
> (1);
> 0: jdbc:hive2://localhost:10000/default> set hive.optimize.union.remove=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:10000/default> set 
> hive.mapred.supports.subdirectories=true;
> No rows affected (0.007 seconds)
> 0: jdbc:hive2://localhost:10000/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +----------+--+
> | _u1._c0  |
> +----------+--+
> +----------+--+
> {code}
> UNION ALL without the COUNT function works as expected:
> {code}
> 0: jdbc:hive2://localhost:10000/default> select * from test UNION ALL SELECT 
> * FROM test;
> +--------+--+
> | _u1.a  |
> +--------+--+
> | 1      |
> | 1      |
> +--------+--+
> {code}
> Running the same query without setting hive.mapred.supports.subdirectories 
> and hive.optimize.union.remove to true gives the correct result:
> {code}
> 0: jdbc:hive2://localhost:10000/default> set hive.optimize.union.remove;
> +-----------------------------------+--+
> |                set                |
> +-----------------------------------+--+
> | hive.optimize.union.remove=false  |
> +-----------------------------------+--+
> 0: jdbc:hive2://localhost:10000/default> SELECT COUNT(1) FROM test UNION ALL 
> SELECT COUNT(1) FROM test;
> +----------+--+
> | _u1._c0  |
> +----------+--+
> | 1        |
> | 1        |
> +----------+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
