[ 
https://issues.apache.org/jira/browse/HIVE-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Allamraju reassigned HIVE-17355:
---------------------------------------

    Assignee: Aditya Allamraju

> Casting to Decimal along with UNION ALL gives incosistent results
> -----------------------------------------------------------------
>
>                 Key: HIVE-17355
>                 URL: https://issues.apache.org/jira/browse/HIVE-17355
>             Project: Hive
>          Issue Type: Bug
>          Components: Parser, UDF
>    Affects Versions: 2.1.0, 2.1.1
>         Environment: CentOS 7.2
>            Reporter: Aditya Allamraju
>            Assignee: Aditya Allamraju
>
> Extra trailing zeros are added when running "union all" on the tables 
> containing decimal data types.
> Version: Hive 2.1
> Steps to repro:-
> ================
> 1) CREATE TABLE `decisample`(
>   `a` decimal(8,2),
>   `b` int,
>   `c` decimal(5,2))
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'maprfs:/user/hive/warehouse/decisample'
> 2) CREATE TABLE `decisample3`(
>   `a` decimal(8,2),
>   `b` int,
>   `c` decimal(5,2))
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'maprfs:/user/hive/warehouse/decisample3'
> 3)hive> select * from decisample3;
> OK
> 1.00    2       3.00
> 7.00    8       9.00
> 4)hive> select * from decisample;
> OK
> 4.00    5       6.00
> 5) query:- 
> select a1.a, '' as a1b,'' as a1c from decisample a1 union all select 
> a2.a,a2.b,a2.c from decisample3 a2;
> o/p:-
> OK
> 4.00            NULL
> 1.00    2       3.000000000000000000
> 7.00    8       9.000000000000000000
> Time taken: 87.993 seconds, Fetched: 3 row(s)
> 6)select a2.a,a2.b,a2.c from decisample3 a2 union all select a1.a, '' as 
> a1b,'' as a1c from decisample a1;
> o/p:-
> 4.00
> 1.00    2       3
> 7.00    8       9
> Steps 5 is yielding 18 trailing zeros where as step 6 query is yieldings no 
> trailing  zero.
> Observation:
> 1. Hive is trying to run the UNION ALL after ensuring the SELECT's are 
> semantically same(equal number of columns and same datatypes). To do this, it 
> is implicitly type casting the values where required.
> From the explain plan, type casting is not consistent when done 2 different 
> ways:
> a)  select-1  UNION ALL select-2 (Query-5 in above comment)
> vs
> b) select-2 UNION ALL select-2   (Query-6 in above comment)
> Showing only the "expresssions" part of execution plans
> Query-5:
> ========
> ..
> ..
> Map Operator Tree:
>           TableScan
>             alias: a1
>             Statistics: Num rows: 1 Data size: 11 Basic stats: COMPLETE 
> Column stats: NONE
>             Select Operator
>               expressions: a (type: decimal(8,2)), '' (type: string), null 
> (type: decimal(38,18))
>               outputColumnNames: _col0, _col1, _col2
> ..
> ..
> TableScan
>             alias: a2
>             Statistics: Num rows: 2 Data size: 22 Basic stats: COMPLETE 
> Column stats: NONE
>             Select Operator
>               expressions: a (type: decimal(8,2)), UDFToString(b) (type: 
> string), CAST( c AS decimal(38,18)) (type: decimal(38,18))
> Query-6:
> ========
> ..
> ..
> Map Operator Tree:
>           TableScan
>             alias: a2
>             Statistics: Num rows: 2 Data size: 22 Basic stats: COMPLETE 
> Column stats: NONE
>             Select Operator
>               expressions: a (type: decimal(8,2)), UDFToString(b) (type: 
> string), UDFToString(c) (type: string)
> ..
> ..
> TableScan
>             alias: a1
>             Statistics: Num rows: 1 Data size: 11 Basic stats: COMPLETE 
> Column stats: NONE
>             Select Operator
>               expressions: a (type: decimal(8,2)), '' (type: string), '' 
> (type: string)
> ..
> ..
> Attaching the execution plans for both queries for reference.
> 2. The reason for 18 zeros in query-5 above is due to casting NULL to Decimal.
> And by default, the precision and scale are taken as (38,18) in Hive. This 
> could be the reason for 18 zeros.
> 3. This is repeating every time implicit type casting is happening on EMPTY 
> strings.
> If excluding few columns in one of the SELECT statement is absolutely 
> necessary, then the only Workaround is to explicitly type cast the empty 
> strings to same Datatypes as the Other Select statement which included the 
> columns.
> For ex:
> Q1:
> select a,b,c from decisample3
> union all
> select a,cast(' ' as int),cast(' ' as decimal) from decisample;
> Q2:
> select a,cast(' ' as int),cast(' ' as decimal) from decisample
> union all
> select a,b,c from decisample3;
> Both the above queries will give consistent result now.
> cast(' ' as int) ---> this was cast to INT, same as datatype of "b"
> cast(' ' as decimal) ---> this was cast to decimal, same as datatype of "c"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to