[ https://issues.apache.org/jira/browse/HIVE-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aditya Allamraju reassigned HIVE-17355: --------------------------------------- Assignee: Aditya Allamraju > Casting to Decimal along with UNION ALL gives incosistent results > ----------------------------------------------------------------- > > Key: HIVE-17355 > URL: https://issues.apache.org/jira/browse/HIVE-17355 > Project: Hive > Issue Type: Bug > Components: Parser, UDF > Affects Versions: 2.1.0, 2.1.1 > Environment: CentOS 7.2 > Reporter: Aditya Allamraju > Assignee: Aditya Allamraju > > Extra trailing zeros are added when running "union all" on the tables > containing decimal data types. > Version: Hive 2.1 > Steps to repro:- > ================ > 1) CREATE TABLE `decisample`( > `a` decimal(8,2), > `b` int, > `c` decimal(5,2)) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'maprfs:/user/hive/warehouse/decisample' > 2) CREATE TABLE `decisample3`( > `a` decimal(8,2), > `b` int, > `c` decimal(5,2)) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'maprfs:/user/hive/warehouse/decisample3' > 3)hive> select * from decisample3; > OK > 1.00 2 3.00 > 7.00 8 9.00 > 4)hive> select * from decisample; > OK > 4.00 5 6.00 > 5) query:- > select a1.a, '' as a1b,'' as a1c from decisample a1 union all select > a2.a,a2.b,a2.c from decisample3 a2; > o/p:- > OK > 4.00 NULL > 1.00 2 3.000000000000000000 > 7.00 8 9.000000000000000000 > Time taken: 87.993 seconds, Fetched: 3 row(s) > 6)select a2.a,a2.b,a2.c from decisample3 a2 union all select a1.a, '' as > a1b,'' as a1c from decisample a1; > o/p:- > 4.00 > 1.00 2 3 > 7.00 8 9 > Steps 5 is yielding 18 trailing zeros where as step 6 query is yieldings no > trailing zero. > Observation: > 1. Hive is trying to run the UNION ALL after ensuring the SELECT's are > semantically same(equal number of columns and same datatypes). To do this, it > is implicitly type casting the values where required. > From the explain plan, type casting is not consistent when done 2 different > ways: > a) select-1 UNION ALL select-2 (Query-5 in above comment) > vs > b) select-2 UNION ALL select-2 (Query-6 in above comment) > Showing only the "expresssions" part of execution plans > Query-5: > ======== > .. > .. > Map Operator Tree: > TableScan > alias: a1 > Statistics: Num rows: 1 Data size: 11 Basic stats: COMPLETE > Column stats: NONE > Select Operator > expressions: a (type: decimal(8,2)), '' (type: string), null > (type: decimal(38,18)) > outputColumnNames: _col0, _col1, _col2 > .. > .. > TableScan > alias: a2 > Statistics: Num rows: 2 Data size: 22 Basic stats: COMPLETE > Column stats: NONE > Select Operator > expressions: a (type: decimal(8,2)), UDFToString(b) (type: > string), CAST( c AS decimal(38,18)) (type: decimal(38,18)) > Query-6: > ======== > .. > .. > Map Operator Tree: > TableScan > alias: a2 > Statistics: Num rows: 2 Data size: 22 Basic stats: COMPLETE > Column stats: NONE > Select Operator > expressions: a (type: decimal(8,2)), UDFToString(b) (type: > string), UDFToString(c) (type: string) > .. > .. > TableScan > alias: a1 > Statistics: Num rows: 1 Data size: 11 Basic stats: COMPLETE > Column stats: NONE > Select Operator > expressions: a (type: decimal(8,2)), '' (type: string), '' > (type: string) > .. > .. > Attaching the execution plans for both queries for reference. > 2. The reason for 18 zeros in query-5 above is due to casting NULL to Decimal. > And by default, the precision and scale are taken as (38,18) in Hive. This > could be the reason for 18 zeros. > 3. This is repeating every time implicit type casting is happening on EMPTY > strings. > If excluding few columns in one of the SELECT statement is absolutely > necessary, then the only Workaround is to explicitly type cast the empty > strings to same Datatypes as the Other Select statement which included the > columns. > For ex: > Q1: > select a,b,c from decisample3 > union all > select a,cast(' ' as int),cast(' ' as decimal) from decisample; > Q2: > select a,cast(' ' as int),cast(' ' as decimal) from decisample > union all > select a,b,c from decisample3; > Both the above queries will give consistent result now. > cast(' ' as int) ---> this was cast to INT, same as datatype of "b" > cast(' ' as decimal) ---> this was cast to decimal, same as datatype of "c" -- This message was sent by Atlassian JIRA (v6.4.14#64029)