[ 
https://issues.apache.org/jira/browse/HIVE-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478254#comment-15478254
 ] 

Yongzhi Chen commented on HIVE-14715:
-------------------------------------

[~ashutoshc],
In CBO mode, the reduce output operator treats each null as a separate column
(as the query plan shows), so no column is removed from the reduce keys and
the query works. The following is the (partial) plan from CBO mode:
{noformat}
                 Group By Operator
                   aggregations: sum(_col6)
                   keys: _col0 (type: int), null (type: void), null (type: void), _col3 (type: string), null (type: void), null (type: void)
                   mode: hash
                   outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
                   Statistics: Num rows: 4 Data size: 1019 Basic stats: COMPLETE Column stats: NONE
                   Reduce Output Operator
                     key expressions: _col0 (type: int), _col1 (type: void), _col2 (type: void), _col3 (type: string), _col4 (type: void), _col5 (type: void)
                     sort order: ++++++
                     Map-reduce partition columns: _col0 (type: int), _col1 (type: void), _col2 (type: void), _col3 (type: string), _col4 (type: void), _col5 (type: void)
                     Statistics: Num rows: 4 Data size: 1019 Basic stats: COMPLETE Column stats: NONE
                     value expressions: _col6 (type: bigint)
{noformat}

And the following is the corresponding plan for non-CBO mode, without the fix:
{noformat}

                 Group By Operator
                   aggregations: sum(bn1)
                   keys: 'Pricing mismatch' (type: string), c1 (type: int), null (type: void), null (type: void), s2 (type: string), null (type: void), null (type: void)
                   mode: hash
                   outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                   Statistics: Num rows: 4 Data size: 1019 Basic stats: COMPLETE Column stats: NONE
                   Reduce Output Operator
                     key expressions: 'Pricing mismatch' (type: string), _col1 (type: int), null (type: void), _col4 (type: string)
                     sort order: ++++
                     Map-reduce partition columns: 'Pricing mismatch' (type: string), _col1 (type: int), null (type: void), _col4 (type: string)
                     Statistics: Num rows: 4 Data size: 1019 Basic stats: COMPLETE Column stats: NONE
                     value expressions: _col4 (type: string)
{noformat}
You can see that the value expression _col4 is wrong; it should be _col7.
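
To make the index shift concrete, here is a toy model in Python. It is not Hive
code: the helper name dedupe_keys and the idea that the value column index is
derived from the number of deduplicated reduce keys are my assumptions, chosen
only because they reproduce the numbers in the two plans above (7 group-by
keys, 4 reduce keys, _col4 instead of _col7).

```python
# Toy model only: this is NOT Hive code. All names here are hypothetical and
# the mechanism is an assumption that happens to match the plans shown above.

def dedupe_keys(keys):
    """Keep the first occurrence of each key expression, preserving order."""
    seen = set()
    kept = []
    for key in keys:
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept

# Group-by keys from the non-CBO plan: four of the seven are the constant null.
group_by_keys = ["'Pricing mismatch'", "c1", "null", "null", "s2", "null", "null"]

# The Group By Operator emits one output column per key, then the aggregation,
# so sum(bn1) sits right after the seven key columns.
agg_col_correct = f"_col{len(group_by_keys)}"

# If repeated constants are collapsed in the reduce keys, only four remain.
reduce_keys = dedupe_keys(group_by_keys)

# Assumed bug: the value column is located by counting the deduplicated keys.
agg_col_wrong = f"_col{len(reduce_keys)}"

print(len(reduce_keys), agg_col_correct, agg_col_wrong)  # prints: 4 _col7 _col4
```

In this model _col4 is the s2 string column, which would be consistent with
sum() failing on the value 'ABC' in the stack trace below.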

> Hive throws NumberFormatException with query with Null value
> ------------------------------------------------------------
>
>                 Key: HIVE-14715
>                 URL: https://issues.apache.org/jira/browse/HIVE-14715
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>         Attachments: HIVE-14715.1.patch, HIVE-14715.2.patch
>
>
> A java.lang.NumberFormatException is thrown with the following reproduction:
> set hive.cbo.enable=false;
> CREATE TABLE `paqtest`(
> `c1` int,
> `s1` string,
> `s2` string,
> `bn1` bigint)
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
> insert into paqtest values (58, '', 'ABC', 0);
> SELECT
> 'PM' AS cy,
> c1,
> NULL AS iused,
> NULL AS itp,
> s2,
> NULL AS cvg,
> NULL AS acavg,
> sum(bn1) AS cca
> FROM paqtest
> WHERE (s1 IS NULL OR length(s1) = 0)
> GROUP BY 'Pricing mismatch', c1, NULL, NULL, s2, NULL, NULL;
> The stack trace is as follows:
> java.lang.NumberFormatException: ABC
> GroupByOperator.process(Object, int) line: 773        
> ExecReducer.reduce(Object, Iterator, OutputCollector, Reporter) line: 236     
> ReduceTask.runOldReducer(JobConf, TaskUmbilicalProtocol, TaskReporter, RawKeyValueIterator, RawComparator<INKEY>, Class<INKEY>, Class<INVALUE>) line: 444
> ReduceTask.run(JobConf, TaskUmbilicalProtocol) line: 392      
> LocalJobRunner$Job$ReduceTaskRunnable.run() line: 319 
> Executors$RunnableAdapter<T>.call() line: 471 
> It works fine when hive.cbo.enable = true


