[ https://issues.apache.org/jira/browse/HIVE-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478254#comment-15478254 ]
Yongzhi Chen commented on HIVE-14715:
-------------------------------------

[~ashutoshc], In CBO mode, the reduce output operator treats each null as a different column (as the query plan shows), so no column is removed from the reduce keys, and therefore the query works. The following is the (partial) plan from CBO:
{noformat}
Group By Operator
  aggregations: sum(_col6)
  keys: _col0 (type: int), null (type: void), null (type: void), _col3 (type: string), null (type: void), null (type: void)
  mode: hash
  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
  Statistics: Num rows: 4 Data size: 1019 Basic stats: COMPLETE Column stats: NONE
  Reduce Output Operator
    key expressions: _col0 (type: int), _col1 (type: void), _col2 (type: void), _col3 (type: string), _col4 (type: void), _col5 (type: void)
    sort order: ++++++
    Map-reduce partition columns: _col0 (type: int), _col1 (type: void), _col2 (type: void), _col3 (type: string), _col4 (type: void), _col5 (type: void)
    Statistics: Num rows: 4 Data size: 1019 Basic stats: COMPLETE Column stats: NONE
    value expressions: _col6 (type: bigint)
{noformat}
And the following is the corresponding plan for non-CBO mode without the fix:
{noformat}
Group By Operator
  aggregations: sum(bn1)
  keys: 'Pricing mismatch' (type: string), c1 (type: int), null (type: void), null (type: void), s2 (type: string), null (type: void), null (type: void)
  mode: hash
  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
  Statistics: Num rows: 4 Data size: 1019 Basic stats: COMPLETE Column stats: NONE
  Reduce Output Operator
    key expressions: 'Pricing mismatch' (type: string), _col1 (type: int), null (type: void), _col4 (type: string)
    sort order: ++++
    Map-reduce partition columns: 'Pricing mismatch' (type: string), _col1 (type: int), null (type: void), _col4 (type: string)
    Statistics: Num rows: 4 Data size: 1019 Basic stats: COMPLETE Column stats: NONE
    value expressions: _col4 (type: string)
{noformat}
You can see that the value expression _col4 is wrong; it should be _col7.
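
The column mismatch in the non-CBO plan can be modeled with a small sketch. It is an illustration only, not Hive's planner code. It assumes (an inference from the two plans in this comment, not something confirmed from the Hive source) that the non-CBO path drops repeated key expressions from the reduce keys and then locates the aggregation column by counting the remaining reduce keys. The helper `dedup_keys` and all variable names are hypothetical.

```python
# Illustrative model only; NOT Hive's planner code.
# Assumption: repeated key expressions are dropped from the reduce keys,
# and the value column index is then derived from the reduced key count.

def dedup_keys(keys):
    """Drop repeated key expressions, keeping first occurrences in order.

    Returns (original_position, expression) pairs.
    """
    seen = set()
    kept = []
    for pos, expr in enumerate(keys):
        if expr not in seen:
            seen.add(expr)
            kept.append((pos, expr))
    return kept


# Group-by keys from the non-CBO plan; they occupy _col0 .. _col6.
group_by_keys = ["'Pricing mismatch'", "c1", "null", "null", "s2", "null", "null"]

# Four keys survive, at original positions 0, 1, 2 and 4.
reduce_keys = dedup_keys(group_by_keys)

# Assumed faulty rule: index the value column by the number of reduce keys.
wrong_value_col = "_col%d" % len(reduce_keys)    # _col4, which is really s2

# The aggregation output sits right after ALL group-by keys.
right_value_col = "_col%d" % len(group_by_keys)  # _col7, the sum(bn1) column

print(reduce_keys)
print(wrong_value_col, right_value_col)
```

Under this model the value expression resolves to the string column s2, which would be consistent with the NumberFormatException on 'ABC' when the reducer tries to sum it. In the CBO plan no key is dropped, so the two counts agree and the value column is _col6 as expected.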

> Hive throws NumberFormatException with query with Null value
> ------------------------------------------------------------
>
>                 Key: HIVE-14715
>                 URL: https://issues.apache.org/jira/browse/HIVE-14715
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>         Attachments: HIVE-14715.1.patch, HIVE-14715.2.patch
>
>
> A java.lang.NumberFormatException is thrown with the following reproduction:
> set hive.cbo.enable=false;
> CREATE TABLE `paqtest`(
> `c1` int,
> `s1` string,
> `s2` string,
> `bn1` bigint)
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
> insert into paqtest values (58, '', 'ABC', 0);
> SELECT
> 'PM' AS cy,
> c1,
> NULL AS iused,
> NULL AS itp,
> s2,
> NULL AS cvg,
> NULL AS acavg,
> sum(bn1) AS cca
> FROM paqtest
> WHERE (s1 IS NULL OR length(s1) = 0)
> GROUP BY 'Pricing mismatch', c1, NULL, NULL, s2, NULL, NULL;
> The stack trace is as follows:
> java.lang.NumberFormatException: ABC
> GroupByOperator.process(Object, int) line: 773
> ExecReducer.reduce(Object, Iterator, OutputCollector, Reporter) line: 236
> ReduceTask.runOldReducer(JobConf, TaskUmbilicalProtocol, TaskReporter, RawKeyValueIterator, RawComparator<INKEY>, Class<INKEY>, Class<INVALUE>) line: 444
> ReduceTask.run(JobConf, TaskUmbilicalProtocol) line: 392
> LocalJobRunner$Job$ReduceTaskRunnable.run() line: 319
> Executors$RunnableAdapter<T>.call() line: 471
> The query works fine when hive.cbo.enable = true.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)