[ https://issues.apache.org/jira/browse/HIVE-28598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
yongzhi.shao updated HIVE-28598: -------------------------------- Description: Currently, we have found that in some scenarios, join operations using two iceberg tables may result in NPEs. ERROR-INFO: {code:java} Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101) ... 19 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating _col0 at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:149) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:931) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:135) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878) ... 20 more Caused by: java.lang.NullPointerException at org.apache.hive.common.util.Murmur3.hash32(Murmur3.java:144) at org.apache.hadoop.hive.ql.exec.vector.expressions.MurmurHashStringColStringCol.hash(MurmurHashStringColStringCol.java:52) at org.apache.hadoop.hive.ql.exec.vector.expressions.MurmurHashExpression.evaluate(MurmurHashExpression.java:97) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ... 28 more {code} The steps to reproduce the problem are as follows: INIT-SQL: {code:java} CREATE TABLE T1 ( ID STRING, ID2 STRING, ID3 STRING )STORED BY ICEBERG STORED AS ORC; CREATE TABLE T2 ( ID STRING, ID2 STRING )STORED BY ICEBERG STORED AS ORC; CREATE TABLE T1_ORC ( ID STRING, ID2 STRING, ID3 STRING )STORED AS ORC; CREATE TABLE T2_ORC ( ID STRING, ID2 STRING )STORED AS ORC; {code} 1. When the bucket_version of the T1 table is different from that of the T2 table, running the SQL shown below will throw an error: {code:java} select count(1) from (select ID,ID2,ID3 from test.t1) t left join (select ID,ID2 from test.t2) t2 on t.ID = t2.ID and t.ID2 = t2.ID2; {code} 2.When the BUCKET_VERSION of the T1 and T2 tables are the same, problem 1 disappears, but the following SQL still throws an exception: {code:java} select count(1) from (select ID,ID2 from test.t1 WHERE ID3='NORMAL') t left join (select ID,ID2 from test.t2) t2 on t.ID = t2.ID and t.ID2 = t2.ID2; {code} When I replace the T1 T2 table with the T1_ORC T2_ORC table, the SQL executes fine. was: Currently, we have found that in some scenarios, join operations using two iceberg tables may result in NPEs. INIT-SQL: {code:java} CREATE TABLE T1 ( ID STRING, ID2 STRING, ID3 STRING )STORED BY ICEBERG STORED AS ORC; CREATE TABLE T2 ( ID STRING, ID2 STRING )STORED BY ICEBERG STORED AS ORC; CREATE TABLE T1_ORC ( ID STRING, ID2 STRING, ID3 STRING )STORED AS ORC; CREATE TABLE T2_ORC ( ID STRING, ID2 STRING )STORED AS ORC; {code} 1. When the bucket_version of the T1 table is different from that of the T2 table, running the SQL shown below will throw an error: {code:java} select count(1) from (select ID,ID2,ID3 from test.t1) t left join (select ID,ID2 from test.t2) t2 on t.ID = t2.ID and t.ID2 = t2.ID2; {code} 2.When the BUCKET_VERSION of the T1 and T2 tables are the same, problem 1 disappears, but the following SQL still throws an exception: {code:java} select count(1) from (select ID,ID2 from test.t1 WHERE ID3='NORMAL') t left join (select ID,ID2 from test.t2) t2 on t.ID = t2.ID and t.ID2 = t2.ID2; {code} When I replace the T1 T2 table with the T1_ORC T2_ORC table, the SQL executes fine. > Join by using two iceberg table may cause NPE > --------------------------------------------- > > Key: HIVE-28598 > URL: https://issues.apache.org/jira/browse/HIVE-28598 > Project: Hive > Issue Type: Bug > Security Level: Public(Viewable by anyone) > Components: Iceberg integration, Query Processor > Affects Versions: 4.0.1 > Reporter: yongzhi.shao > Priority: Major > > Currently, we have found that in some scenarios, join operations using two > iceberg tables may result in NPEs. > ERROR-INFO: > {code:java} > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) > ... 16 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101) > ... 19 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating > _col0 > at > org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:149) > at > org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:931) > at > org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) > at > org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) > at > org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:135) > at > org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878) > ... 20 more > Caused by: java.lang.NullPointerException > at org.apache.hive.common.util.Murmur3.hash32(Murmur3.java:144) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.MurmurHashStringColStringCol.hash(MurmurHashStringColStringCol.java:52) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.MurmurHashExpression.evaluate(MurmurHashExpression.java:97) > at > org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) > ... 28 more {code} > The steps to reproduce the problem are as follows: > INIT-SQL: > {code:java} > CREATE TABLE T1 > ( > ID STRING, > ID2 STRING, > ID3 STRING > )STORED BY ICEBERG STORED AS ORC; > CREATE TABLE T2 > ( > ID STRING, > ID2 STRING > )STORED BY ICEBERG STORED AS ORC; > CREATE TABLE T1_ORC > ( > ID STRING, > ID2 STRING, > ID3 STRING > )STORED AS ORC; > CREATE TABLE T2_ORC > ( > ID STRING, > ID2 STRING > )STORED AS ORC; {code} > > > 1. When the bucket_version of the T1 table is different from that of the T2 > table, running the SQL shown below will throw an error: > > {code:java} > select count(1) > from > (select ID,ID2,ID3 from test.t1) t > left join > (select ID,ID2 from test.t2) t2 > on t.ID = t2.ID and t.ID2 = t2.ID2; {code} > 2.When the BUCKET_VERSION of the T1 and T2 tables are the same, problem 1 > disappears, but the following SQL still throws an exception: > {code:java} > select count(1) > from > (select ID,ID2 from test.t1 WHERE ID3='NORMAL') t > left join > (select ID,ID2 from test.t2) t2 > on t.ID = t2.ID and t.ID2 = t2.ID2; {code} > When I replace the T1 T2 table with the T1_ORC T2_ORC table, the SQL executes > fine. > -- This message was sent by Atlassian Jira (v8.20.10#820010)