[ https://issues.apache.org/jira/browse/HIVE-28598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894510#comment-17894510 ]
yongzhi.shao edited comment on HIVE-28598 at 11/4/24 7:55 AM:
--------------------------------------------------------------

Through debugging (only for problem 2), I have largely determined that something is wrong in Hive's vectorization code. When the left table is large enough, the SQL above triggers a pre-statistics stage that computes MIN/MAX/HASH values over the left table's join keys and broadcasts them to another ICEBERG-TABLE-SCAN task for scan pushdown. At this point, if a filter condition is added to the left table, the rows that do not satisfy the filter end up as null entries in the BytesColumnVector. However, for some other reason (possibly the execution plan), the noNulls flag of that BytesColumnVector is set to true, and every value in isNull[] is false. As a result, the null-handling logic in functions such as MurmurHash is bypassed, which eventually leads to the NPE. As supporting evidence: when I replace the expression in the WHERE condition with a constant-false predicate, e.g. ID3<>'BAR' (no row in the dataset matches it), the SQL executes successfully. (A minimal sketch of this failure mode is appended at the end of this message.)

update: I modified the logic in the murmur3 hash method, adding a check for null values. The SQL now runs successfully, but the result is missing 3739 records compared to the correct answer. There also appears to be a problem in the vectorized read path: a small portion of the data that is in fact not null is incorrectly stored as null in vector[].

> Join by using two iceberg table may cause NPE
> ---------------------------------------------
>
>                 Key: HIVE-28598
>                 URL: https://issues.apache.org/jira/browse/HIVE-28598
>             Project: Hive
>          Issue Type: Bug
>      Security Level: Public (Viewable by anyone)
>          Components: Iceberg integration, Query Processor
>    Affects Versions: 4.0.1
>            Reporter: yongzhi.shao
>            Priority: Major
>
> Currently, we have found that in some scenarios, join operations using two Iceberg tables may result in NPEs.
> ERROR-INFO:
> {code:java}
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>     ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>     ... 19 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating _col0
>     at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:149)
>     at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:931)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
>     at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:135)
>     at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878)
>     ... 20 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hive.common.util.Murmur3.hash32(Murmur3.java:144)
>     at org.apache.hadoop.hive.ql.exec.vector.expressions.MurmurHashStringColStringCol.hash(MurmurHashStringColStringCol.java:52)
>     at org.apache.hadoop.hive.ql.exec.vector.expressions.MurmurHashExpression.evaluate(MurmurHashExpression.java:97)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>     ... 28 more {code}
> The steps to reproduce the problem are as follows:
> DataSet (about 18 GB):
> download-url: [http://data-plat.oss-cn-zhangjiakou.aliyuncs.com/shared/export.tar?OSSAccessKeyId=LTAI5tP5FMKsV3NgMbj4WLy1&Expires=1732875817&Signature=vV%2ByG%2FSYU2UKa5HO2btuiQoGY8c%3D]
> Please download the sample dataset I provided, import it into the T1_ORC and T2_ORC tables, and then write the T1_ORC and T2_ORC tables into the T1 and T2 tables respectively.
> (Sorry, I cannot reproduce this problem with a very small dataset at the moment; I have tried to reduce the size of the sample dataset as much as possible.)
>
> INIT-SQL:
> {code:java}
> CREATE TABLE T1
> (
>     ID  STRING,
>     ID2 STRING,
>     ID3 STRING
> ) STORED BY ICEBERG STORED AS ORC;
>
> CREATE TABLE T2
> (
>     ID  STRING,
>     ID2 STRING
> ) STORED BY ICEBERG STORED AS ORC;
>
> CREATE TABLE T1_ORC
> (
>     ID  STRING,
>     ID2 STRING,
>     ID3 STRING
> ) STORED AS ORC;
>
> CREATE TABLE T2_ORC
> (
>     ID  STRING,
>     ID2 STRING
> ) STORED AS ORC; {code}
> 1. When the bucket_version of the T1 table is different from that of the T2 table (EXAMPLE: T1=1, T2=2), running the SQL shown below throws an error:
> {code:java}
> select count(1)
> from
> (select ID,ID2,ID3 from test.t1) t
> left join
> (select ID,ID2 from test.t2) t2
> on t.ID = t2.ID and t.ID2 = t2.ID2; {code}
> 2. When the BUCKET_VERSION of the T1 and T2 tables is the same (bucket_version=2), problem 1 disappears, but the following SQL still throws an exception:
> {code:java}
> select count(1)
> from
> (select ID,ID2 from test.t1 WHERE ID3='NORMAL') t
> left join
> (select ID,ID2 from test.t2) t2
> on t.ID = t2.ID and t.ID2 = t2.ID2; {code}
>
> *When I replace the T1 and T2 tables with the T1_ORC and T2_ORC tables, the SQL executes fine.*
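To make the suspected failure mode easier to follow, below is a minimal, self-contained Java sketch. It is a simplified stand-in rather than Hive's actual code: the class and method names are hypothetical, and only the vector, isNull, and noNulls fields mirror Hive's BytesColumnVector. It shows how a batch whose noNulls flag is wrongly set to true can drive a MurmurHash-style loop into dereferencing a null slot:

{code:java}
import java.nio.charset.StandardCharsets;

// Simplified stand-in for the reported failure mode; NOT Hive source code.
public class NoNullsFlagDemo {

    // Mirrors the relevant fields of Hive's BytesColumnVector.
    static class FakeBytesColumnVector {
        final byte[][] vector;
        final boolean[] isNull;
        boolean noNulls;

        FakeBytesColumnVector(int size) {
            vector = new byte[size][];
            isNull = new boolean[size];
        }
    }

    // Hash loop in the spirit of the vectorized MurmurHash expressions:
    // it trusts noNulls and dereferences vector[i] without a null check.
    static int hashRow(FakeBytesColumnVector col, int i) {
        if (!col.noNulls && col.isNull[i]) {
            return 0; // normal null handling; unreachable while noNulls == true
        }
        byte[] bytes = col.vector[i];
        int h = 104729;
        for (byte b : bytes) { // NullPointerException when bytes == null
            h = 31 * h + b;
        }
        return h;
    }

    public static void main(String[] args) {
        FakeBytesColumnVector col = new FakeBytesColumnVector(2);
        col.vector[0] = "NORMAL".getBytes(StandardCharsets.UTF_8);
        col.vector[1] = null;  // row filtered out upstream; value never populated
        col.noNulls = true;    // inconsistent flag, as observed in the bug
        col.isNull[1] = false; // isNull[] also claims the row is non-null

        System.out.println(hashRow(col, 0)); // prints a hash
        System.out.println(hashRow(col, 1)); // throws NullPointerException
    }
}
{code}

Guarding hashRow() with a bytes == null check makes the NPE go away, but, as the updated comment notes, that only masks the upstream inconsistency: values that were incorrectly left null are then silently treated as null join keys, which is consistent with the 3739 missing records.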