[ https://issues.apache.org/jira/browse/HIVE-28598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894510#comment-17894510 ]
yongzhi.shao edited comment on HIVE-28598 at 11/4/24 7:55 AM:
--------------------------------------------------------------

Through debugging (only for problem 2), I have largely determined that something is wrong in Hive's vectorization code. When the left table is large enough, the SQL above triggers a pre-statistics stage that computes MIN/MAX/HASH values over the left table's join keys and broadcasts them to another ICEBERG-TABLE-SCAN task for scan pushdown. At this point, if a filter condition is added to the left table, the rows that do not satisfy the filter end up as null entries in the BytesColumnVector. However, for some other reason (possibly the execution plan), the noNulls flag of that BytesColumnVector is set to true, and every value in isNull[] is false. As a result, the null-handling logic in functions such as MurmurHash is bypassed, which eventually leads to the NPE. As supporting evidence: when I replace the expression in the WHERE condition with a constant-false predicate, e.g. ID3<>'BAR' (no row in the dataset matches it), the SQL executes successfully. (A minimal sketch of this failure mode is appended at the end of this message.)

update: I modified the logic in the murmur3 hash method, adding a check for null values. The SQL now runs successfully, but the result is missing 3739 records compared to the correct answer. There also appears to be a problem in the vectorized read path: a small portion of the data that is in fact not null is incorrectly stored as null in vector[].

> Join by using two iceberg table may cause NPE
> ---------------------------------------------
>
>                 Key: HIVE-28598
>                 URL: https://issues.apache.org/jira/browse/HIVE-28598
>             Project: Hive
>          Issue Type: Bug
>      Security Level: Public (Viewable by anyone)
>          Components: Iceberg integration, Query Processor
>    Affects Versions: 4.0.1
>            Reporter: yongzhi.shao
>            Priority: Major
>
> Currently, we have found that in some scenarios, join operations using two Iceberg tables may result in NPEs.
> ERROR-INFO:
> {code:java}
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>     ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>     ... 19 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating _col0
>     at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:149)
>     at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:931)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
>     at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:135)
>     at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878)
>     ... 20 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hive.common.util.Murmur3.hash32(Murmur3.java:144)
>     at org.apache.hadoop.hive.ql.exec.vector.expressions.MurmurHashStringColStringCol.hash(MurmurHashStringColStringCol.java:52)
>     at org.apache.hadoop.hive.ql.exec.vector.expressions.MurmurHashExpression.evaluate(MurmurHashExpression.java:97)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>     ... 28 more {code}
> The steps to reproduce the problem are as follows:
> DataSet (about 18 GB):
> download-url: [http://data-plat.oss-cn-zhangjiakou.aliyuncs.com/shared/export.tar?OSSAccessKeyId=LTAI5tP5FMKsV3NgMbj4WLy1&Expires=1732875817&Signature=vV%2ByG%2FSYU2UKa5HO2btuiQoGY8c%3D]
> Please download the sample dataset I provided, import it into the T1_ORC and T2_ORC tables, and then write the T1_ORC and T2_ORC tables into the T1 and T2 tables respectively.
> (Sorry, I cannot reproduce this problem with a very small dataset at the moment; I have tried to reduce the size of the sample dataset as much as possible.)
>
> INIT-SQL:
> {code:java}
> CREATE TABLE T1
> (
>     ID  STRING,
>     ID2 STRING,
>     ID3 STRING
> ) STORED BY ICEBERG STORED AS ORC;
>
> CREATE TABLE T2
> (
>     ID  STRING,
>     ID2 STRING
> ) STORED BY ICEBERG STORED AS ORC;
>
> CREATE TABLE T1_ORC
> (
>     ID  STRING,
>     ID2 STRING,
>     ID3 STRING
> ) STORED AS ORC;
>
> CREATE TABLE T2_ORC
> (
>     ID  STRING,
>     ID2 STRING
> ) STORED AS ORC; {code}
> 1. When the bucket_version of the T1 table is different from that of the T2 table (EXAMPLE: T1=1, T2=2), running the SQL shown below throws an error:
> {code:java}
> select count(1)
> from
> (select ID,ID2,ID3 from test.t1) t
> left join
> (select ID,ID2 from test.t2) t2
> on t.ID = t2.ID and t.ID2 = t2.ID2; {code}
> 2. When the BUCKET_VERSION of the T1 and T2 tables is the same (bucket_version=2), problem 1 disappears, but the following SQL still throws an exception:
> {code:java}
> select count(1)
> from
> (select ID,ID2 from test.t1 WHERE ID3='NORMAL') t
> left join
> (select ID,ID2 from test.t2) t2
> on t.ID = t2.ID and t.ID2 = t2.ID2; {code}
>
> *When I replace the T1 and T2 tables with the T1_ORC and T2_ORC tables, the SQL executes fine.*
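To make the suspected failure mode easier to follow, below is a minimal, self-contained Java sketch. It is a simplified stand-in rather than Hive's actual code: the class and method names are hypothetical, and only the vector, isNull, and noNulls fields mirror Hive's BytesColumnVector. It shows how a batch whose noNulls flag is wrongly set to true can drive a MurmurHash-style loop into dereferencing a null slot:

{code:java}
import java.nio.charset.StandardCharsets;

// Simplified stand-in for the reported failure mode; NOT Hive source code.
public class NoNullsFlagDemo {

    // Mirrors the relevant fields of Hive's BytesColumnVector.
    static class FakeBytesColumnVector {
        final byte[][] vector;
        final boolean[] isNull;
        boolean noNulls;

        FakeBytesColumnVector(int size) {
            vector = new byte[size][];
            isNull = new boolean[size];
        }
    }

    // Hash loop in the spirit of the vectorized MurmurHash expressions:
    // it trusts noNulls and dereferences vector[i] without a null check.
    static int hashRow(FakeBytesColumnVector col, int i) {
        if (!col.noNulls && col.isNull[i]) {
            return 0; // normal null handling; unreachable while noNulls == true
        }
        byte[] bytes = col.vector[i];
        int h = 104729;
        for (byte b : bytes) { // NullPointerException when bytes == null
            h = 31 * h + b;
        }
        return h;
    }

    public static void main(String[] args) {
        FakeBytesColumnVector col = new FakeBytesColumnVector(2);
        col.vector[0] = "NORMAL".getBytes(StandardCharsets.UTF_8);
        col.vector[1] = null;  // row filtered out upstream; value never populated
        col.noNulls = true;    // inconsistent flag, as observed in the bug
        col.isNull[1] = false; // isNull[] also claims the row is non-null

        System.out.println(hashRow(col, 0)); // prints a hash
        System.out.println(hashRow(col, 1)); // throws NullPointerException
    }
}
{code}

Guarding hashRow() with a bytes == null check makes the NPE go away, but, as the updated comment notes, that only masks the upstream inconsistency: values that were incorrectly left null are then silently treated as null join keys, which is consistent with the 3739 missing records.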