[jira] [Commented] (HIVE-28598) Join by using two iceberg table may cause NPE

Stamatis Zampetakis (Jira) Thu, 31 Oct 2024 10:36:48 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-28598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894662#comment-17894662
 ]


Stamatis Zampetakis commented on HIVE-28598:
--------------------------------------------

[~lisoda] Many thanks for submitting this issue and the valuable insights 
gathered so far. I think another element that could help if someone picks this 
ticket up and starts working on it is the query plans (EXPLAIN, EXPLAIN CBO, 
EXPLAIN VECTORIZATION DETAIL) when the problem occurs. 

At the moment, I have many stuff on my list so not sure I will have time to 
look into this in the next few weeks.

> Join by using two iceberg table may cause NPE
> ---------------------------------------------
>
>                 Key: HIVE-28598
>                 URL: https://issues.apache.org/jira/browse/HIVE-28598
>             Project: Hive
>          Issue Type: Bug
>      Security Level: Public(Viewable by anyone) 
>          Components: Iceberg integration, Query Processor
>    Affects Versions: 4.0.1
>            Reporter: yongzhi.shao
>            Priority: Major
>
> Currently, we have found that in some scenarios, join operations using two 
> iceberg tables may result in NPEs.
> ERROR-INFO:
> {code:java}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row 
>         at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>         at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>         at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>         at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>         ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row 
>         at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
>         at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>         ... 19 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating 
> _col0
>         at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:149)
>         at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:931)
>         at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
>         at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
>         at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:135)
>         at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
>         at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171)
>         at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
>         at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878)
>         ... 20 more
> Caused by: java.lang.NullPointerException
>         at org.apache.hive.common.util.Murmur3.hash32(Murmur3.java:144)
>         at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.MurmurHashStringColStringCol.hash(MurmurHashStringColStringCol.java:52)
>         at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.MurmurHashExpression.evaluate(MurmurHashExpression.java:97)
>         at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>         ... 28 more {code}
> The steps to reproduce the problem are as follows:
> DataSet(about 18GB):
> download-url: 
> [http://data-plat.oss-cn-zhangjiakou.aliyuncs.com/shared/export.tar?OSSAccessKeyId=LTAI5tP5FMKsV3NgMbj4WLy1&Expires=1732875817&Signature=vV%2ByG%2FSYU2UKa5HO2btuiQoGY8c%3D]
> Please download the sample dataset I provided and import it into the T1_ORC, 
> T2_ORC tables. Then write the T1_ORC T2_ORC tables to the T1 T2 tables 
> respectively.
> (I'm sorry, I can't reproduce this problem with a very small dataset at the 
> moment, I've tried to reduce the size of the dataset samples as much as 
> possible)
>  
> INIT-SQL:
> {code:java}
> CREATE TABLE T1 
> (
>  ID  STRING,
>  ID2 STRING,
>  ID3 STRING 
> )STORED BY ICEBERG STORED AS ORC;
> CREATE TABLE T2 
> (
>  ID  STRING,
>  ID2 STRING 
> )STORED BY ICEBERG STORED AS ORC;
> CREATE TABLE T1_ORC
> (
>  ID  STRING,
>  ID2 STRING,
>  ID3 STRING 
> )STORED AS ORC;
> CREATE TABLE T2_ORC
> (
>  ID  STRING,
>  ID2 STRING 
> )STORED AS ORC; {code}
>  
> 1. When the bucket_version of the T1 table is different from that of the T2 
> table(EXAMPLE:T1=1 ,T2=2), running the SQL shown below will throw an error:
>  
> {code:java}
> select count(1)
> from 
> (select ID,ID2,ID3 from test.t1) t
> left join
> (select ID,ID2 from test.t2)  t2
> on t.ID = t2.ID and t.ID2 = t2.ID2; {code}
> 2.When the BUCKET_VERSION of the T1 and T2 tables are the 
> same(bucket_version=2), problem 1 disappears, but the following SQL still 
> throws an exception:
> {code:java}
> select count(1)
> from 
> (select ID,ID2 from test.t1 WHERE ID3='NORMAL') t
> left join
> (select ID,ID2 from test.t2)  t2
> on t.ID = t2.ID and t.ID2 = t2.ID2; {code}
>  
>  
> *When I replace the T1 T2 table with the T1_ORC T2_ORC table, the SQL 
> executes fine.*
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (HIVE-28598) Join by using two iceberg table may cause NPE

Reply via email to