[
https://issues.apache.org/jira/browse/HIVE-28598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898553#comment-17898553
]
yongzhi.shao edited comment on HIVE-28598 at 11/15/24 10:06 AM:
----------------------------------------------------------------
[~seonggon]
hello.
Let me try.
Incidentally, was your final enquiry correct?
Also, it seems that in some scenarios, the bloom_filter() method may also throw
an NPE exception.So I guess we might be better off fixing this in vector[][]...
was (Author: lisoda):
[~seonggon]
hello.
let me try.
Incidentally, was your final enquiry correct?
> Join by using two iceberg table may cause NPE
> ---------------------------------------------
>
> Key: HIVE-28598
> URL: https://issues.apache.org/jira/browse/HIVE-28598
> Project: Hive
> Issue Type: Bug
> Security Level: Public(Viewable by anyone)
> Components: Iceberg integration, Query Processor
> Affects Versions: 4.0.1
> Reporter: yongzhi.shao
> Priority: Major
> Labels: pull-request-available
> Attachments: EXPLAIN VECTORIZATION DETAIL.txt, EXPLAIN.txt, cbo
> plan.txt, image-2024-11-15-11-11-10-382.png
>
>
> Currently, we have found that in some scenarios, join operations using two
> iceberg tables may result in NPEs.
> ERROR-INFO:
> {code:java}
> Caused by: java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
> processing row
> at
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
> at
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
> at
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
> Error while processing row
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
> at
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
> ... 19 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating
> _col0
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:149)
> at
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:931)
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
> at
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:135)
> at
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
> at
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171)
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878)
> ... 20 more
> Caused by: java.lang.NullPointerException
> at org.apache.hive.common.util.Murmur3.hash32(Murmur3.java:144)
> at
> org.apache.hadoop.hive.ql.exec.vector.expressions.MurmurHashStringColStringCol.hash(MurmurHashStringColStringCol.java:52)
> at
> org.apache.hadoop.hive.ql.exec.vector.expressions.MurmurHashExpression.evaluate(MurmurHashExpression.java:97)
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
> ... 28 more {code}
> The steps to reproduce the problem are as follows:
> DataSet(about 18GB):
> download-url:
> [http://data-plat.oss-cn-zhangjiakou.aliyuncs.com/shared/export.tar?OSSAccessKeyId=LTAI5tP5FMKsV3NgMbj4WLy1&Expires=1732875817&Signature=vV%2ByG%2FSYU2UKa5HO2btuiQoGY8c%3D]
> Please download the sample dataset I provided and import it into the T1_ORC,
> T2_ORC tables. Then write the T1_ORC T2_ORC tables to the T1 T2 tables
> respectively.
> (I'm sorry, I can't reproduce this problem with a very small dataset at the
> moment, I've tried to reduce the size of the dataset samples as much as
> possible)
>
> INIT-SQL:
> {code:java}
> CREATE TABLE T1
> (
> ID STRING,
> ID2 STRING,
> ID3 STRING
> )STORED BY ICEBERG STORED AS ORC;
> CREATE TABLE T2
> (
> ID STRING,
> ID2 STRING
> )STORED BY ICEBERG STORED AS ORC;
> CREATE TABLE T1_ORC
> (
> ID STRING,
> ID2 STRING,
> ID3 STRING
> )STORED AS ORC;
> CREATE TABLE T2_ORC
> (
> ID STRING,
> ID2 STRING
> )STORED AS ORC; {code}
>
> 1. When the bucket_version of the T1 table is different from that of the T2
> table(EXAMPLE:T1=1 ,T2=2), running the SQL shown below will throw an error:
>
> {code:java}
> select count(1)
> from
> (select ID,ID2,ID3 from test.t1) t
> left join
> (select ID,ID2 from test.t2) t2
> on t.ID = t2.ID and t.ID2 = t2.ID2; {code}
> 2.When the BUCKET_VERSION of the T1 and T2 tables are the
> same(bucket_version=2), problem 1 disappears, but the following SQL still
> throws an exception:
> {code:java}
> select count(1)
> from
> (select ID,ID2 from test.t1 WHERE ID3='NORMAL') t
> left join
> (select ID,ID2 from test.t2) t2
> on t.ID = t2.ID and t.ID2 = t2.ID2; {code}
>
>
> *When I replace the T1 T2 table with the T1_ORC T2_ORC table, the SQL
> executes fine.*
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)