Fang-Yu Rao created IMPALA-14116:
------------------------------------
Summary: Consider erroring out earlier if NULL is on the IN-list
of a table scan against an ORC table
Key: IMPALA-14116
URL: https://issues.apache.org/jira/browse/IMPALA-14116
Project: IMPALA
Issue Type: Improvement
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao
Attachments: resolved_crashed_thread.txt
We found that if NULL is included in the IN-list of a table scan against an
ORC table, Impala daemons can crash. This can be reproduced with the following
steps.
# Create the database and an ORC table under the database in impala-shell.
{code}
create database test_db_04;
CREATE EXTERNAL TABLE test_db_04.test_tbl_01 (customer_id STRING)
PARTITIONED BY (ingest_date STRING)
WITH SERDEPROPERTIES ('serialization.format'='1')
STORED AS ORC;
{code}
# Via beeline, insert a row into the ORC table just created.
{code}
INSERT INTO test_db_04.test_tbl_01 partition (ingest_date='2025-05-29') values
('CUST001');
{code}
# Execute the following query via impala-shell.
{code}
SELECT ingest_date, customer_id
FROM test_db_04.test_tbl_01 WHERE ingest_date > DATE '2024-09-30' AND
customer_id IN ('', NULL)
GROUP BY 1, 2;
{code}
An Impala daemon crashes during the execution of the ORC table scan. The
stack trace of the crashed thread, taken from the resolved minidump, is
provided in [^resolved_crashed_thread.txt].
We should consider erroring out earlier, possibly during query analysis, if
NULL is on the IN-list of a table scan against an ORC table, so that no Impala
daemon crashes.
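Until such a check exists, one possible workaround is to drop the NULL literal
from the IN-list. Under standard SQL three-valued logic, customer_id IN ('', NULL)
evaluates to TRUE only when customer_id = '' and to NULL otherwise, so as a
WHERE filter it should select the same rows as customer_id IN (''). The query
below is only a sketch: it assumes the crash is triggered solely by the NULL
literal in the IN-list, and it has not been verified against the reproduction
above.
{code}
-- Same filter as the failing query, with the NULL literal removed from the
-- IN-list. A NULL list element can never make the predicate TRUE, so the
-- selected rows should be unchanged.
SELECT ingest_date, customer_id
FROM test_db_04.test_tbl_01 WHERE ingest_date > DATE '2024-09-30' AND
customer_id IN ('')
GROUP BY 1, 2;
{code}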