[jira] [Created] (IMPALA-14797) Apply partition key scan optimization for more cases for Iceberg tables

Jira Tue, 03 Mar 2026 06:26:12 -0800

Zoltán Borók-Nagy created IMPALA-14797:
------------------------------------------


             Summary: Apply partition key scan optimization for more cases for 
Iceberg tables
                 Key: IMPALA-14797
                 URL: https://issues.apache.org/jira/browse/IMPALA-14797
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Zoltán Borók-Nagy


We apply the partition key scan optimization to Iceberg tables only if the
partition columns use the IDENTITY transform in every partition spec.

As a result, the optimization is disabled even when all data files use
eligible partition specs (the partition column being used is
IDENTITY-transformed in all of those specs), merely because the table also has
inactive partition specs that are not eligible.
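
The difference between the current check and a relaxed one could be sketched in Python roughly as below. This is only an illustration under simplifying assumptions, not Impala's planner code: `PartitionSpec`, `DataFile`, `eligible_today` and `eligible_if_inactive_specs_ignored` are hypothetical names, and "inactive" is taken to mean a spec that no data file references.

```python
from dataclasses import dataclass

# Hypothetical, simplified model; none of these names come from Impala or Iceberg.
@dataclass(frozen=True)
class PartitionSpec:
    spec_id: int
    transforms: tuple  # (source_column, transform_name) pairs

    def is_identity_on(self, column):
        return (column, "identity") in self.transforms

@dataclass(frozen=True)
class DataFile:
    path: str
    spec_id: int

def eligible_today(specs, column):
    """Check as described in this issue: every spec must be IDENTITY on column."""
    return all(s.is_identity_on(column) for s in specs)

def eligible_if_inactive_specs_ignored(specs, files, column):
    """Relaxed check: only specs referenced by at least one data file count."""
    used = {f.spec_id for f in files}
    return all(s.is_identity_on(column) for s in specs if s.spec_id in used)

specs = [
    PartitionSpec(0, (("i", "identity"),)),
    PartitionSpec(1, (("i", "truncate"),)),  # assumed: no data file uses this spec
]
files = [DataFile("f1.parquet", 0), DataFile("f2.parquet", 0)]

print(eligible_today(specs, "i"))
print(eligible_if_inactive_specs_ignored(specs, files, "i"))
```

With these made-up inputs the strict check rejects the table because of spec 1, while the relaxed check accepts it because only spec 0 is referenced by data files.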

Also, if some data files use partition specs that are eligible, but some data 
files use partition specs that aren't, we could still do the optimization 
partially by grouping the data files:
 * files eligible for partition key scan optimization
 * files not eligible for partition key scan optimization

Then we could generate the following plan:
{noformat}
                UNION  ALL
             /       |      \
           /         |        \
         /           |          \
    PARTITION     SCAN         ICEBERG
    KEY          WITHOUT       DELETE
    SCAN         DELETES        NODE
                                /  \
                               /    \
                             SCAN   SCAN
                             data   delete
                             files  files{noformat}
E.g.:
{noformat}
CREATE TABLE ice_t (i int, j int)
PARTITIONED BY SPEC (i)
STORED BY ICEBERG;

-- Insert files eligible for partition key scan opt.
INSERT ...

-- Insert files that are NOT eligible for partition key scan opt.
ALTER TABLE ice_t SET PARTITION SPEC (truncate(i, 100));
INSERT ...

-- Query could use partition key scan optimization partially:
SELECT distinct(i) FROM ice_t;{noformat}
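
The grouping step for a table like this one might look like the following Python sketch. It assumes the planner already knows which spec ids are IDENTITY-transformed on the queried column; `DataFile`, `split_by_eligibility` and the file names are hypothetical and exist only for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a data file entry; not an Impala or Iceberg class.
DataFile = namedtuple("DataFile", ["path", "spec_id"])

def split_by_eligibility(files, identity_spec_ids):
    """Split files into (eligible, not_eligible) for the partition key scan.

    A file is treated as eligible when the spec it was written with is in
    identity_spec_ids, i.e. the queried column is IDENTITY-transformed there.
    """
    eligible, not_eligible = [], []
    for f in files:
        if f.spec_id in identity_spec_ids:
            eligible.append(f)
        else:
            not_eligible.append(f)
    return eligible, not_eligible

files = [
    DataFile("a.parquet", 0),  # written under the original IDENTITY spec
    DataFile("b.parquet", 1),  # written after the spec was changed to truncate
]
eligible, not_eligible = split_by_eligibility(files, identity_spec_ids={0})
```

In the plan above, the `eligible` group would feed the partition key scan, and the `not_eligible` group would go to the regular scan branches of the UNION ALL.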



