Zoltán Borók-Nagy created IMPALA-14797:
------------------------------------------
Summary: Apply partition key scan optimization for more cases for Iceberg tables
Key: IMPALA-14797
URL: https://issues.apache.org/jira/browse/IMPALA-14797
Project: IMPALA
Issue Type: Improvement
Reporter: Zoltán Borók-Nagy
We apply the partition key scan optimization to Iceberg tables only if the
partition columns use the IDENTITY transform in all partition specs.
This disables the optimization even when all data files use eligible partition
specs (i.e. the queried partition column is IDENTITY-transformed in every spec
that data files were written with), as long as the table also has inactive
partition specs that are not eligible.
Also, if some data files use eligible partition specs while others use
ineligible ones, we could still apply the optimization partially by splitting
the data files into two groups:
* files eligible for partition key scan optimization
* files not eligible for partition key scan optimization
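The grouping step could look like the following minimal sketch. It assumes each data file is known by a `(path, spec_id)` pair and that a lookup from spec ID to the spec's transforms is available; `group_files` and its parameters are hypothetical names used only for illustration.

```python
def group_files(files, spec_transforms, column):
    """Split data files by partition key scan eligibility.

    files: list of (path, spec_id) tuples
    spec_transforms: dict of spec_id -> {source column: transform name}
    """
    groups = {"eligible": [], "not_eligible": []}
    for path, spec_id in files:
        transform = spec_transforms[spec_id].get(column)
        key = "eligible" if transform == "identity" else "not_eligible"
        groups[key].append(path)
    return groups


spec_transforms = {0: {"i": "identity"}, 1: {"i": "truncate[100]"}}
files = [("a.parquet", 0), ("b.parquet", 1), ("c.parquet", 0)]
print(group_files(files, spec_transforms, "i"))
# {'eligible': ['a.parquet', 'c.parquet'], 'not_eligible': ['b.parquet']}
```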
We could then use the following plan:
{noformat}
                  UNION ALL
               /      |      \
              /       |       \
             /        |        \
     PARTITION      SCAN        ICEBERG
        KEY        WITHOUT      DELETE
        SCAN       DELETES       NODE
                                /    \
                               /      \
                            SCAN      SCAN
                            data     delete
                            files     files
{noformat}
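One way to read the plan is as a routing of each data file to one of the three UNION ALL children. The sketch below is a hypothetical illustration, not Impala's planner code. It makes one labeled assumption: a data file that has delete files always goes through the ICEBERG DELETE NODE, even if its spec is eligible, because every row carrying a given partition value might have been deleted. Empty branches are simply not produced, so if all files are eligible and have no deletes, the plan degenerates to a single partition key scan.

```python
from enum import Enum


class Branch(Enum):
    PARTITION_KEY_SCAN = "PARTITION KEY SCAN"
    SCAN_WITHOUT_DELETES = "SCAN WITHOUT DELETES"
    ICEBERG_DELETE_NODE = "ICEBERG DELETE NODE"


def assign_branch(spec_is_identity: bool, has_deletes: bool) -> Branch:
    # Assumption: files with deletes never use the partition key scan.
    if has_deletes:
        return Branch.ICEBERG_DELETE_NODE
    if spec_is_identity:
        return Branch.PARTITION_KEY_SCAN
    return Branch.SCAN_WITHOUT_DELETES


def plan_union_children(files):
    """files: list of (path, spec_is_identity, has_deletes) tuples.

    Returns branch -> paths; only non-empty branches become UNION ALL children.
    """
    children = {}
    for path, spec_is_identity, has_deletes in files:
        branch = assign_branch(spec_is_identity, has_deletes)
        children.setdefault(branch, []).append(path)
    return children


files = [
    ("a.parquet", True, False),   # IDENTITY spec, no deletes
    ("b.parquet", False, False),  # non-IDENTITY spec, no deletes
    ("c.parquet", True, True),    # IDENTITY spec, has deletes
    ("d.parquet", False, True),   # non-IDENTITY spec, has deletes
]
for branch, paths in plan_union_children(files).items():
    print(branch.value, paths)
```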
E.g.:
{noformat}
CREATE TABLE ice_t (i int, j int)
PARTITIONED BY SPEC (i)
STORED BY ICEBERG;
-- Insert files eligible for partition key scan opt.
INSERT ...
-- Insert files that are NOT eligible for partition key scan opt.
ALTER TABLE ice_t SET PARTITION SPEC (truncate(100, i));
INSERT ...
-- Query could use partition key scan optimization partially:
SELECT distinct(i) FROM ice_t;{noformat}
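As a sanity check of why the partial plan returns the right answer for `SELECT distinct(i) FROM ice_t`, the sketch below compares a full scan with a partial partition key scan on in-memory data shaped like the example. It assumes there are no delete files, and it models each file as a dict with hypothetical keys (`transform`, `partition_value`, `rows`). Files written under `truncate(100, i)` only store the truncated value in their metadata, which is not a value of `i`, so those files still have to be scanned.

```python
def distinct_full_scan(files):
    # Reference answer: read column i from every row of every file.
    return {row["i"] for f in files for row in f["rows"]}


def distinct_partial_pks(files):
    # Partial partition key scan: IDENTITY-partitioned files contribute their
    # partition value from metadata without reading rows; the rest are scanned.
    values = set()
    for f in files:
        if f["transform"] == "identity":
            values.add(f["partition_value"])
        else:
            values.update(row["i"] for row in f["rows"])
    return values


# In-memory data shaped like the ice_t example (no delete files).
files = [
    # Written under PARTITIONED BY SPEC (i): the partition value equals i.
    {"transform": "identity", "partition_value": 1,
     "rows": [{"i": 1, "j": 10}, {"i": 1, "j": 11}]},
    {"transform": "identity", "partition_value": 250,
     "rows": [{"i": 250, "j": 12}]},
    # Written under truncate(100, i): partition value 200 is not a value of i.
    {"transform": "truncate[100]", "partition_value": 200,
     "rows": [{"i": 205, "j": 13}, {"i": 250, "j": 14}]},
]
print(sorted(distinct_partial_pks(files)))  # [1, 205, 250]
print(sorted(distinct_full_scan(files)))    # [1, 205, 250]
```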