[ 
https://issues.apache.org/jira/browse/IMPALA-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986202#comment-17986202
 ] 

ASF subversion and git services commented on IMPALA-11591:
----------------------------------------------------------

Commit 1d640905912944ea05deaa3453cb6a85013b2e54 in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1d6409059 ]

IMPALA-14123: Allow forcing predicate push down to Iceberg

Since IMPALA-11591, Impala tries to avoid pushing down predicates to
Iceberg unless it is necessary (time travel) or likely to be useful
(at least one partition column is involved in the predicates). While
this makes planning faster, it may miss opportunities to skip files
during planning.

This patch adds the table property impala.iceberg.push_down_hint, which
expects a comma-separated list of column names. Predicates are pushed
down to Iceberg when there is a predicate on any of these columns.
Users can set this manually; in the future, Impala or other tools
may be able to set it automatically, e.g. during COMPUTE STATS if
there are many files with non-overlapping min/max stats for a given
column.
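
The push-down decision described above can be sketched roughly as
follows. This is a minimal illustration in Python, not Impala's actual
planner code (which is Java and is not shown in this message). The
function and parameter names are hypothetical, and the trimming of
whitespace and case-insensitive matching of column names are
assumptions, not behavior confirmed by the commit message.

```python
from typing import Dict, Iterable, Set

# Property name as given in the commit message.
PUSH_DOWN_HINT_PROP = "impala.iceberg.push_down_hint"


def parse_push_down_hint(table_properties: Dict[str, str]) -> Set[str]:
    """Parse the comma-separated column list from the table properties.

    Trimming whitespace and lowercasing names are assumptions made for
    this sketch.
    """
    raw = table_properties.get(PUSH_DOWN_HINT_PROP, "")
    return {col.strip().lower() for col in raw.split(",") if col.strip()}


def should_push_down(
    predicate_columns: Iterable[str],
    partition_columns: Iterable[str],
    table_properties: Dict[str, str],
    is_time_travel: bool = False,
) -> bool:
    """Decide whether predicates should be pushed down to Iceberg."""
    # Time travel always needs Iceberg's planning.
    if is_time_travel:
        return True
    pred_cols = {c.lower() for c in predicate_columns}
    # No predicates: cached file descriptors are enough.
    if not pred_cols:
        return False
    # A predicate on a partition column is likely to prune files.
    if pred_cols & {c.lower() for c in partition_columns}:
        return True
    # Otherwise push down only if a hinted column is involved.
    return bool(pred_cols & parse_push_down_hint(table_properties))
```

With the property set to "event_ts, user_id" on a table partitioned by
a hypothetical column "day", a predicate on user_id would trigger push
down in this sketch, while a predicate on an unlisted, non-partition
column would not.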

Note that in most cases where Iceberg can skip files, the Parquet/ORC
scanner would also skip most of the data based on stats filtering. The
benefit of skipping files during planning is that fewer footers are
read and the query plan is "smaller".
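
To illustrate why non-overlapping min/max stats matter for skipping
files during planning, here is a small Python sketch. It is only a toy
model: the DataFile class and files_matching_equality function are
hypothetical names, the stats are plain integers for a single column,
and it does not call Iceberg's or Impala's real APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFile:
    """Toy stand-in for a data file with per-column min/max stats."""
    path: str
    min_value: int
    max_value: int


def files_matching_equality(files: List[DataFile], value: int) -> List[DataFile]:
    """Keep only files whose [min, max] range could contain `value`."""
    return [f for f in files if f.min_value <= value <= f.max_value]


# Non-overlapping ranges: a predicate like `col = 150` keeps one file.
sorted_files = [
    DataFile("f1", 0, 99),
    DataFile("f2", 100, 199),
    DataFile("f3", 200, 299),
]
kept = files_matching_equality(sorted_files, 150)  # only f2 remains

# Fully overlapping ranges: nothing can be skipped.
overlapping_files = [DataFile("a", 0, 299), DataFile("b", 0, 299)]
kept_all = files_matching_equality(overlapping_files, 150)  # both remain
```

When ranges do not overlap, files dropped at this stage never reach the
scanner, so their footers are not read and they do not appear in the
plan.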

Change-Id: I8eb4ab5204c20b3991fdf305d7317f4023904a0f
Reviewed-on: http://gerrit.cloudera.org:8080/22995
Reviewed-by: Csaba Ringhofer <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Avoid calling planFiles() on Iceberg tables when there are no predicates
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-11591
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11591
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog, Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.2.0
>
>
> Currently we always invoke Iceberg's planFiles() API for creating Iceberg 
> scans.
> When there are no predicates (and no time travel) on the table we could avoid 
> that because we already cache everything we need (schema, partition 
> information, file descriptors).
> We can also consider only pushing down predicates if at least one of the 
> predicates refers to a partition column. Otherwise it's possible that the 
> overhead of reading, decoding, and evaluating all the manifest files is too 
> large.
> I think the change should be fairly simple; we just need to take care of 
> the following:
>  * -store delete files separately, so we can still do the V2 scans from 
> cache- (will be implemented by IMPALA-11826)
>  * During time travel we also cache old file descriptors, so we need to 
> separate them from the current snapshot's file descriptors.


