This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit eb79fbea2b452f09e0e04edc4be274942423d498
Author: Daniel Becker <[email protected]>
AuthorDate: Tue May 6 17:09:22 2025 +0200

    IMPALA-14033: Document the integration of Iceberg ScanMetrics in the query profile

    This change documents the integration of Iceberg ScanMetrics into Impala query profiles.

    Change-Id: I49d27ecd0f37ffed58afb8abea04bf592d68f11c
    Reviewed-on: http://gerrit.cloudera.org:8080/22859
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-by: Zoltan Borok-Nagy <[email protected]>
---
 docs/topics/impala_iceberg.xml | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index c22f32259..e49133ddb 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -210,6 +210,36 @@ CREATE TABLE ice_ctas_part_spec PARTITIONED BY SPEC (truncate(3, s)) STORED AS I
     </conbody>
   </concept>
 
+  <concept id="iceberg_scan_metrics">
+    <title>Iceberg Scan Metrics</title>
+    <conbody>
+      <p>
+        When Impala runs queries on Iceberg tables, sometimes it uses Iceberg's
+        'planFiles()' API during planning. As it is an expensive call, Impala avoids it
+        when possible, but it is necessary in the following cases:
+          - if one or more predicates are pushed down to Iceberg
+          - if there is time travel.
+
+        The call to 'planFiles()', on the other hand, also collects metrics, e.g. the
+        total Iceberg planning time, the number of data/delete files and manifests and how
+        many of these can be skipped.
+
+        These metrics are integrated into the query profile under the "Frontend" section.
+        As they are per-table, if multiple tables are scanned for the query, there will be
+        multiple sections in the profile.
+
+        Note that for Iceberg tables where Iceberg's 'planFiles()' API was not used in
+        planning, the metrics are not available and the profile will contain a short note
+        describing this.
+
+        To facilitate pairing the metrics with scans, the metrics header
+        references the plan node responsible for the scan. This will always be
+        the top level node for the scan, so it can be a SCAN node, a JOIN node
+        or a UNION node depending on whether the table has delete files.
+      </p>
+    </conbody>
+  </concept>
+
   <concept id="iceberg_v2">
     <title>Iceberg V2 tables</title>
     <conbody>
