Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/22501 )
Change subject: WIP: IMPALA-13268: Integrate Iceberg ScanMetrics into Impala query profiles ...................................................................... Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@178 PS3, Line 178: if (!needIcebergForPlanning()) { : analyzer_.materializeSlots(conjuncts_); : setFileDescriptorsBasedOnFileStore(); : : PlanNode res = createIcebergScanPlanImpl(); : : // Update the profile that there is a full scan without planning with Iceberg. : org.apache.impala.service.Frontend.addIcebergScanMetricsToProfile( : ctx_.getQueryCtx().query_id, res.getId().toString(), null); : : return res; : } : : ScanMetricsResult metricsResult = filterFileDescriptors(); : filterConjuncts(); : analyzer_.materializeSlots(conjuncts_); : PlanNode res = createIcebergScanPlanImpl(); : : // Update the query profile with the scan metrics. : if (metricsResult != null) { : org.apache.impala.service.Frontend.addIcebergScanMetricsToProfile( : ctx_.getQueryCtx().query_id, res.getId().toString(), metricsResult); : } else { : Preconditions.checkState(snapshotId_ == -1); : } : : return res; : } How about: PlanNode res; if (!needIcebergForPlanning()) { analyzer_.materializeSlots(conjuncts_); setFileDescriptorsBasedOnFileStore(); res = createIcebergScanPlanImpl(); } else { ScanMetricsResult metricsResult = filterFileDescriptors(); filterConjuncts(); analyzer_.materializeSlots(conjuncts_); res = createIcebergScanPlanImpl(); } Preconditions.checkState(res != null); Preconditions.checkState( !needIcebergForPlanning() || metricsResult != null || snapshotId_ == -1); // Update the query profile with the scan metrics. org.apache.impala.service.Frontend.addIcebergScanMetricsToProfile( ctx_.getQueryCtx().query_id, res.getId().toString(), metricsResult); return res; } The if statement could be extracted to a method that just creates the PlanNode. http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2592 PS3, Line 2592: fillProfileNodeWithIcebergScanMetrics I wonder how we can keep the list of metrics up-to-date. Maybe adding a unit test that uses reflection and iterates over the members(/public static final Strings) of ScanMetrics and count them? http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2594 PS3, Line 2594: long durationMs = metrics.totalPlanningDuration().totalDuration().toMillis(); > I recall ScanMetricsResult has a single (generated) implementation Immutabl This would also keep the metrics up-to-date. If the format is not good for us, probably it is still something easily parseable. http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2614 PS3, Line 2614: String totalFileSize = PrintUtils.printBytes(totalFileSizeInBytes); When we pretty-print bytes we usually also print the value in bytes in parentheses after the pretty-printed value, e.g.: - PerHostPeakMemUsage: 157.41 MB (165056046) This makes it easy to parse the values from scripts. http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/FrontendProfile.java File fe/src/main/java/org/apache/impala/service/FrontendProfile.java: http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/FrontendProfile.java@79 PS3, Line 79: provisional nit: how about using the word "staged"? This feels like a more common term used for such things. Then addProvisionalChildrenProfile() could be stageChildrenProfile(). http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java File fe/src/main/java/org/apache/impala/util/IcebergUtil.java: http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@640 PS3, Line 640: MetricsReporters.combine(LoggingMetricsReporter.instance(), Is it needed? The comment of Scan.metricsReporter() says: "Create a new scan that will report scan metrics to the provided reporter in addition to reporters maintained by the scan." http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test: http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test@3 PS3, Line 3: functional_parquet.iceberg_partitioned Please add tests with multiple Iceberg tables, and also use Iceberg tables: - iceberg_v2_no_deletes - iceberg_v2_positional_delete_all_rows - iceberg_v2_positional_not_all_data_files_have_delete_files -- To view, visit http://gerrit.cloudera.org:8080/22501 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I080ee8eafc459dad4d21356ac9042b72d0570219 Gerrit-Change-Number: 22501 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 26 Feb 2025 12:00:10 +0000 Gerrit-HasComments: Yes