Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/22501 )
Change subject: WIP: IMPALA-13268: Integrate Iceberg ScanMetrics into Impala query profiles
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@191
PS3, Line 191: ScanMetricsResult metricsResult = filterFileDescriptors();
I don't think it's intuitive that a function called filterFileDescriptors() returns an Iceberg-related scan metrics result. We already have a member introduced for the metrics reporter. Can't you use it here to get the metrics instead of returning them from the function?


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2577
PS3, Line 2577: TUniqueId queryId
Is this param used?


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2594
PS3, Line 2594: long durationMs = metrics.totalPlanningDuration().totalDuration().toMillis();
I recall that ScanMetricsResult has a single (generated) implementation, ImmutableScanMetricsResult, and it has a toString() method that could reduce the amount of code needed here, assuming the format returned by that function is sufficient.


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2615
PS3, Line 2615: "total-file-size"
There is ScanMetrics.TOTAL_FILE_SIZE_IN_BYTES available for this. The same applies to the keys below.
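To illustrate the IcebergScanPlanner.java line 191 comment: a minimal sketch of the suggested shape, where filterFileDescriptors() keeps returning only the surviving files and the planning metrics travel through the reporter member, which callers query afterwards. PlanMetrics, CapturingReporter and PlannerSketch are hypothetical stand-ins, not the real Iceberg MetricsReporter/ScanMetricsResult types or the Impala planner code, and the file-size map is an assumed input used only to produce some numbers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical stand-in for values an Iceberg ScanMetricsResult carries
// (a table name plus two counters here); not the real Iceberg type.
final class PlanMetrics {
  final String tableName;
  final long resultDataFiles;
  final long totalFileSizeInBytes;

  PlanMetrics(String tableName, long resultDataFiles, long totalFileSizeInBytes) {
    this.tableName = tableName;
    this.resultDataFiles = resultDataFiles;
    this.totalFileSizeInBytes = totalFileSizeInBytes;
  }
}

// Hypothetical reporter that keeps every report it receives, playing the
// role of the metrics reporter member mentioned in the review.
final class CapturingReporter {
  private final List<PlanMetrics> reports = new ArrayList<>();

  void report(PlanMetrics metrics) { reports.add(metrics); }

  // Latest report, or empty if no planning has happened yet.
  Optional<PlanMetrics> latest() {
    return reports.isEmpty()
        ? Optional.empty()
        : Optional.of(reports.get(reports.size() - 1));
  }

  List<PlanMetrics> all() { return Collections.unmodifiableList(reports); }
}

// Hypothetical planner: filterFileDescriptors() returns only the kept files;
// the metrics are published to the reporter member instead.
final class PlannerSketch {
  private final CapturingReporter metricsReporter = new CapturingReporter();

  List<String> filterFileDescriptors(String table, Map<String, Long> fileSizes,
      Predicate<String> keep) {
    List<String> kept = new ArrayList<>();
    long keptBytes = 0;
    for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
      if (keep.test(e.getKey())) {
        kept.add(e.getKey());
        keptBytes += e.getValue();
      }
    }
    // Side effect of planning: record metrics on the member, not in the return value.
    metricsReporter.report(new PlanMetrics(table, kept.size(), keptBytes));
    return kept;
  }

  // Code that builds the profile reads the metrics from here.
  Optional<PlanMetrics> planMetrics() { return metricsReporter.latest(); }
}
```

Keeping the reporter as the single source of the metrics also leaves room for several reports per query, which matters if one query plans scans over multiple tables.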


http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test:

http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test@5
PS3, Line 5: Iceberg Plan Scan Metrics for Node 00:
I have always found the name 'ScanMetrics' in Iceberg misleading: these are in fact metrics for planning, not for scanning. I think the naming here introduces the same ambiguity. Maybe 'Iceberg Plan metrics for Scan Node 00'?


http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test@9
PS3, Line 9: select * from functional_parquet.iceberg_partitioned where action='download'
It would be nice to have a test query that involves scan planning for multiple tables, resulting in multiple metrics sections in the profile.


http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test@12
PS3, Line 12: row_regex:.*total-planning-duration: .+
Since the table contents are fixed in the test suite, could you also assert on the exact values of these metrics (except for the duration, of course)?



--
To view, visit http://gerrit.cloudera.org:8080/22501

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I080ee8eafc459dad4d21356ac9042b72d0570219
Gerrit-Change-Number: 22501
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 24 Feb 2025 13:17:05 +0000
Gerrit-HasComments: Yes