Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22501 )

Change subject: WIP: IMPALA-13268: Integrate Iceberg ScanMetrics into Impala 
query profiles
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@191
PS3, Line 191:     ScanMetricsResult metricsResult = filterFileDescriptors();
I don't think it's intuitive that a function called 
filterFileDescriptors() returns an Iceberg-related scan metrics result.
We already have a member for the metrics reporter. Can't you use it 
here to get the metrics instead of returning them from the function?
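
To illustrate the shape I have in mind, here is a minimal self-contained sketch. All names in it (PlanMetrics, CapturingReporter, Planner, planMetrics()) are hypothetical stand-ins, not the actual Impala or Iceberg classes; the only point is that the filtering method returns nothing and the caller reads the metrics from the reporter member.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins only; these are NOT the Impala or Iceberg classes.
public class ReporterMemberSketch {
  /** Stand-in for a metrics result; holds a single counter for brevity. */
  static final class PlanMetrics {
    final long resultDataFiles;
    PlanMetrics(long resultDataFiles) { this.resultDataFiles = resultDataFiles; }
  }

  /** Stand-in for a reporter that remembers the last report it was handed. */
  static final class CapturingReporter {
    private PlanMetrics last;
    void report(PlanMetrics metrics) { last = metrics; }
    Optional<PlanMetrics> lastReport() { return Optional.ofNullable(last); }
  }

  static final class Planner {
    private final CapturingReporter reporter = new CapturingReporter();
    private final List<String> selected = new ArrayList<>();

    // Only filters; the metrics are captured by the reporter member
    // instead of being returned from this method.
    void filterFileDescriptors(List<String> files, String suffix) {
      for (String file : files) {
        if (file.endsWith(suffix)) selected.add(file);
      }
      reporter.report(new PlanMetrics(selected.size()));
    }

    // The caller asks the planner (via its reporter) for the metrics.
    Optional<PlanMetrics> planMetrics() { return reporter.lastReport(); }
  }

  public static void main(String[] args) {
    Planner planner = new Planner();
    List<String> files = new ArrayList<>();
    files.add("a.parquet");
    files.add("b.orc");
    files.add("c.parquet");
    planner.filterFileDescriptors(files, ".parquet");
    planner.planMetrics().ifPresent(m -> System.out.println(m.resultDataFiles));
  }
}
```

With this shape the method name matches what the method does, and the Optional makes it explicit that no report exists before planning has run.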


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2577
PS3, Line 2577: TUniqueId queryId
Is this param used?


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2594
PS3, Line 2594:     long durationMs = 
metrics.totalPlanningDuration().totalDuration().toMillis();
I recall ScanMetricsResult has a single (generated) implementation, 
ImmutableScanMetricsResult, and it has a toString() method that could reduce the 
amount of code needed here, provided the format returned by that method 
is sufficient.


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2615
PS3, Line 2615: "total-file-size"
There is ScanMetrics.TOTAL_FILE_SIZE_IN_BYTES available for this. The same 
applies to the ones below.
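
A minimal sketch of what I mean, using a hypothetical MetricNames holder as a stand-in for ScanMetrics. The string values and the toProfileEntries() helper are assumptions for illustration only and are not copied from Iceberg or from this patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetricNameConstantsSketch {
  // Hypothetical stand-in for a constants holder such as ScanMetrics;
  // the string values are illustrative, not taken from Iceberg.
  static final class MetricNames {
    static final String TOTAL_FILE_SIZE_IN_BYTES = "total-file-size-in-bytes";
    static final String RESULT_DATA_FILES = "result-data-files";
    private MetricNames() {}
  }

  // Builds the profile entries keyed by the shared constants, so the
  // metric names are spelled in exactly one place.
  static Map<String, Long> toProfileEntries(long totalFileSizeBytes, long resultDataFiles) {
    Map<String, Long> entries = new LinkedHashMap<>();
    entries.put(MetricNames.TOTAL_FILE_SIZE_IN_BYTES, totalFileSizeBytes);
    entries.put(MetricNames.RESULT_DATA_FILES, resultDataFiles);
    return entries;
  }

  public static void main(String[] args) {
    System.out.println(toProfileEntries(1024L, 3L));
  }
}
```

Referring to the constants avoids the literal in the profile drifting away from the name the library uses.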


http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test:

http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test@5
PS3, Line 5: Iceberg Plan Scan Metrics for Node 00:
I have always found the name 'ScanMetrics' in Iceberg misleading. These are in 
fact metrics for planning, not for scanning. I think this heading introduces 
the same ambiguity.

Maybe 'Iceberg Plan Metrics for Scan Node 00'?


http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test@9
PS3, Line 9: select * from functional_parquet.iceberg_partitioned where 
action='download'
It would be nice to have a test query that involves scan planning for multiple 
tables, resulting in multiple metrics sections in the profile.


http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test@12
PS3, Line 12: row_regex:.*total-planning-duration: .+
Since the table content is fixed in the test suite, could you assert on the exact 
values for these metrics too? Except for the duration, of course.



-- 
To view, visit http://gerrit.cloudera.org:8080/22501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I080ee8eafc459dad4d21356ac9042b72d0570219
Gerrit-Change-Number: 22501
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 24 Feb 2025 13:17:05 +0000
Gerrit-HasComments: Yes
