Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22501 )

Change subject: IMPALA-13268: Integrate Iceberg ScanMetrics into Impala query 
profiles
......................................................................


Patch Set 5:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@191
PS3, Line 191:     return res;
> I don't think it's that intuitive that a function called filterFileDescript
Done


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@178
PS3, Line 178:
             :   public PlanNode createIcebergScanPlan() throws ImpalaException 
{
             :     PlanNode res = createPlanNodeHelper();
             :     Preconditions.checkNotNull(res);
             :
             :     ScanMetricsResult metricsResult = 
metricsReporter_.scanMetricsResult();
             :     Preconditions.checkState(
             :         !needIcebergForPlanning() || snapshotId_ == -1 || 
metricsResult != null);
             :
             :     // Update the query profile with the scan metrics.
             :     
org.apache.impala.service.Frontend.addIcebergScanMetricsToProfile(
             :         res.getId().toString(), metricsResult);
             :
             :     return res;
             :   }
             :
             :   private PlanNode createPlanNodeHelper() throws ImpalaException 
{
             :     if (!needIcebergForPlanning()) {
             :       analyzer_.materializeSlots(conjuncts_);
             :       setFileDescriptorsBasedOnFileStore();
             :       return createIcebergScanPlanImpl();
             :     } else {
             :       filterFileDescriptors();
             :       filterConjuncts();
             :       analyzer_.materializeSlots(conjuncts_);
             :       return createIcebergScanPlanImpl();
             :     }
             :   }
> How about:
Done


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2577
PS3, Line 2577: String nodeId,
> is this param used?
Done


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2592
PS3, Line 2592: fillProfileNodeWithIcebergScanMetrics
> I wonder how we can keep the list of metrics up-to-date. Maybe adding a uni
Done


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2594
PS3, Line 2594:     long durationMs = 
metrics.totalPlanningDuration().totalDuration().toMillis();
> This would also keep the metrics up-to-date. If the format is not good for
I decided on the unit test approach: it is less cumbersome to use the interface 
without reflection in the normal code, and this interface is not expected to 
change often, so it's not too problematic if we have to update the code.


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2614
PS3, Line 2614:     addInfoString(profile, ScanMetrics.SKIPPED_DATA_MANIFESTS, 
skippedD
> When we pretty-print bytes we usually also print the value in bytes in pare
Done


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2615
PS3, Line 2615:
> there is ScanMetrics.TOTAL_FILE_SIZE_IN_BYTES available for this. Same for
I didn't want to include "bytes" in the string because we're pretty-printing 
the result. But now we also provide the raw value, as Zoltan suggested, so I 
reverted back to the standard name.


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/FrontendProfile.java
File fe/src/main/java/org/apache/impala/service/FrontendProfile.java:

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/FrontendProfile.java@79
PS3, Line 79: stagedChild
> nit: how about using the word "staged"? This feels like a more common term
Done


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@640
PS3, Line 640:
> Is it needed? The comment of Scan.metricsReporter() says:
Done


http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test:

http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test@3
PS3, Line 3: functional_parquet.iceberg_partitioned
> Please add tests with multiple Iceberg tables, and also use Iceberg tables:
Done


http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test@5
PS3, Line 5: Iceberg Plan Metrics for Node 00:
> I always found the naming 'ScanMetrics' in Iceberg misleading. It is in fac
I changed it to Iceberg Plan Metrics, but did not add "Scan Node" because the 
node may be a union if there are delete files.


http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test@9
PS3, Line 9: # Filtering on a partition column pushes the predicate down to 
Iceberg, so we have metrics.
> Would be nice to have a test query that involves scan planning from multipl
Done


http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test@12
PS3, Line 12: Iceberg Plan Metrics for Node 00:
> Since the table content is fix in the test suite, could you assert on the e
Done, but for the byte sizes I also kept the regexes because I remember the 
data load on some system writes files with a bit different sizes.



--
To view, visit http://gerrit.cloudera.org:8080/22501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I080ee8eafc459dad4d21356ac9042b72d0570219
Gerrit-Change-Number: 22501
Gerrit-PatchSet: 5
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Feb 2025 12:23:45 +0000
Gerrit-HasComments: Yes

Reply via email to