Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22501 )

Change subject: WIP: IMPALA-13268: Integrate Iceberg ScanMetrics into Impala 
query profiles
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@178
PS3, Line 178:     if (!needIcebergForPlanning()) {
             :       analyzer_.materializeSlots(conjuncts_);
             :       setFileDescriptorsBasedOnFileStore();
             :
             :       PlanNode res = createIcebergScanPlanImpl();
             :
             :       // Update the profile that there is a full scan without 
planning with Iceberg.
             :       
org.apache.impala.service.Frontend.addIcebergScanMetricsToProfile(
             :           ctx_.getQueryCtx().query_id, res.getId().toString(), 
null);
             :
             :       return res;
             :     }
             :
             :     ScanMetricsResult metricsResult = filterFileDescriptors();
             :     filterConjuncts();
             :     analyzer_.materializeSlots(conjuncts_);
             :     PlanNode res = createIcebergScanPlanImpl();
             :
             :     // Update the query profile with the scan metrics.
             :     if (metricsResult != null) {
             :       
org.apache.impala.service.Frontend.addIcebergScanMetricsToProfile(
             :           ctx_.getQueryCtx().query_id, res.getId().toString(), 
metricsResult);
             :     } else {
             :       Preconditions.checkState(snapshotId_ == -1);
             :     }
             :
             :     return res;
             :   }
How about:

    PlanNode res;
    if (!needIcebergForPlanning()) {
      analyzer_.materializeSlots(conjuncts_);
      setFileDescriptorsBasedOnFileStore();
      res = createIcebergScanPlanImpl();
    } else {
      ScanMetricsResult metricsResult = filterFileDescriptors();
      filterConjuncts();
      analyzer_.materializeSlots(conjuncts_);
      res = createIcebergScanPlanImpl();
    }
    Preconditions.checkState(res != null);
    Preconditions.checkState(
       !needIcebergForPlanning() || metricsResult != null || snapshotId_ == -1);
    // Update the query profile with the scan metrics.
    org.apache.impala.service.Frontend.addIcebergScanMetricsToProfile(
        ctx_.getQueryCtx().query_id, res.getId().toString(), metricsResult);
    return res;
  }

The if statement could be extracted to a method that just creates the PlanNode.


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2592
PS3, Line 2592: fillProfileNodeWithIcebergScanMetrics
I wonder how we can keep the list of metrics up-to-date. Maybe adding a unit 
test that uses reflection and iterates over the members(/public static final 
Strings) of ScanMetrics and count them?


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2594
PS3, Line 2594:     long durationMs = 
metrics.totalPlanningDuration().totalDuration().toMillis();
> I recall ScanMetricsResult has a single (generated) implementation Immutabl
This would also keep the metrics up-to-date. If the format is not good for us, 
probably it is still something easily parseable.


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/Frontend.java@2614
PS3, Line 2614:     String totalFileSize = 
PrintUtils.printBytes(totalFileSizeInBytes);
When we pretty-print bytes we usually also print the value in bytes in 
parentheses after the pretty-printed value, e.g.:

 - PerHostPeakMemUsage: 157.41 MB (165056046)

This makes it easy to parse the values from scripts.


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/FrontendProfile.java
File fe/src/main/java/org/apache/impala/service/FrontendProfile.java:

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/service/FrontendProfile.java@79
PS3, Line 79: provisional
nit: how about using the word "staged"? This feels like a more common term used 
for such things. Then addProvisionalChildrenProfile() could be 
stageChildrenProfile().


http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/22501/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@640
PS3, Line 640: MetricsReporters.combine(LoggingMetricsReporter.instance(),
Is it needed? The comment of Scan.metricsReporter() says:
"Create a new scan that will report scan metrics to the provided reporter in 
addition to reporters maintained by the scan."


http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test:

http://gerrit.cloudera.org:8080/#/c/22501/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-scan-metrics.test@3
PS3, Line 3: functional_parquet.iceberg_partitioned
Please add tests with multiple Iceberg tables, and also use Iceberg tables:
 - iceberg_v2_no_deletes
 - iceberg_v2_positional_delete_all_rows
 - iceberg_v2_positional_not_all_data_files_have_delete_files



--
To view, visit http://gerrit.cloudera.org:8080/22501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I080ee8eafc459dad4d21356ac9042b72d0570219
Gerrit-Change-Number: 22501
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Feb 2025 12:00:10 +0000
Gerrit-HasComments: Yes

Reply via email to