Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23906 )

Change subject: IMPALA-13122: Add detailed file metadata statistics to table 
loading logs
......................................................................


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:

http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@252
PS7, Line 252: java.time.ZoneId
nit: import java.time.ZoneId


http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@256
PS7, Line 256: .append(" partitions: ").append(partNames)
Let's skip this for non-partitioned tables. It's confusing to see logs like 
"partitions: ."

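A minimal sketch of what I mean, not the actual patch code. The class name LoadLogSketch, the method describeLoad() and the message prefix are hypothetical, and I'm assuming the partition names are available as a list that is empty for non-partitioned tables:

```java
import java.util.List;

public class LoadLogSketch {
  // Hypothetical helper: builds the log message and appends the
  // " partitions: " segment only when there are partition names to show.
  static String describeLoad(String tableName, List<String> partNames) {
    StringBuilder sb =
        new StringBuilder("Loaded file metadata for ").append(tableName);
    if (partNames != null && !partNames.isEmpty()) {
      sb.append(" partitions: ").append(String.join(", ", partNames));
    }
    return sb.append(".").toString();
  }
}
```

With an empty list the message ends right after the table name, so the dangling "partitions: ." never shows up.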

http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@257
PS7, Line 257: org.apache.impala.common.PrintUtils
nit: this class is already imported on L49 so we can use the class name directly


http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@275
PS7, Line 275: java.time.Instant
nit: import java.time.Instant to use Instant directly.
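
For this and the ZoneId nit above, a rough sketch of how the call site could read once both classes are imported. TimeFormatSketch and formatMillis() are hypothetical names, and the choice of DateTimeFormatter.ISO_OFFSET_DATE_TIME is my assumption, not necessarily what the patch uses:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class TimeFormatSketch {
  // Hypothetical helper: formats an epoch-millis timestamp in the given zone.
  // With the imports above, Instant and ZoneId need no package prefix.
  static String formatMillis(long epochMillis, ZoneId zone) {
    return DateTimeFormatter.ISO_OFFSET_DATE_TIME
        .format(Instant.ofEpochMilli(epochMillis).atZone(zone));
  }
}
```

A caller would pass something like ZoneId.systemDefault() for the zone.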


http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@242
PS7, Line 242: minAccessTime_
I think we can declare minAccessTime_ and maxAccessTime_ as local variables 
since they are both updated in loadInternal(). We can update fileMetadataStats_ 
in loadInternal() instead of getFileMetadataStats().

      long minAccessTime = Long.MAX_VALUE;
      long maxAccessTime = 0;
      for (FileStatus fileStatus : fileStatuses) {
        ...
        if (accessTime > 0) {  // Access time can be 0 if not supported/disabled
          minAccessTime = Math.min(minAccessTime, accessTime);
          maxAccessTime = Math.max(maxAccessTime, accessTime);
        }
      }
      if (maxAccessTime > 0) {
        fileMetadataStats_.minAccessTime = minAccessTime;
        fileMetadataStats_.maxAccessTime = maxAccessTime;
      }


http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java:

http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@134
PS7, Line 134: || stats.getNumUniqueHostDiskPairs() == 0
Why do we need this OR condition? Is there a case where the number of hosts > 0 
but the number of host:disk pairs is 0?


http://gerrit.cloudera.org:8080/#/c/23906/7/tests/custom_cluster/test_file_metadata_stats.py
File tests/custom_cluster/test_file_metadata_stats.py:

http://gerrit.cloudera.org:8080/#/c/23906/7/tests/custom_cluster/test_file_metadata_stats.py@29
PS7, Line 29: catalog
nit: "catalogd". We usually use the term "catalog" to mean all the metadata 
loading/caching stuff in all services. "catalogd" means the catalog server 
itself.



--
To view, visit http://gerrit.cloudera.org:8080/23906
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6f4592f173c047e5064058402f83be6d1f5c9a79
Gerrit-Change-Number: 23906
Gerrit-PatchSet: 7
Gerrit-Owner: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Pranav Lodha <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 27 Feb 2026 12:57:48 +0000
Gerrit-HasComments: Yes
