Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/23906 )
Change subject: IMPALA-13122: Add detailed file metadata statistics to table loading logs
......................................................................


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:

http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@252
PS7, Line 252: java.time.ZoneId
nit: import java.time.ZoneId


http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@256
PS7, Line 256: .append(" partitions: ").append(partNames)
Let's skip this for non-partitioned tables. It's confusing to see logs like "partitions: ."


http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@257
PS7, Line 257: org.apache.impala.common.PrintUtils
nit: this class is already imported on L49, so we can use the class name directly.


http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@275
PS7, Line 275: java.time.Instant
nit: import java.time.Instant to use Instant directly.


http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@242
PS7, Line 242: minAccessTime_
I think we can declare minAccessTime_ and maxAccessTime_ as local variables, since they are both updated only in loadInternal(). We can then update fileMetadataStats_ in loadInternal() instead of in getFileMetadataStats().

  long minAccessTime = Long.MAX_VALUE;
  long maxAccessTime = 0;
  for (FileStatus fileStatus : fileStatuses) {
    ...
    if (accessTime > 0) {  // Access time can be 0 if not supported/disabled
      minAccessTime = Math.min(minAccessTime, accessTime);
      maxAccessTime = Math.max(maxAccessTime, accessTime);
    }
  }
  if (maxAccessTime > 0) {
    fileMetadataStats_.minAccessTime = minAccessTime;
    fileMetadataStats_.maxAccessTime = maxAccessTime;
  }


http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java:

http://gerrit.cloudera.org:8080/#/c/23906/7/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@134
PS7, Line 134: || stats.getNumUniqueHostDiskPairs() == 0
Why do we need this OR condition? Is there a case where the number of hosts is > 0 but the number of host:disk pairs is 0?


http://gerrit.cloudera.org:8080/#/c/23906/7/tests/custom_cluster/test_file_metadata_stats.py
File tests/custom_cluster/test_file_metadata_stats.py:

http://gerrit.cloudera.org:8080/#/c/23906/7/tests/custom_cluster/test_file_metadata_stats.py@29
PS7, Line 29: catalog
nit: "catalogd". We usually use the term "catalog" to mean all the metadata loading/caching stuff across all services; "catalogd" means the catalog server itself.

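For reference, the local-variable approach suggested for FileMetadataLoader.java (PS7, Line 242) can be sketched as a standalone program. This is only an illustration under stated assumptions: the class AccessTimeStatsSketch, the nested Stats holder, the collect() helper, and the -1 "unset" default are all hypothetical names and choices made for this example, not Impala's actual classes or fields, and a plain list of longs stands in for the FileStatus access times.

```java
import java.util.Arrays;
import java.util.List;

class AccessTimeStatsSketch {
  /** Hypothetical stand-in for the stats holder; not Impala's actual class. */
  static final class Stats {
    // Assumption for this sketch: -1 means "no usable access time was seen".
    long minAccessTime = -1;
    long maxAccessTime = -1;
  }

  /**
   * Aggregates min/max access time using local variables only, then publishes
   * the result to the stats object once, after the loop has finished.
   */
  static Stats collect(List<Long> accessTimes) {
    long minAccessTime = Long.MAX_VALUE;
    long maxAccessTime = 0;
    for (long accessTime : accessTimes) {
      // Access time can be 0 if the filesystem does not support or track it.
      if (accessTime > 0) {
        minAccessTime = Math.min(minAccessTime, accessTime);
        maxAccessTime = Math.max(maxAccessTime, accessTime);
      }
    }
    Stats stats = new Stats();
    // Only publish when at least one positive access time was observed, so
    // Long.MAX_VALUE never leaks out as a bogus minimum.
    if (maxAccessTime > 0) {
      stats.minAccessTime = minAccessTime;
      stats.maxAccessTime = maxAccessTime;
    }
    return stats;
  }

  public static void main(String[] args) {
    Stats s = collect(Arrays.asList(0L, 1700000000000L, 1600000000000L));
    System.out.println("min=" + s.minAccessTime + " max=" + s.maxAccessTime);
  }
}
```

Keeping the accumulators local means no per-load state lingers in member fields, and the maxAccessTime > 0 guard covers the case where every file reports an access time of 0.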
--
To view, visit http://gerrit.cloudera.org:8080/23906
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6f4592f173c047e5064058402f83be6d1f5c9a79
Gerrit-Change-Number: 23906
Gerrit-PatchSet: 7
Gerrit-Owner: Arnab Karmakar
Gerrit-Reviewer: Arnab Karmakar
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jason Fehr
Gerrit-Reviewer: Michael Smith
Gerrit-Reviewer: Pranav Lodha
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Fri, 27 Feb 2026 12:57:48 +0000
Gerrit-HasComments: Yes
