Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/22871 )
Change subject: IMPALA-12162: Checksum files before lock in INSERT ...................................................................... Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/22871/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/22871/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7348 PS7, Line 7348: Map<String, FileMetadata> fileMetadata = getFileMetadata( : feFsTable, update.getUpdated_partitions().values(), catalogTimeline); From this point on, update.getUpdated_partitions() and fileMetadata is unchanged. They are passed together as separate parameter in createInsertEvents(), prepareInsertEventData(), down to makeInsertEventData(). It might be possible to combine both into single data structure to avoid HashTable lookup later. Something like: updatedParts = { partname:getFileMetadata(feFsTable, partval, catalogTimeline) for (partname, partval) in update.getUpdated_partitions().items() } where getFileMetadata just return List<FileMetadata> and FileMetadata contains fileName in it. http://gerrit.cloudera.org:8080/#/c/22871/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7857 PS7, Line 7857: String checksum; Now that we have FileMetadata as container class, can we keep checksum as byte[] and length rather than String? Then, add getter method that will return String translation of byte[]. I thought it might save some memory. fs.getFileChecksum() is the time consuming method, isn't it? -- To view, visit http://gerrit.cloudera.org:8080/22871 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I18f9686f5d53cf1e7c384684c25427fb5353e2af Gerrit-Change-Number: 22871 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Comment-Date: Tue, 20 May 2025 22:34:54 +0000 Gerrit-HasComments: Yes