Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22871 )

Change subject: IMPALA-12162: Checksum files before lock in INSERT
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/22871/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/22871/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7348
PS7, Line 7348:     Map<String, FileMetadata> fileMetadata = getFileMetadata(
              :         feFsTable, update.getUpdated_partitions().values(), 
catalogTimeline);
From this point on, update.getUpdated_partitions() and fileMetadata is 
unchanged.
They are passed together as separate parameter in createInsertEvents(), 
prepareInsertEventData(), down to makeInsertEventData().
It might be possible to combine both into single data structure to avoid 
HashTable lookup later. Something like:

  updatedParts = { partname:getFileMetadata(feFsTable, partval, catalogTimeline)
                   for (partname, partval) in 
update.getUpdated_partitions().items() }

where getFileMetadata just return List<FileMetadata> and FileMetadata contains 
fileName in it.


http://gerrit.cloudera.org:8080/#/c/22871/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7857
PS7, Line 7857: String checksum;
Now that we have FileMetadata as container class, can we keep checksum as 
byte[] and length rather than String?
Then, add getter method that will return String translation of byte[].
I thought it might save some memory.

fs.getFileChecksum() is the time consuming method, isn't it?



--
To view, visit http://gerrit.cloudera.org:8080/22871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18f9686f5d53cf1e7c384684c25427fb5353e2af
Gerrit-Change-Number: 22871
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 May 2025 22:34:54 +0000
Gerrit-HasComments: Yes

Reply via email to