nsivabalan commented on code in PR #12321:
URL: https://github.com/apache/hudi/pull/12321#discussion_r1855668643
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1141,28 +1140,22 @@ private void updateSecondaryIndexIfPresent(HoodieCommitMetadata commitMetadata,
         .forEach(partition -> {
           HoodieData<HoodieRecord> secondaryIndexRecords;
           try {
-            secondaryIndexRecords = getSecondaryIndexUpdates(commitMetadata, partition, writeStatus);
+            secondaryIndexRecords = getSecondaryIndexUpdates(commitMetadata, partition, instantTime);
           } catch (Exception e) {
             throw new HoodieMetadataException("Failed to get secondary index updates for partition " + partition, e);
           }
           partitionToRecordMap.put(partition, secondaryIndexRecords);
         });
   }
-  private HoodieData<HoodieRecord> getSecondaryIndexUpdates(HoodieCommitMetadata commitMetadata, String indexPartition, HoodieData<WriteStatus> writeStatus) throws Exception {
+  private HoodieData<HoodieRecord> getSecondaryIndexUpdates(HoodieCommitMetadata commitMetadata, String indexPartition, String instantTime) throws Exception {
     List<Pair<String, Pair<String, List<String>>>> partitionFilePairs = getPartitionFilePairs(commitMetadata);
     // Build a list of keys that need to be removed. A 'delete' record will be emitted into the respective FileGroup of
     // the secondary index partition for each of these keys. For a commit which is deleting/updating a lot of records, this
     // operation is going to be expensive (in CPU, memory and IO)
-    List<String> keysToRemove = new ArrayList<>();
-    writeStatus.collectAsList().forEach(status -> {
Review Comment:
Yes, I already pointed this out here:
https://github.com/apache/hudi/compare/master...nsivabalan:hudi:mor_secIndex_Investigate-design2?expand=1
when I was auditing the secondary index design and implementation.