Re: [PR] [HUDI-9591] Add support of record iterator input for FG reader based merge handle [hudi]

via GitHub Wed, 23 Jul 2025 06:14:12 -0700


linliu-code commented on code in PR #13580:
URL: https://github.com/apache/hudi/pull/13580#discussion_r2225568316



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/FileGroupReaderBasedMergeHandle.java:
##########
@@ -284,9 +340,104 @@ public void onDelete(String recordKey, T previousRecord) {
 
     }
 
+    @Override
+    public String getName() {
+      return "CdcCallBack";
+    }
+
     private GenericRecord convertOutput(T record) {
       T convertedRecord = outputConverter.get().map(converter -> record == null ? null : converter.apply(record)).orElse(record);
       return convertedRecord == null ? null : readerContext.convertToAvroRecord(convertedRecord, requestedSchema.get());
     }
   }
+
+  private static class SecondaryIndexCallback<T> implements BaseFileUpdateCallback<T> {
+    private final String partitionPath;
+    private final Schema writeSchemaWithMetaFields;
+    private final HoodieReaderContext<T> readerContext;
+    private final Supplier<Schema> newSchemaSupplier;
+    private final WriteStatus writeStatus;
+    private final List<HoodieIndexDefinition> secondaryIndexDefns;
+    private final Option<BaseKeyGenerator> keyGeneratorOpt;
+    private final HoodieWriteConfig config;
+
+    public SecondaryIndexCallback(String partitionPath,
+                                  Schema writeSchemaWithMetaFields,
+                                  HoodieReaderContext<T> readerContext,
+                                  Supplier<Schema> newSchemaSupplier,
+                                  WriteStatus writeStatus,
+                                  List<HoodieIndexDefinition> secondaryIndexDefns,
+                                  Option<BaseKeyGenerator> keyGeneratorOpt,
+                                  HoodieWriteConfig config) {
+      this.partitionPath = partitionPath;
+      this.writeSchemaWithMetaFields = writeSchemaWithMetaFields;
+      this.readerContext = readerContext;
+      this.newSchemaSupplier = newSchemaSupplier;
+      this.secondaryIndexDefns = secondaryIndexDefns;
+      this.keyGeneratorOpt = keyGeneratorOpt;
+      this.writeStatus = writeStatus;
+      this.config = config;
+    }
+
+    @Override
+    public void onUpdate(String recordKey, T previousRecord, T mergedRecord) {
+      HoodieKey hoodieKey = new HoodieKey(recordKey, partitionPath);
+      BufferedRecord<T> bufferedPrevousRecord = BufferedRecord.forRecordWithContext(
+          previousRecord, writeSchemaWithMetaFields, readerContext, Option.empty(), false);
+      BufferedRecord<T> bufferedMergedRecord = BufferedRecord.forRecordWithContext(
+          mergedRecord, writeSchemaWithMetaFields, readerContext, Option.empty(), false);
+      SecondaryIndexStreamingTracker.trackSecondaryIndexStats(
+          hoodieKey,
+          Option.of(readerContext.constructHoodieRecord(bufferedMergedRecord)),

Review Comment:
   The secondary index key can contain multiple columns. AFAIK,
   `readerContext.getValue` only handles a single column. We can revisit this.

   > why not directly create HoodieRecord from engine specific representation here?

   I think the reason is that HoodieRecord provides an API to fetch column
   values, while the engine-specific type `<T>` does not.
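
   To make the multi-column concern concrete, here is a minimal, self-contained sketch. It does not use any Hudi class: `CompositeKeySketch`, `extractColumns`, and `singleColumnGetter` are hypothetical names, and the getter only stands in for whatever per-column accessor is available. It shows one way a single-column getter could be applied once per key column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class CompositeKeySketch {

  /**
   * Collects the values of several key columns by calling a single-column
   * getter once per column. The getter is a hypothetical stand-in and is
   * not the signature of any Hudi method.
   */
  static <T> List<Object> extractColumns(T record,
                                         List<String> columns,
                                         BiFunction<T, String, Object> singleColumnGetter) {
    List<Object> values = new ArrayList<>(columns.size());
    for (String column : columns) {
      // One lookup per column; a missing column yields whatever the getter returns (here: null).
      values.add(singleColumnGetter.apply(record, column));
    }
    return values;
  }

  public static void main(String[] args) {
    // A plain map plays the role of the engine-specific record in this sketch.
    Map<String, Object> record = new HashMap<>();
    record.put("city", "SF");
    record.put("zip", 94105);
    record.put("name", "a");

    List<Object> key = extractColumns(record, Arrays.asList("city", "zip"), Map::get);
    System.out.println(key);
  }
}
```

   Whether looping over a single-column accessor is acceptable here, or whether a dedicated multi-column API is needed, is left open in the comment above.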



