danny0405 commented on code in PR #13449:
URL: https://github.com/apache/hudi/pull/13449#discussion_r2156094846
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java:
##########
@@ -173,6 +186,23 @@ record = record.prependMetaFields(schema, writeSchemaWithMetaFields, new Metadat
}
}
+ private void trackMetadataIndexStats(HoodieRecord record) {
+ if (isSecondaryIndexStreamingDisabled()) {
+ return;
+ }
+
+ // Add secondary index records for all the inserted records
+ secondaryIndexDefns.forEach(secondaryIndexPartitionPathFieldPair -> {
+      String secondaryIndexSourceField = String.join(".", secondaryIndexPartitionPathFieldPair.getValue().getSourceFields());
+ if (record instanceof HoodieAvroIndexedRecord) {
Review Comment:
> So, even if we spend time fixing this to be generic, in reality only AVRO is going to take effect.

The fact is that many users have already enabled the `Spark` record type in production for its efficiency, even though Spark's default record type is `AVRO`.

> just that making any changes to write handle constructor will touch 30+ files and we do not want to increase the scope of this patch for those changes.

If you are referring to the `engineContext`, it is already available within the `hoodieTable`. I actually think `HoodieEngineContext.getReaderContextFactory` should be moved to `HoodieTable`, because the table meta client is usually bound to a hoodie table and the engine context lives there as well.

> and as I mentioned, looks like SPARK record type in SPARK engine has gaps to be fixed on the writer side

Not sure how much that affects this patch, but it should be a small fix I guess; let's fix it in this PR if there is one.
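
For illustration, the `instanceof HoodieAvroIndexedRecord` concern could be avoided by having each record flavor expose field access behind a common interface, so the index-tracking code stays record-type-agnostic. A minimal self-contained sketch of that pattern; the `FieldReader` interface and both record classes here are hypothetical stand-ins, not actual Hudi APIs:

```java
import java.util.Map;

public class FieldExtractorSketch {
  // Hypothetical abstraction: each engine record type knows how to read a
  // field by name, so callers never need per-type instanceof checks.
  interface FieldReader {
    Object getField(String name);
  }

  // Stand-in for an Avro-backed record (name -> value lookup).
  static class AvroLikeRecord implements FieldReader {
    private final Map<String, Object> data;
    AvroLikeRecord(Map<String, Object> data) { this.data = data; }
    public Object getField(String name) { return data.get(name); }
  }

  // Stand-in for a Spark InternalRow-backed record (positional access
  // resolved through a name -> ordinal map).
  static class SparkLikeRecord implements FieldReader {
    private final Object[] values;
    private final Map<String, Integer> ordinals;
    SparkLikeRecord(Object[] values, Map<String, Integer> ordinals) {
      this.values = values;
      this.ordinals = ordinals;
    }
    public Object getField(String name) { return values[ordinals.get(name)]; }
  }

  // Generic secondary-index key extraction: identical for both flavors.
  static String secondaryIndexKey(FieldReader record, String sourceField) {
    Object value = record.getField(sourceField);
    return value == null ? null : value.toString();
  }

  public static void main(String[] args) {
    FieldReader avro = new AvroLikeRecord(Map.of("city", "tokyo"));
    FieldReader spark = new SparkLikeRecord(new Object[]{"tokyo"}, Map.of("city", 0));
    System.out.println(secondaryIndexKey(avro, "city"));
    System.out.println(secondaryIndexKey(spark, "city"));
  }
}
```

This is roughly what a reader-context-style factory would provide per engine; the write handle would then depend only on the interface, which is why moving the factory onto `HoodieTable` keeps the constructor footprint small.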
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]