lokeshj1703 commented on code in PR #13449:
URL: https://github.com/apache/hudi/pull/13449#discussion_r2154663478


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -546,12 +559,75 @@ public List<WriteStatus> close() {
         status.getStat().setFileSizeInBytes(logFileSize);
       }
 
+      // generate Secondary index stats if streaming is enabled.
+      if (!isSecondaryIndexStreamingDisabled()) {
+        // Adds secondary index only for the last log file write status. We do not need to add secondary index stats
+        // for every log file written as part of the append handle write. The last write status would update the
+        // secondary index considering all the log files.
+        trackMetadataIndexStatsForStreamingMetadataWrites(fileSliceOpt.or(this::getFileSlice), statuses.stream().map(status -> status.getStat().getPath()).collect(Collectors.toList()),
+            statuses.get(statuses.size() - 1));
+      }
+
       return statuses;
     } catch (IOException e) {
       throw new HoodieUpsertException("Failed to close UpdateHandle", e);
     }
   }
 
+  private void trackMetadataIndexStatsForStreamingMetadataWrites(Option<FileSlice> fileSliceOpt, List<String> newLogFiles, WriteStatus status) {
+    // TODO: Optimise the computation for multiple secondary indexes

Review Comment:
   Addressed
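
   For context on the hunk above, the code comment in close() describes attaching secondary index stats only to the last write status while still considering every log file written. A minimal sketch of that pattern follows; `SketchStatus` and `trackOnLastStatus` are hypothetical stand-ins, not Hudi classes or the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class LastStatusSketch {
  // Hypothetical minimal write status: one log file path plus the log files
  // its index stats were computed over.
  static class SketchStatus {
    final String path;
    final List<String> indexedLogFiles = new ArrayList<>();

    SketchStatus(String path) {
      this.path = path;
    }
  }

  // Collect the paths of all log files, then record them only on the last
  // status, so stats are computed once instead of once per log file.
  static void trackOnLastStatus(List<SketchStatus> statuses) {
    if (statuses.isEmpty()) {
      return;
    }
    List<String> allLogFiles = statuses.stream()
        .map(s -> s.path)
        .collect(Collectors.toList());
    statuses.get(statuses.size() - 1).indexedLogFiles.addAll(allLogFiles);
  }
}
```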



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/CreateHandleFactory.java:
##########
@@ -27,13 +27,19 @@
 public class CreateHandleFactory<T, I, K, O> extends WriteHandleFactory<T, I, K, O> implements Serializable {
 
   private boolean preserveMetadata = false;
+  private final boolean isSecondaryIndexStreamingDisabled;

Review Comment:
   I tried this, but it ballooned into a huge change. I discussed it with 
@danny0405, who suggested using preserveMetadata instead. preserveMetadata is 
true for compaction and clustering.
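
   As a rough illustration of that idea, the flag could be derived from preserveMetadata when the factory is constructed. This is only a sketch under assumptions: `SketchCreateHandleFactory` is a hypothetical stand-in for CreateHandleFactory, and treating "preserveMetadata is true" as "secondary index streaming is disabled" is one reading of the comment, not the confirmed change in the PR.

```java
import java.io.Serializable;

// Hypothetical stand-in for CreateHandleFactory; field names mirror the diff,
// but the derivation below is an assumption.
class SketchCreateHandleFactory implements Serializable {
  private final boolean preserveMetadata;
  private final boolean isSecondaryIndexStreamingDisabled;

  SketchCreateHandleFactory(boolean preserveMetadata) {
    this.preserveMetadata = preserveMetadata;
    // Assumption: preserveMetadata is true only for compaction and clustering,
    // and those writes skip streaming secondary index stats.
    this.isSecondaryIndexStreamingDisabled = preserveMetadata;
  }

  boolean isPreserveMetadata() {
    return preserveMetadata;
  }

  boolean isSecondaryIndexStreamingDisabled() {
    return isSecondaryIndexStreamingDisabled;
  }
}
```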



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java:
##########
@@ -452,6 +461,58 @@ protected HoodieRecord<T> updateFileName(HoodieRecord<T> record, Schema schema,
     return record.prependMetaFields(schema, targetSchema, metadataValues, prop);
   }
 
+  private void trackMetadataIndexStats(Option<HoodieKey> hoodieKeyOpt, Option<HoodieRecord> combinedRecordOpt, Option<HoodieRecord<T>> oldRecordOpt, boolean isDelete) {
+    if (isSecondaryIndexStreamingDisabled()) {
+      return;
+    }
+    HoodieEngineContext engineContext = new HoodieLocalEngineContext(hoodieTable.getStorageConf(), taskContextSupplier);
+    HoodieReaderContext readerContext = engineContext.getReaderContextFactory(hoodieTable.getMetaClient()).getContext();
+
+    secondaryIndexDefns.forEach(secondaryIndexPartitionPathFieldPair -> {
+      String secondaryIndexSourceField = String.join(".",secondaryIndexPartitionPathFieldPair.getValue().getSourceFields());

Review Comment:
   Addressed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

