danny0405 commented on code in PR #14045:
URL: https://github.com/apache/hudi/pull/14045#discussion_r2400712092
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/SecondaryIndexRecordGenerationUtils.java:
##########
@@ -114,21 +124,60 @@ public static <T> HoodieData<HoodieRecord> convertWriteStatsToSecondaryIndexReco
String fileId = writeStatsByFileIdEntry.getKey();
List<HoodieWriteStat> writeStats = writeStatsByFileIdEntry.getValue();
String partition = writeStats.get(0).getPartitionPath();
- FileSlice previousFileSliceForFileId = fsView.getLatestFileSlice(partition, fileId).orElse(null);
+ StoragePath basePath = dataMetaClient.getBasePath();
+
+ // validate that for a given fileId, either we have 1 parquet file or N log files.
+ AtomicInteger totalParquetFiles = new AtomicInteger();
+ AtomicInteger totalLogFiles = new AtomicInteger();
+ writeStats.stream().forEach(writeStat -> {
+ if (FSUtils.isLogFile(new StoragePath(basePath, writeStat.getPath()))) {
+ totalLogFiles.getAndIncrement();
+ } else {
+ totalParquetFiles.getAndIncrement();
+ }
+ });
+
+ ValidationUtils.checkArgument(!(totalParquetFiles.get() > 0 && totalLogFiles.get() > 0), "Only either of base file or log files are expected for a given file group. "
Review Comment:
Wondering why these checks are necessary; isn't this a known constraint of the write handles?
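For context, the invariant the diff validates can be sketched in isolation. This is a hedged, self-contained approximation, not Hudi's actual implementation: `isLogFile` below is a hypothetical stand-in for Hudi's `FSUtils.isLogFile` (it only checks for a `.log.` path segment), and the file names are illustrative. It shows the same mutual-exclusion check: a file group's write stats should contain either base (parquet) files or log files, never a mix.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class FileGroupInvariantCheck {

  // Hypothetical stand-in for Hudi's FSUtils.isLogFile:
  // Hudi log file names contain a ".log." segment.
  static boolean isLogFile(String path) {
    return path.contains(".log.");
  }

  // Returns true when the write-stat paths respect the
  // "base file XOR log files" invariant for one file group.
  static boolean isValidFileGroup(List<String> writeStatPaths) {
    AtomicInteger totalParquetFiles = new AtomicInteger();
    AtomicInteger totalLogFiles = new AtomicInteger();
    writeStatPaths.forEach(path -> {
      if (isLogFile(path)) {
        totalLogFiles.getAndIncrement();
      } else {
        totalParquetFiles.getAndIncrement();
      }
    });
    // Fail only when both kinds of files show up for the same file group.
    return !(totalParquetFiles.get() > 0 && totalLogFiles.get() > 0);
  }

  public static void main(String[] args) {
    // Only a base file: valid.
    System.out.println(isValidFileGroup(List.of("p1/f1_0-1-2.parquet")));
    // Only log files: valid.
    System.out.println(isValidFileGroup(List.of("p1/.f1.log.1", "p1/.f1.log.2")));
    // A base file and a log file together: invalid.
    System.out.println(isValidFileGroup(List.of("p1/f1_0-1-2.parquet", "p1/.f1.log.1")));
  }
}
```

The real code raises via `ValidationUtils.checkArgument` instead of returning a boolean; the boolean form here just makes the condition easy to exercise.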
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]