danny0405 commented on code in PR #14045:
URL: https://github.com/apache/hudi/pull/14045#discussion_r2400712092
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/SecondaryIndexRecordGenerationUtils.java:
##########
@@ -114,21 +124,60 @@ public static <T> HoodieData<HoodieRecord> convertWriteStatsToSecondaryIndexReco
String fileId = writeStatsByFileIdEntry.getKey();
List<HoodieWriteStat> writeStats = writeStatsByFileIdEntry.getValue();
String partition = writeStats.get(0).getPartitionPath();
- FileSlice previousFileSliceForFileId = fsView.getLatestFileSlice(partition, fileId).orElse(null);
+ StoragePath basePath = dataMetaClient.getBasePath();
+
+ // validate that for a given fileId, either we have 1 parquet file or N log files.
+ AtomicInteger totalParquetFiles = new AtomicInteger();
+ AtomicInteger totalLogFiles = new AtomicInteger();
+ writeStats.stream().forEach(writeStat -> {
+ if (FSUtils.isLogFile(new StoragePath(basePath, writeStat.getPath()))) {
+ totalLogFiles.getAndIncrement();
+ } else {
+ totalParquetFiles.getAndIncrement();
+ }
+ });
+
+ ValidationUtils.checkArgument(!(totalParquetFiles.get() > 0 && totalLogFiles.get() > 0), "Only either of base file or log files are expected for a given file group. "
Review Comment:
Wondering why these checks are necessary; isn't this a known constraint of the write handles?
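For context, the invariant the diff validates can be sketched in isolation. This is a hedged, self-contained approximation, not Hudi's actual implementation: `isLogFile` below is a hypothetical stand-in for Hudi's `FSUtils.isLogFile` (it only checks for a `.log.` path segment), and the file names are illustrative. It shows the same mutual-exclusion check: a file group's write stats should contain either base (parquet) files or log files, never a mix.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class FileGroupInvariantCheck {

  // Hypothetical stand-in for Hudi's FSUtils.isLogFile:
  // Hudi log file names contain a ".log." segment.
  static boolean isLogFile(String path) {
    return path.contains(".log.");
  }

  // Returns true when the write-stat paths respect the
  // "base file XOR log files" invariant for one file group.
  static boolean isValidFileGroup(List<String> writeStatPaths) {
    AtomicInteger totalParquetFiles = new AtomicInteger();
    AtomicInteger totalLogFiles = new AtomicInteger();
    writeStatPaths.forEach(path -> {
      if (isLogFile(path)) {
        totalLogFiles.getAndIncrement();
      } else {
        totalParquetFiles.getAndIncrement();
      }
    });
    // Fail only when both kinds of files show up for the same file group.
    return !(totalParquetFiles.get() > 0 && totalLogFiles.get() > 0);
  }

  public static void main(String[] args) {
    // Only a base file: valid.
    System.out.println(isValidFileGroup(List.of("p1/f1_0-1-2.parquet")));
    // Only log files: valid.
    System.out.println(isValidFileGroup(List.of("p1/.f1.log.1", "p1/.f1.log.2")));
    // A base file and a log file together: invalid.
    System.out.println(isValidFileGroup(List.of("p1/f1_0-1-2.parquet", "p1/.f1.log.1")));
  }
}
```

The real code raises via `ValidationUtils.checkArgument` instead of returning a boolean; the boolean form here just makes the condition easy to exercise.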
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]