nsivabalan commented on code in PR #14048:
URL: https://github.com/apache/hudi/pull/14048#discussion_r2400603555


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -542,4 +512,51 @@ private static StoragePathInfo 
getLogFileStoragePathInfo(HoodieLogFile logFile)
     }
     return new StoragePathInfo(logFile.getPath(), logFile.getFileSize(), 
false, (short) 0, 0, 0);
   }
+
+  public static String getMaxInstantTime(HoodieTableMetaClient dataMetaClient, 
String instantTime) {
+    Option<String> lastCompletedInstant = 
dataMetaClient.getActiveTimeline().filterCompletedInstants()
+        .lastInstant()
+        .map(HoodieInstant::requestedTime);
+    return lastCompletedInstant.map(lastCompletedInstantTime ->
+        lastCompletedInstantTime.compareTo(instantTime) > 0 ? 
lastCompletedInstantTime : instantTime).orElse(instantTime);
+  }
+
+  /**
+   * Collect column metadata of each file that does not have column stats 
provided by the write stat in the commit metadata
+   */
+  public static Set<String> getFilesToFetchColumnStats(List<HoodieWriteStat> 
partitionedWriteStat,
+                                                       HoodieTableMetaClient 
dataMetaClient,
+                                                       HoodieTableMetadata 
tableMetadata,
+                                                       HoodieWriteConfig 
dataWriteConfig,
+                                                       String partitionName,
+                                                       String maxInstantTime,
+                                                       String instantTime) {
+    // Get the latest merged file slices based on the commited files part of 
the latest snapshot and the new files of the current commit metadata
+    // Get the latest merged file slices based on the commited files part of 
the latest snapshot and the new files of the current commit metadata
+    List<StoragePathInfo> consolidatedPathInfos = new ArrayList<>();
+    partitionedWriteStat.forEach(
+        stat -> consolidatedPathInfos.add(
+            new StoragePathInfo(new StoragePath(dataMetaClient.getBasePath(), 
stat.getPath()), stat.getFileSizeInBytes(), false, (short) 0, 0, 0)));
+    SyncableFileSystemView fileSystemViewForCommitedFiles = 
FileSystemViewManager.createViewManager(new 
HoodieLocalEngineContext(dataMetaClient.getStorageConf()),
+        dataWriteConfig.getMetadataConfig(), 
dataWriteConfig.getViewStorageConfig(), dataWriteConfig.getCommonConfig(),

Review Comment:
   I guess the clustering fix that Ethan has yet to add to the partition stats 
index also needs to be applied here. 
   



