vinothchandar commented on code in PR #7841:
URL: https://github.com/apache/hudi/pull/7841#discussion_r1096047641


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -280,6 +280,7 @@ protected void commit(HoodieTable table, String commitActionType, String instant
     }
     // update Metadata table
     writeTableMetadata(table, instantTime, commitActionType, metadata);
+    context.setJobStatus(this.getClass().getSimpleName(),"Completing commit in table " + config.getTableName());

Review Comment:
   Can we also add the instantTime, so we know which commit completed?
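
   A minimal sketch of the kind of message meant here. `JobStatusMessages` and `completingCommit` are hypothetical names used only for illustration, not existing Hudi code, and the exact wording of the status text is an assumption:

```java
// Hypothetical helper, not part of Hudi: builds the job-status text so that
// the instant time always appears next to the table name.
final class JobStatusMessages {
  private JobStatusMessages() {
    // static helpers only
  }

  // Returns a status line naming both the instant and the table.
  static String completingCommit(String tableName, String instantTime) {
    return "Completing commit " + instantTime + " in table " + tableName;
  }
}
```

   At the call site in this hunk, simply appending `instantTime` to the existing string concatenation would achieve the same thing; the helper only keeps the wording in one place.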



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java:
##########
@@ -133,6 +133,7 @@ protected void commit(String instantTime, Map<MetadataPartitionType, HoodieData<
     HoodieData<HoodieRecord> preppedRecords = prepRecords(partitionRecordsMap);
     JavaRDD<HoodieRecord> preppedRecordRDD = HoodieJavaRDD.getJavaRDD(preppedRecords);
 
+    engineContext.setJobStatus(this.getClass().getName(), "Committing to metadata table " + metadataWriteConfig.getTableName());

Review Comment:
   Please add the instantTime to this message as well.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1086,39 +1086,44 @@ private void initialCommit(String createInstantTime, List<MetadataPartitionType>
 
     Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionToRecordsMap = new HashMap<>();
 
-    List<DirectoryInfo> partitionInfoList = listAllPartitions(dataMetaClient);
-    Map<String, Map<String, Long>> partitionToFilesMap = partitionInfoList.stream()
-        .map(p -> {
-          String partitionName = HoodieTableMetadataUtil.getPartitionIdentifier(p.getRelativePath());
-          return Pair.of(partitionName, p.getFileNameToSizeMap());
-        })
-        .collect(Collectors.toMap(Pair::getKey, Pair::getValue));
-
-    int totalDataFilesCount = partitionToFilesMap.values().stream().mapToInt(Map::size).sum();
-    List<String> partitions = new ArrayList<>(partitionToFilesMap.keySet());
-
-    if (partitionTypes.contains(MetadataPartitionType.FILES)) {
-      // Record which saves the list of all partitions
-      HoodieRecord allPartitionRecord = HoodieMetadataPayload.createPartitionListRecord(partitions);
-      HoodieData<HoodieRecord> filesPartitionRecords = getFilesPartitionRecords(createInstantTime, partitionInfoList, allPartitionRecord);
-      ValidationUtils.checkState(filesPartitionRecords.count() == (partitions.size() + 1));
-      partitionToRecordsMap.put(MetadataPartitionType.FILES, filesPartitionRecords);
-    }
+    // skip parsing file system for metadata records if its first commit for the table.

Review Comment:
   This comment is not clear. What do "parsing" and "file system" mean here? 
   Do you mean listing the file system? If so, please say that.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1086,39 +1086,44 @@ private void initialCommit(String createInstantTime, List<MetadataPartitionType>
 
     Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionToRecordsMap = new HashMap<>();
 
-    List<DirectoryInfo> partitionInfoList = listAllPartitions(dataMetaClient);
-    Map<String, Map<String, Long>> partitionToFilesMap = partitionInfoList.stream()
-        .map(p -> {
-          String partitionName = HoodieTableMetadataUtil.getPartitionIdentifier(p.getRelativePath());
-          return Pair.of(partitionName, p.getFileNameToSizeMap());
-        })
-        .collect(Collectors.toMap(Pair::getKey, Pair::getValue));
-
-    int totalDataFilesCount = partitionToFilesMap.values().stream().mapToInt(Map::size).sum();
-    List<String> partitions = new ArrayList<>(partitionToFilesMap.keySet());
-
-    if (partitionTypes.contains(MetadataPartitionType.FILES)) {
-      // Record which saves the list of all partitions
-      HoodieRecord allPartitionRecord = HoodieMetadataPayload.createPartitionListRecord(partitions);
-      HoodieData<HoodieRecord> filesPartitionRecords = getFilesPartitionRecords(createInstantTime, partitionInfoList, allPartitionRecord);
-      ValidationUtils.checkState(filesPartitionRecords.count() == (partitions.size() + 1));
-      partitionToRecordsMap.put(MetadataPartitionType.FILES, filesPartitionRecords);
-    }
+    // skip parsing file system for metadata records if its first commit for the table.
+    if (dataMetaClient.getActiveTimeline().getWriteTimeline().countInstants() > 1
+        || dataMetaClient.getActiveTimeline().getWriteTimeline().filterCompletedInstants().countInstants() != 0) {
+      List<DirectoryInfo> partitionInfoList = listAllPartitions(dataMetaClient);

Review Comment:
   +1. Is there a cleaner way of checking this? Deriving that this is the 
   first commit into the MDT this way feels a bit obtuse.
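
   One possible shape, as a sketch only: pull the condition into a named predicate so the intent is readable at the call site. `FirstCommitCheck` and `isFirstCommit` are hypothetical names, not an existing Hudi API. The helper takes plain counts so that it compiles on its own; in this PR the two values would presumably come from the `getWriteTimeline().countInstants()` and `filterCompletedInstants().countInstants()` calls already in the diff.

```java
// Hypothetical helper, not an existing Hudi API. It is the logical negation
// of the condition in the diff (countInstants() > 1 || completed != 0).
final class FirstCommitCheck {
  private FirstCommitCheck() {
    // static helpers only
  }

  // True when at most one write instant exists (the one in flight) and
  // no write instant has completed yet.
  static boolean isFirstCommit(int writeInstants, int completedWriteInstants) {
    return writeInstants <= 1 && completedWriteInstants == 0;
  }
}
```

   The listing branch would then be guarded by `!isFirstCommit(...)`, which states the intent directly rather than leaving the reader to decode two timeline counts.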


