vinothchandar commented on code in PR #7841:
URL: https://github.com/apache/hudi/pull/7841#discussion_r1096047641
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -280,6 +280,7 @@ protected void commit(HoodieTable table, String commitActionType, String instant
}
// update Metadata table
writeTableMetadata(table, instantTime, commitActionType, metadata);
+ context.setJobStatus(this.getClass().getSimpleName(),"Completing commit in table " + config.getTableName());
Review Comment:
Can we also add the instantTime, so we know which commit completed?
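A rough sketch of what that could look like. `completingCommitStatus` is a hypothetical helper used only for illustration, not an existing Hudi method; the actual change would probably just append `instantTime` inside the `setJobStatus` call above.

```java
public class JobStatusMessage {

    // Hypothetical helper (not part of Hudi): builds the job status text
    // so that it names both the instant being completed and the table.
    static String completingCommitStatus(String tableName, String instantTime) {
        return "Completing commit " + instantTime + " in table " + tableName;
    }

    public static void main(String[] args) {
        // Example values are made up for illustration.
        System.out.println(completingCommitStatus("trips", "20230203101500"));
    }
}
```

With the instant time in the message, the job description in the Spark UI identifies which commit a given stage belongs to.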
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java:
##########
@@ -133,6 +133,7 @@ protected void commit(String instantTime, Map<MetadataPartitionType, HoodieData<
HoodieData<HoodieRecord> preppedRecords = prepRecords(partitionRecordsMap);
JavaRDD<HoodieRecord> preppedRecordRDD = HoodieJavaRDD.getJavaRDD(preppedRecords);
+ engineContext.setJobStatus(this.getClass().getName(), "Committing to metadata table " + metadataWriteConfig.getTableName());
Review Comment:
Please add the instantTime here as well.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1086,39 +1086,44 @@ private void initialCommit(String createInstantTime, List<MetadataPartitionType>
Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionToRecordsMap = new HashMap<>();
- List<DirectoryInfo> partitionInfoList = listAllPartitions(dataMetaClient);
- Map<String, Map<String, Long>> partitionToFilesMap = partitionInfoList.stream()
- .map(p -> {
- String partitionName = HoodieTableMetadataUtil.getPartitionIdentifier(p.getRelativePath());
- return Pair.of(partitionName, p.getFileNameToSizeMap());
- })
- .collect(Collectors.toMap(Pair::getKey, Pair::getValue));
-
- int totalDataFilesCount = partitionToFilesMap.values().stream().mapToInt(Map::size).sum();
- List<String> partitions = new ArrayList<>(partitionToFilesMap.keySet());
-
- if (partitionTypes.contains(MetadataPartitionType.FILES)) {
- // Record which saves the list of all partitions
- HoodieRecord allPartitionRecord = HoodieMetadataPayload.createPartitionListRecord(partitions);
- HoodieData<HoodieRecord> filesPartitionRecords = getFilesPartitionRecords(createInstantTime, partitionInfoList, allPartitionRecord);
- ValidationUtils.checkState(filesPartitionRecords.count() == (partitions.size() + 1));
- partitionToRecordsMap.put(MetadataPartitionType.FILES, filesPartitionRecords);
- }
+ // skip parsing file system for metadata records if its first commit for the table.
Review Comment:
This comment is not clear. What do "parsing" and "file system" mean here? Do
you mean listing the file system?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1086,39 +1086,44 @@ private void initialCommit(String createInstantTime, List<MetadataPartitionType>
Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionToRecordsMap = new HashMap<>();
- List<DirectoryInfo> partitionInfoList = listAllPartitions(dataMetaClient);
- Map<String, Map<String, Long>> partitionToFilesMap = partitionInfoList.stream()
- .map(p -> {
- String partitionName = HoodieTableMetadataUtil.getPartitionIdentifier(p.getRelativePath());
- return Pair.of(partitionName, p.getFileNameToSizeMap());
- })
- .collect(Collectors.toMap(Pair::getKey, Pair::getValue));
-
- int totalDataFilesCount = partitionToFilesMap.values().stream().mapToInt(Map::size).sum();
- List<String> partitions = new ArrayList<>(partitionToFilesMap.keySet());
-
- if (partitionTypes.contains(MetadataPartitionType.FILES)) {
- // Record which saves the list of all partitions
- HoodieRecord allPartitionRecord = HoodieMetadataPayload.createPartitionListRecord(partitions);
- HoodieData<HoodieRecord> filesPartitionRecords = getFilesPartitionRecords(createInstantTime, partitionInfoList, allPartitionRecord);
- ValidationUtils.checkState(filesPartitionRecords.count() == (partitions.size() + 1));
- partitionToRecordsMap.put(MetadataPartitionType.FILES, filesPartitionRecords);
- }
+ // skip parsing file system for metadata records if its first commit for the table.
+ if (dataMetaClient.getActiveTimeline().getWriteTimeline().countInstants() > 1
+ || dataMetaClient.getActiveTimeline().getWriteTimeline().filterCompletedInstants().countInstants() != 0) {
+ List<DirectoryInfo> partitionInfoList = listAllPartitions(dataMetaClient);
Review Comment:
+1. Is there a cleaner way of checking this? This feels like a roundabout way
to derive that this is the first commit into the MDT.
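One possible shape, as a sketch only: pull the condition into a named predicate so the intent is readable at the call site. `isFirstCommit`, `shouldListPartitions`, and their integer parameters are hypothetical stand-ins for the two `countInstants()` results in the diff; none of them are existing Hudi APIs.

```java
public class FirstCommitCheck {

    // Hypothetical predicate: treats the table as being on its first commit when
    // the write timeline holds at most one instant and none has completed yet.
    // This is the logical negation of the condition in the diff.
    static boolean isFirstCommit(int writeInstants, int completedWriteInstants) {
        return writeInstants <= 1 && completedWriteInstants == 0;
    }

    // Listing partitions is only needed when this is not the first commit.
    static boolean shouldListPartitions(int writeInstants, int completedWriteInstants) {
        return !isFirstCommit(writeInstants, completedWriteInstants);
    }

    public static void main(String[] args) {
        // One inflight instant, nothing completed: first commit, skip listing.
        System.out.println(shouldListPartitions(1, 0));
        // At least one completed instant: existing data, list partitions.
        System.out.println(shouldListPartitions(2, 1));
    }
}
```

The boolean logic matches the `> 1 || != 0` check in the diff; only the name changes, which is the part that makes the first-commit assumption explicit.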