the-other-tim-brown commented on code in PR #13313:
URL: https://github.com/apache/hudi/pull/13313#discussion_r2102633880


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -762,21 +742,38 @@ private static HoodieData<HoodieRecord> readRecordKeysFromFileSliceSnapshot(Hood
         .filterCompletedInstants()
         .lastInstant()
         .map(HoodieInstant::requestedTime);
+    if (!instantTime.isPresent()) {
+      return engineContext.emptyHoodieData();
+    }

     engineContext.setJobStatus(activeModule, "Record Index: reading record keys from " + partitionFileSlicePairs.size() + " file slices");
     final int parallelism = Math.min(partitionFileSlicePairs.size(), recordIndexMaxParallelism);
-
+    ReaderContextFactory<T> readerContextFactory = engineContext.getReaderContextFactory(metaClient);
     return engineContext.parallelize(partitionFileSlicePairs, parallelism).flatMap(partitionAndFileSlice -> {
-
       final String partition = partitionAndFileSlice.getKey();
       final FileSlice fileSlice = partitionAndFileSlice.getValue();
       final String fileId = fileSlice.getFileId();
-      return new HoodieMergedReadHandle(dataWriteConfig, instantTime, hoodieTable, Pair.of(partition, fileSlice.getFileId()),
-          Option.of(fileSlice)).getMergedRecords().stream().map(record -> {
-            HoodieRecord record1 = (HoodieRecord) record;
-            return HoodieMetadataPayload.createRecordIndexUpdate(record1.getRecordKey(), partition, fileId,
-            record1.getCurrentLocation().getInstantTime(), 0);
-          }).iterator();
+      HoodieReaderContext<T> readerContext = readerContextFactory.getContext();
+      Schema dataSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(dataWriteConfig.getWriteSchema()), dataWriteConfig.allowOperationMetadataField());
+      Schema requestedSchema = metaClient.getTableConfig().populateMetaFields() ? getRecordKeySchema() : dataSchema;

Review Comment:
   The current behavior is that we read the entire record. If there is an easy way to compute a schema covering just the record key fields, I can update this to use it; otherwise I think that makes more sense as a follow-up performance improvement. This code is only used to initialize the Record Level Index, so it should be a one-time operation for the table.


