yihua commented on a change in pull request #3857:
URL: https://github.com/apache/hudi/pull/3857#discussion_r749735457



##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
##########
@@ -206,11 +205,25 @@ public MultipleSparkJobExecutionStrategy(HoodieTable table, HoodieEngineContext
               .build();
 
           HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
-          recordIterators.add(getFileSliceReader(baseFileReader, scanner, readerSchema,
-              tableConfig.getPayloadClass(),
-              tableConfig.getPreCombineField(),
-              tableConfig.populateMetaFields() ? Option.empty() : Option.of(Pair.of(tableConfig.getRecordKeyFieldProp(),
-                  tableConfig.getPartitionFieldProp()))));
+          if (!StringUtils.isNullOrEmpty(clusteringOp.getDataFilePath())) {
+            HoodieFileReader<? extends IndexedRecord> baseFileReader = HoodieFileReaderFactory.getFileReader(table.getHadoopConf(), new Path(clusteringOp.getDataFilePath()));
+            recordIterators.add(HoodieFileSliceReader.getFileSliceReader(baseFileReader, scanner, readerSchema,
+                tableConfig.getPayloadClass(),
+                tableConfig.getPreCombineField(),
+                tableConfig.populateMetaFields() ? Option.empty() : Option.of(Pair.of(tableConfig.getRecordKeyFieldProp(),
+                    tableConfig.getPartitionFieldProp()))));
+          } else {
+            // Since there is no base file, fall back to reading log files
+            Iterable<HoodieRecord<? extends HoodieRecordPayload>> iterable = () -> scanner.iterator();
+            recordIterators.add(StreamSupport.stream(iterable.spliterator(), false)
+                .map(e -> {
+                  try {
+                    return transform((IndexedRecord) e.getData().getInsertValue(readerSchema).get());
+                  } catch (IOException io) {
+                    throw new UncheckedIOException(io);
+                  }
+                }).iterator());
+          }
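The `else` branch above illustrates a general Java idiom: wrapping a one-shot `Iterator` in a `Stream` via `Iterable`/`StreamSupport`, and converting a checked `IOException` thrown inside a lambda into an `UncheckedIOException`. Below is a minimal, self-contained sketch of that pattern; `readPayload` and the `String` records are hypothetical stand-ins for the Hudi record and payload types, not part of the actual codebase.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.stream.StreamSupport;

public class IteratorStreamSketch {

    // Hypothetical stand-in for payload extraction that can throw a checked IOException
    // (analogous to e.getData().getInsertValue(readerSchema) in the diff).
    static String readPayload(String raw) throws IOException {
        if (raw == null) {
            throw new IOException("missing payload");
        }
        return raw.toUpperCase();
    }

    // Wraps an Iterator in a lazy Stream and maps each element, rethrowing the
    // checked IOException as UncheckedIOException, as Stream lambdas cannot
    // declare checked exceptions.
    static Iterator<String> transformAll(Iterator<String> source) {
        Iterable<String> iterable = () -> source;
        return StreamSupport.stream(iterable.spliterator(), false)
            .map(raw -> {
                try {
                    return readPayload(raw);
                } catch (IOException io) {
                    throw new UncheckedIOException(io);
                }
            })
            .iterator();
    }

    public static void main(String[] args) {
        Iterator<String> out = transformAll(Arrays.asList("a", "b").iterator());
        StringBuilder sb = new StringBuilder();
        out.forEachRemaining(sb::append);
        System.out.println(sb); // AB
    }
}
```

Note that `() -> source` produces an `Iterable` usable only once, which is fine here since the resulting stream is consumed a single time.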

Review comment:
       These are changes from @codope's fix for clustering and should be removed once rebased.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
