zhangyue19921010 commented on code in PR #13674:
URL: https://github.com/apache/hudi/pull/13674#discussion_r2281784748


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/PartitionAwareClusteringPlanStrategy.java:
##########
@@ -76,17 +77,31 @@ protected Pair<Stream<HoodieClusteringGroup>, Boolean> 
buildClusteringGroupsForP
 
     long totalSizeSoFar = 0;
     boolean partialScheduled = false;
+    
+    // Only group by schema if schema evolution is disabled

Review Comment:
   Is that possible that we create a new clustering plan(maybe named as ), not 
change any codes in `PartitionAwareClusteringPlanStrategy` , also pay attention 
to this new clustering plan strategy :
   1. Name as `SparkStreamCopyClusteringPlanStrategy`
   2. Set `SparkStreamCopyClusteringExecutionStrategy` when build clustering 
plan object
   3. if schema.evolution.enable false, 
SparkStreamCopyClusteringExecutionStrategy need to skip schema-check related 
logic in `supportBinaryStreamCopy`



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieBinaryCopyHandle.java:
##########
@@ -63,6 +64,26 @@ public class HoodieBinaryCopyHandle<T, I, K, O> extends 
HoodieWriteHandle<T, I,
   protected long recordsWritten = 0;
   protected long insertRecordsWritten = 0;
 
+  private MessageType getWriteSchema(HoodieWriteConfig config, 
List<StoragePath> inputFiles, Configuration conf, HoodieTable<?, ?, ?, ?> 
table) {
+    if (!config.isFileStitchingBinaryCopySchemaEvolutionEnabled() && 
!inputFiles.isEmpty()) {

Review Comment:
   Also `isFileStitchingBinaryCopySchemaEvolutionEnabled` is too long, maybe 
reanme to `isBinaryCopySchemaEvolutionEnabled`



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -343,6 +343,16 @@ public class HoodieClusteringConfig extends HoodieConfig {
           + "Please exercise caution while setting this config, especially 
when clustering is done very frequently. This could lead to race condition in "
           + "rare scenarios, for example, when the clustering completes after 
instants are fetched but before rollback completed.");
 
+  public static final ConfigProperty<Boolean> 
FILE_STITCHING_BINARY_COPY_SCHEMA_EVOLUTION_ENABLE = ConfigProperty
+      .key(CLUSTERING_STRATEGY_PARAM_PREFIX + 
"file.stitching.binary.copy.schema.evolution.enable")
+      .defaultValue(false)

Review Comment:
   completed config name here is 
`hoodie.clustering.plan.strategy.file.stitching.binary.copy.schema.evolution.enable`
  (also typo at stitching)
   it's too long, maybe we can change it to `CLUSTERING_STRATEGY_PARAM_PREFIX + 
binary.copy.schema.evolution.enable` 
(`hoodie.clustering.plan.strategy.binary.copy.schema.evolution.enable` )



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to