danny0405 commented on code in PR #13255:
URL: https://github.com/apache/hudi/pull/13255#discussion_r2078837214
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/SimpleConcurrentFileWritesConflictResolutionStrategy.java:
##########
@@ -67,8 +68,7 @@ public Stream<HoodieInstant> getCandidateInstants(HoodieTableMetaClient metaClie
     Stream<HoodieInstant> compactionAndClusteringPendingTimeline = activeTimeline
         .filterPendingReplaceClusteringAndCompactionTimeline()
         .filter(instant -> ClusteringUtils.isClusteringInstant(activeTimeline, instant, metaClient.getInstantGenerator())
-            || HoodieTimeline.COMPACTION_ACTION.equals(instant.getAction()))
-        .findInstantsAfter(currentInstant.requestedTime())
+            || (!HoodieTimeline.CLUSTERING_ACTION.equals(instant.getAction())
+                && compareTimestamps(instant.requestedTime(), GREATER_THAN, currentInstant.requestedTime())))
Review Comment:
`!HoodieTimeline.CLUSTERING_ACTION.equals(instant.getAction())` seems
unnecessary because line 70 already includes all the clustering commits.
> This prevents you from accidentally skipping over a clustering commit that
started before this commit.
For consistent hashing writes, we do a dual write to the replaced file groups to
keep visibility. What is the regular writer's behavior now: does the writer still
write to a replaced file group id (if there is a pending clustering plan)?
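
The redundancy argument is just the boolean identity `A || (!A && B) == A || B`: since the first branch already admits every clustering instant, negating the clustering check in the second branch cannot change the union. A standalone sketch of this (plain strings in place of the actual `HoodieInstant`/`ClusteringUtils` types, which are not modeled here):

```java
// Standalone illustration of the predicate logic discussed above.
// Uses action strings only; timestamps and Hudi types are intentionally omitted.
public class FilterSketch {
  static final String CLUSTERING_ACTION = "clustering";
  static final String COMPACTION_ACTION = "compaction";

  // Stand-in for ClusteringUtils.isClusteringInstant (an assumption, not the real check).
  static boolean isClusteringInstant(String action) {
    return CLUSTERING_ACTION.equals(action);
  }

  // Predicate shaped like the PR: second branch re-excludes clustering.
  static boolean withRedundantCheck(String action, boolean afterCurrent) {
    return isClusteringInstant(action)
        || (!CLUSTERING_ACTION.equals(action) && afterCurrent);
  }

  // Simplified form: A || (!A && B) collapses to A || B.
  static boolean simplified(String action, boolean afterCurrent) {
    return isClusteringInstant(action) || afterCurrent;
  }

  public static void main(String[] args) {
    String[] actions = {CLUSTERING_ACTION, COMPACTION_ACTION, "commit"};
    for (String a : actions) {
      for (boolean after : new boolean[] {true, false}) {
        // Both predicates agree on every input.
        System.out.println(a + "/" + after + " -> "
            + (withRedundantCheck(a, after) == simplified(a, after)));
      }
    }
  }
}
```

Whether the check is needed in the real code depends only on whether the first branch can reject a clustering instant that the second branch would then admit; with `isClusteringInstant` matching all clustering instants, it cannot.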
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]