codope commented on code in PR #4118:
URL: https://github.com/apache/hudi/pull/4118#discussion_r844982092


##########
hudi-common/src/main/java/org/apache/hudi/common/util/ClusteringUtils.java:
##########
@@ -124,7 +125,16 @@
         // get all filegroups in the plan
         getFileGroupEntriesInClusteringPlan(clusteringPlan.getLeft(), clusteringPlan.getRight()));
 
-    Map<HoodieFileGroupId, HoodieInstant> resultMap = resultStream.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+    Map<HoodieFileGroupId, HoodieInstant> resultMap;
+    try {
+      resultMap = resultStream.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+    } catch (Exception e) {
+      if (e instanceof IllegalStateException && e.getMessage().contains("Duplicate key")) {
+        throw new HoodieException("Found duplicate file groups pending clustering. If you're running deltastreamer in continuous mode, consider adding delay using --min-sync-interval-seconds. "

Review Comment:
   Anyway, we now have OCC with the in-process lock provider when metadata is
enabled, and users only need to set one config to adjust the concurrency mode
for deltastreamer/Spark streaming:
`HoodieWriteConfig#AUTO_ADJUST_LOCK_CONFIGS`
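
For reference, the duplicate-key behavior that the hunk above guards against can be reproduced in isolation. This is only a minimal sketch, not the Hudi code: `PendingClusteringSketch` and `toFileGroupMap` are hypothetical names, plain `String`s stand in for `HoodieFileGroupId` and `HoodieInstant`, and `RuntimeException` stands in for `HoodieException`. It also catches `IllegalStateException` directly and null-checks the message, which is one possible alternative to the `instanceof` check in the diff.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the helper touched by the hunk; not Hudi code.
class PendingClusteringSketch {

  // Collects (fileGroupId, instant) pairs into a map. Collectors.toMap without a
  // merge function throws IllegalStateException when the same key appears twice.
  static Map<String, String> toFileGroupMap(List<Map.Entry<String, String>> entries) {
    try {
      return entries.stream()
          .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    } catch (IllegalStateException e) {
      String msg = e.getMessage();
      if (msg != null && msg.contains("Duplicate key")) {
        // The PR throws HoodieException here; RuntimeException keeps the sketch self-contained.
        throw new RuntimeException("Found duplicate file groups pending clustering.", e);
      }
      // Any other IllegalStateException is rethrown unchanged.
      throw e;
    }
  }
}
```

With distinct file group ids the helper returns the map as usual; with a repeated id it surfaces the friendlier message while keeping the original exception as the cause.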



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
