codope commented on code in PR #4118:
URL: https://github.com/apache/hudi/pull/4118#discussion_r844982092
##########
hudi-common/src/main/java/org/apache/hudi/common/util/ClusteringUtils.java:
##########
@@ -124,7 +125,16 @@
         // get all filegroups in the plan
         getFileGroupEntriesInClusteringPlan(clusteringPlan.getLeft(), clusteringPlan.getRight()));
-    Map<HoodieFileGroupId, HoodieInstant> resultMap = resultStream.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+    Map<HoodieFileGroupId, HoodieInstant> resultMap;
+    try {
+      resultMap = resultStream.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+    } catch (Exception e) {
+      if (e instanceof IllegalStateException && e.getMessage().contains("Duplicate key")) {
+        throw new HoodieException("Found duplicate file groups pending clustering. If you're running deltastreamer in continuous mode, consider adding delay using --min-sync-interval-seconds. "

Review Comment:
   Anyway, we now have OCC with the in-process lock provider when metadata is enabled, and users only need to set one config to adjust the concurrency mode for deltastreamer/Spark streaming: `HoodieWriteConfig#AUTO_ADJUST_LOCK_CONFIGS`
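
For readers unfamiliar with the failure the hunk above guards against, here is a minimal, self-contained sketch of the same pattern. It is not the code in this PR: plain strings stand in for `HoodieFileGroupId` and `HoodieInstant`, `IllegalStateException` stands in for `HoodieException`, and the class and method names (`DuplicateKeySketch`, `toMapOrExplain`) are hypothetical. It relies only on the JDK behavior that the two-argument `Collectors.toMap` throws an `IllegalStateException` whose message begins with "Duplicate key" when two entries share a key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateKeySketch {

    // Collects entries into a map. If the JDK reports a duplicate key, rethrow with
    // a message that tells the user what went wrong (stand-in for HoodieException).
    static Map<String, String> toMapOrExplain(List<Map.Entry<String, String>> entries) {
        try {
            return entries.stream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        } catch (IllegalStateException e) {
            String msg = e.getMessage();
            if (msg != null && msg.contains("Duplicate key")) {
                throw new IllegalStateException(
                    "Found duplicate file groups pending clustering", e);
            }
            throw e; // unrelated failure: propagate unchanged
        }
    }

    public static void main(String[] args) {
        // Hypothetical data: the same file group id appears under two instants.
        List<Map.Entry<String, String>> entries = Arrays.asList(
            new SimpleEntry<String, String>("fg-1", "instant-001"),
            new SimpleEntry<String, String>("fg-1", "instant-002"));
        try {
            toMapOrExplain(entries);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Catching `IllegalStateException` directly, rather than `Exception` followed by an `instanceof` check, is a choice made for this sketch; the null check on the message is likewise an extra precaution and not something taken from the PR.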