alexeykudinkin commented on a change in pull request #4178: URL: https://github.com/apache/hudi/pull/4178#discussion_r766089204
File path: `hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java`

```diff
@@ -95,8 +95,7 @@ public MultipleSparkJobExecutionStrategy(HoodieTable table, HoodieEngineContext
         .map(inputGroup -> runClusteringForGroupAsync(inputGroup,
             clusteringPlan.getStrategy().getStrategyParams(),
             Option.ofNullable(clusteringPlan.getPreserveHoodieMetadata()).orElse(false),
-            instantTime))
-        .map(CompletableFuture::join);
+            instantTime)).collect(Collectors.toList()).stream().map(CompletableFuture::join);
```

Review comment:

@xiarixiaoyao I see where we diverge now: you're relying on stream parallelism, which is very implicit behavior, and, as you might have noticed, stream parallelism is not guaranteed (some operations in the current JDK implementation actually convert a parallel stream back into a sequential one without you even knowing it). There's also another angle: joining futures within streams blocks threads of the common `ForkJoinPool` executor. Hence my proposal to instead rely on an _explicit_ conversion using `allOf` and a subsequent join.
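The `allOf`-based approach the comment proposes can be sketched as below. This is a minimal, self-contained illustration, not the Hudi code: the string "groups" and the `supplyAsync` work stand in for the per-group clustering futures returned by `runClusteringForGroupAsync`. The key point is that `CompletableFuture.allOf` acts as an explicit barrier, so the subsequent `join` calls merely fetch already-completed results instead of blocking `ForkJoinPool` worker threads mid-stream.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AllOfJoinSketch {
    public static void main(String[] args) {
        // Hypothetical stand-ins for the per-group async clustering futures.
        List<CompletableFuture<String>> futures = Stream.of("group-1", "group-2", "group-3")
                .map(g -> CompletableFuture.supplyAsync(() -> "clustered:" + g))
                .collect(Collectors.toList());

        // Explicit barrier: this future completes only once every input future has completed.
        CompletableFuture<Void> all =
                CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));

        // After the barrier, join() on each future is a non-blocking result lookup.
        List<String> results = all
                .thenApply(v -> futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()))
                .join();

        results.forEach(System.out::println);
    }
}
```

By contrast, mapping `CompletableFuture::join` directly inside a (possibly parallel) stream, as in the diff above, ties completion ordering to the stream's execution strategy and parks common-pool threads while they wait on unfinished futures.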