alexeykudinkin commented on a change in pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#discussion_r766089204



##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
##########
@@ -95,8 +95,7 @@ public MultipleSparkJobExecutionStrategy(HoodieTable table, HoodieEngineContext
         .map(inputGroup -> runClusteringForGroupAsync(inputGroup,
            clusteringPlan.getStrategy().getStrategyParams(),
            Option.ofNullable(clusteringPlan.getPreserveHoodieMetadata()).orElse(false),
-            instantTime))
-        .map(CompletableFuture::join);
+            instantTime)).collect(Collectors.toList()).stream().map(CompletableFuture::join);

Review comment:
       @xiarixiaoyao I see where we diverge now: you're relying on stream parallelism, which is very implicit behavior, and as you might have noticed, stream parallelism is not guaranteed (some operations in the current JDK implementation actually convert a parallel stream back into a sequential one without you even knowing it).
   
   There's also another angle: joining Futures within Streams will block threads of the `ForkJoinPool.commonPool()` executor.
   
   Hence my proposal to instead rely on an _explicit_ conversion using `allOf` and a subsequent join.
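   To make the suggestion concrete, here is a minimal standalone sketch of the explicit `allOf`-then-join pattern. The `runAll` helper, class name, and "clustered-" task bodies are hypothetical stand-ins for `runClusteringForGroupAsync`, not actual Hudi code:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class AllOfSketch {
    // Kick off one async task per input group, then wait for *all* of them
    // explicitly via allOf, instead of joining one-by-one inside a stream
    // pipeline (which blocks common-pool threads and depends on implicit
    // stream parallelism).
    static List<String> runAll(List<String> groups) {
        List<CompletableFuture<String>> futures = groups.stream()
            .map(g -> CompletableFuture.supplyAsync(() -> "clustered-" + g))
            .collect(Collectors.toList());

        // Single explicit barrier: completes only when every future completes.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

        // Every future is already done here, so join() returns immediately
        // and never blocks a worker thread.
        return futures.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(runAll(List.of("g1", "g2", "g3")));
    }
}
```

   The key point is that the collection of futures is materialized first, then awaited in one explicit step, rather than interleaving submission and joining in a single stream pipeline.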




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

