[ 
https://issues.apache.org/jira/browse/FLINK-35787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-35787:
--------------------------------------
    Affects Version/s: 1.19.1

> DefaultSlotStatusSyncer might bring down JVM (exit code 239 instead of a 
> proper shutdown)
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-35787
>                 URL: https://issues.apache.org/jira/browse/FLINK-35787
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.19.1
>            Reporter: Roman Khachatryan
>            Priority: Major
>
> In our internal CI, I've encountered the following error:
> {code:java}
> * 12:02:47,205 [   pool-126-thread-1] ERROR org.apache.flink.util.FatalExitExceptionHandler              [] - FATAL: Thread 'pool-126-thread-1' produced an uncaught exception. Stopping the process...
>   java.util.concurrent.CompletionException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@38ce013a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@640a9cf7[Wrapped task = java.util.concurrent.>
>           at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
>           at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:951) ~[?:?]
>           at java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2282) ~[?:?]
>           at org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) ~[classes/:?]
>           at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) ~[classes/:?]
>           at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) ~[classes/:?]
>           at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$checkResourceRequirementsWithDelay$12(FineGrainedSlotManager.java:603) ~[classes/:?]
>           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>           at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
>           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
>           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>           at java.lang.Thread.run(Thread.java:829) [?:?]
>   Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@38ce013a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@640a9cf7[Wrapped task = java.util.concurrent.CompletableFuture$UniHandle@f3d>
>           at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) ~[?:?]
>           at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
>           at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340) ~[?:?]
>           at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562) ~[?:?]
>           at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:705) ~[?:?]
>           at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:687) ~[?:?]
>           at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:949) ~[?:?]
>           ... 11 more{code}
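> From the code, it looks like the RM main thread executor had already been shut 
> down, so handleAsync could not schedule the handler; assertNoException then saw 
> the resulting RejectedExecutionException and triggered the JVM exit: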
> {code:java}
>         CompletableFuture<Acknowledge> requestFuture =
>                 gateway.requestSlot(
>                         SlotID.getDynamicSlotID(resourceId),
>                         jobId,
>                         allocationId,
>                         resourceProfile,
>                         targetAddress,
>                         resourceManagerId,
>                         taskManagerRequestTimeout);
>
>         CompletableFuture<Void> returnedFuture = new CompletableFuture<>();
>
>         FutureUtils.assertNoException(
>                 requestFuture.handleAsync(
>                         (Acknowledge acknowledge, Throwable throwable) -> { ... },
>                         mainThreadExecutor));
>  {code}
>  
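
The mechanism described above can be reproduced with the JDK alone. The following is a minimal sketch, not Flink code: the class name RejectedHandleAsyncDemo and its helper methods are hypothetical, and a plain single-thread scheduled executor stands in for the RM main thread executor. It assumes the request future is already complete when handleAsync is called, and it shows that a shut-down executor makes the returned future fail with a RejectedExecutionException in its cause chain, which is the kind of failure that assertNoException would then treat as fatal.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class; it only uses JDK APIs and no Flink classes.
final class RejectedHandleAsyncDemo {

    /** Calls handleAsync with a shut-down executor and returns the failure, or null if none. */
    static Throwable failureAfterShutdown() {
        // Stand-in for the main thread executor (assumption for illustration).
        ExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.shutdown(); // from now on the executor rejects every new task

        CompletableFuture<String> requestFuture = CompletableFuture.completedFuture("ack");
        try {
            CompletableFuture<String> handled =
                    requestFuture.handleAsync((ack, throwable) -> ack, executor);
            handled.join(); // rethrows the failure stored in the returned future
            return null;
        } catch (CompletionException | RejectedExecutionException e) {
            return e;
        }
    }

    /** Walks the cause chain looking for a RejectedExecutionException. */
    static boolean isCausedByRejection(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof RejectedExecutionException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable failure = failureAfterShutdown();
        System.out.println("failure: " + failure);
        System.out.println("caused by rejection: " + isCausedByRejection(failure));
    }
}
```

The handler passed to handleAsync never runs in this sketch, so error handling placed inside the handler cannot observe the rejection; only code that inspects the returned future can.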



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
