[ https://issues.apache.org/jira/browse/FLINK-35787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Khachatryan updated FLINK-35787:
--------------------------------------
    Component/s: Runtime / Coordination

> DefaultSlotStatusSyncer might bring down JVM (exit code 239 instead of a proper shutdown)
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-35787
>                 URL: https://issues.apache.org/jira/browse/FLINK-35787
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>            Reporter: Roman Khachatryan
>            Priority: Major
>
> In our internal CI, I've encountered the following error:
> {code:java}
> * 12:02:47,205 [ pool-126-thread-1] ERROR org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: Thread 'pool-126-thread-1' produced an uncaught exception. Stopping the process...
> java.util.concurrent.CompletionException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@38ce013a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@640a9cf7[Wrapped task = java.util.concurrent.>
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
>     at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:951) ~[?:?]
>     at java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2282) ~[?:?]
>     at org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) ~[classes/:?]
>     at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) ~[classes/:?]
>     at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) ~[classes/:?]
>     at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$checkResourceRequirementsWithDelay$12(FineGrainedSlotManager.java:603) ~[classes/:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>     at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@38ce013a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@640a9cf7[Wrapped task = java.util.concurrent.CompletableFuture$UniHandle@f3d>
>     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
>     at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340) ~[?:?]
>     at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562) ~[?:?]
>     at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:705) ~[?:?]
>     at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:687) ~[?:?]
>     at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:949) ~[?:?]
>     ... 11 more{code}
> From the code, it looks like RM main thread executor was shut down, and that triggered JVM exit:
> {code:java}
> CompletableFuture<Acknowledge> requestFuture =
>         gateway.requestSlot(
>                 SlotID.getDynamicSlotID(resourceId),
>                 jobId,
>                 allocationId,
>                 resourceProfile,
>                 targetAddress,
>                 resourceManagerId,
>                 taskManagerRequestTimeout);
> CompletableFuture<Void> returnedFuture = new CompletableFuture<>();
> FutureUtils.assertNoException(
>         requestFuture.handleAsync(
>                 (Acknowledge acknowledge, Throwable throwable) -> {
>                     ...
>                 },
>                 mainThreadExecutor));
> {code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
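
The suspected mechanism can be reproduced in isolation with a minimal sketch that uses only the JDK and no Flink classes. It assumes a plain ScheduledExecutorService standing in for the RM main thread executor and an already-completed future standing in for requestFuture; the class and method names are hypothetical. It is only meant to illustrate that handleAsync against a shut-down executor surfaces a RejectedExecutionException, which is the kind of failure that FutureUtils.assertNoException treats as fatal.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;

public class RejectedHandleAsyncSketch {

    /**
     * Calls handleAsync with an executor that has already been shut down and
     * returns the resulting failure, or null if the call unexpectedly succeeded.
     */
    static Throwable failureAfterShutdown() {
        // Stand-in for the RM main thread executor (assumption, not Flink code).
        ScheduledExecutorService mainThreadExecutor =
                Executors.newSingleThreadScheduledExecutor();
        mainThreadExecutor.shutdownNow();

        // Stand-in for the future returned by gateway.requestSlot(...).
        CompletableFuture<String> requestFuture =
                CompletableFuture.completedFuture("ack");

        try {
            CompletableFuture<String> handled =
                    requestFuture.handleAsync(
                            (ack, throwable) -> ack, mainThreadExecutor);
            // If the rejection was recorded on the returned future,
            // join() rethrows it as a CompletionException.
            handled.join();
            return null;
        } catch (CompletionException | RejectedExecutionException e) {
            // Covers both cases: rejection stored on the future or thrown directly.
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable failure = failureAfterShutdown();
        Throwable root =
                failure instanceof CompletionException ? failure.getCause() : failure;
        System.out.println("handleAsync failed with: " + root);
    }
}
```

If this matches what happens in DefaultSlotStatusSyncer.allocateSlot, one possible direction would be to treat a rejection from an already-stopped executor as part of shutdown instead of passing it to assertNoException; whether that is appropriate depends on the RM shutdown ordering, which this sketch does not model.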