[ https://issues.apache.org/jira/browse/FLINK-35787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Khachatryan updated FLINK-35787:
--------------------------------------
    Description: 
In our internal CI, I've encountered the following error:
{code:java}
* 12:02:47,205 [   pool-126-thread-1] ERROR org.apache.flink.util.FatalExitExceptionHandler              [] - FATAL: Thread 'pool-126-thread-1' produced an uncaught exception. Stopping the process...
  java.util.concurrent.CompletionException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@38ce013a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@640a9cf7[Wrapped task = java.util.concurrent.>
          at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
          at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:951) ~[?:?]
          at java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2282) ~[?:?]
          at org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) ~[classes/:?]
          at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) ~[classes/:?]
          at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) ~[classes/:?]
          at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$checkResourceRequirementsWithDelay$12(FineGrainedSlotManager.java:603) ~[classes/:?]
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
          at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
          at java.lang.Thread.run(Thread.java:829) [?:?]
  Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@38ce013a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@640a9cf7[Wrapped task = java.util.concurrent.CompletableFuture$UniHandle@f3d>
          at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) ~[?:?]
          at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
          at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340) ~[?:?]
          at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562) ~[?:?]
          at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:705) ~[?:?]
          at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:687) ~[?:?]
          at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:949) ~[?:?]
          ... 11 more{code}
[From the 
code|https://github.com/apache/flink/blob/fa96ed209a7753a3fe46f93288857e9526c4a7ca/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DefaultSlotStatusSyncer.java#L137],
 it looks like the RM main thread executor had already been shut down, so 
{{handleAsync}} could not schedule its callback, and the resulting failure 
triggered the JVM exit:
{code:java}
CompletableFuture<Acknowledge> requestFuture =
        gateway.requestSlot(
                SlotID.getDynamicSlotID(resourceId),
                jobId,
                allocationId,
                resourceProfile,
                targetAddress,
                resourceManagerId,
                taskManagerRequestTimeout);

CompletableFuture<Void> returnedFuture = new CompletableFuture<>();

FutureUtils.assertNoException(
        requestFuture.handleAsync(
                (Acknowledge acknowledge, Throwable throwable) -> { ... },
                mainThreadExecutor));{code}
 
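The suspected mechanism can be reproduced with plain JDK classes. The sketch below is only an illustration under assumptions: {{RejectedHandleAsyncDemo}} and {{failureFromShutDownExecutor}} are hypothetical names, a single-thread scheduled executor stands in for the RM main thread executor, and an already completed future stands in for the result of {{gateway.requestSlot(...)}}. It shows that {{handleAsync}} does not throw when the executor rejects the callback. Instead, it returns a future that is completed exceptionally with a {{CompletionException}} wrapping the {{RejectedExecutionException}}, which is presumably what {{FutureUtils.assertNoException}} then treats as fatal.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RejectedHandleAsyncDemo {

    /**
     * Calls handleAsync on an already completed future with an executor that
     * has been shut down, and returns the exception held by the resulting future
     * (or null if the future unexpectedly completed normally).
     */
    public static Throwable failureFromShutDownExecutor() {
        // Hypothetical stand-in for the RM main thread executor.
        ExecutorService mainThreadExecutor = Executors.newSingleThreadScheduledExecutor();
        mainThreadExecutor.shutdown();

        // Hypothetical stand-in for a requestSlot(...) call that has already completed.
        CompletableFuture<String> requestFuture = CompletableFuture.completedFuture("ack");

        // The shut-down executor rejects the callback. handleAsync itself does not
        // throw; the rejection is stored in the returned future.
        CompletableFuture<Void> handled =
                requestFuture.handleAsync((ack, throwable) -> null, mainThreadExecutor);

        try {
            handled.join();
            return null;
        } catch (CompletionException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable failure = failureFromShutDownExecutor();
        System.out.println(failure.getClass().getName());
        System.out.println(failure.getCause().getClass().getName());
    }
}
```

Running {{main}} prints {{java.util.concurrent.CompletionException}} followed by {{java.util.concurrent.RejectedExecutionException}}, the same pair of exception types as in the log above.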

> DefaultSlotStatusSyncer might bring down JVM (exit code 239 instead of a 
> proper shutdown)
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-35787
>                 URL: https://issues.apache.org/jira/browse/FLINK-35787
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.19.1
>            Reporter: Roman Khachatryan
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
