[ https://issues.apache.org/jira/browse/FLINK-36295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882688#comment-17882688 ]
Matthias Pohl commented on FLINK-36295:
---------------------------------------

{quote}There is a strange observation in the logs where the StateTransitionManager is transitioning into Stabilizing phase with "Desired resources are not met" log message indicating that there are not enough resources available for immediate state transitioning. But the 4 task slots become available earlier (indicated by the DefaultDeclarativeSlotPool with "Acquired new resources; new total acquired resources: ResourceCounter{resources={ResourceProfile{UNKNOWN}=4}}"){quote}

I found an explanation for this item. I had reverted FLINK-36279 locally for the test run, which means that only free slots (i.e. 2 rather than 4, because 2 were already allocated in the Executing state) were considered for the desired resources check. That explains the behavior.

> AdaptiveSchedulerClusterITCase.testCheckpointStatsPersistedAcrossRescale failed with
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-36295
>                 URL: https://issues.apache.org/jira/browse/FLINK-36295
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>   Affects Versions: 2.0-preview
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Blocker
>              Labels: test-stability
>         Attachments: FLINK-36295.failure.62156.20240916.1.logs-cron_jdk17-test_cron_jdk17_core-1726454552.log, FLINK-36295.failure.with-revert.debug.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62156&view=logs&j=675bf62c-8558-587e-2555-dcad13acefb5&t=5878eed3-cc1e-5b12-1ed0-9e7139ce0992&l=10234
> {code}
> Sep 16 03:06:30 03:06:30.168 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.275 s <<< FAILURE! -- in org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerClusterITCase
> Sep 16 03:06:30 03:06:30.168 [ERROR] org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerClusterITCase.testCheckpointStatsPersistedAcrossRescale -- Time elapsed: 0.676 s <<< ERROR!
> Sep 16 03:06:30 java.lang.IndexOutOfBoundsException: Index: -1
> Sep 16 03:06:30 	at java.base/java.util.Collections$EmptyList.get(Collections.java:4586)
> Sep 16 03:06:30 	at org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerClusterITCase.testCheckpointStatsPersistedAcrossRescale(AdaptiveSchedulerClusterITCase.java:214)
> Sep 16 03:06:30 	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> Sep 16 03:06:30 	at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
> Sep 16 03:06:30 	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
> Sep 16 03:06:30 	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
> Sep 16 03:06:30 	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
> Sep 16 03:06:30 	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
> Sep 16 03:06:30 	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
> Sep 16 03:06:30
> {code}
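
For readers of the stack trace: an {{IndexOutOfBoundsException: Index: -1}} thrown from {{Collections$EmptyList.get}} is what one would see when a "last element" lookup of the form {{get(size() - 1)}} is applied to an empty list. The following is only a minimal sketch of that pattern and of a guarded variant. It is not the code at AdaptiveSchedulerClusterITCase.java:214, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration; not taken from the Flink code base.
public class EmptyListLastElement {

    // Unguarded "last element" lookup: on an empty list this calls get(-1).
    static <T> T lastUnchecked(List<T> items) {
        return items.get(items.size() - 1);
    }

    // Guarded variant: returns Optional.empty() instead of throwing.
    // Assumes the list holds no null elements (Optional.of rejects null).
    static <T> Optional<T> lastChecked(List<T> items) {
        return items.isEmpty()
                ? Optional.empty()
                : Optional.of(items.get(items.size() - 1));
    }

    public static void main(String[] args) {
        // Stand-in for an empty result list, e.g. no entries recorded yet.
        List<Long> history = Collections.emptyList();
        try {
            lastUnchecked(history);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked lookup failed: " + e.getMessage());
        }
        System.out.println("checked lookup: " + lastChecked(history));
    }
}
```

If the test does use such a lookup, the failure would indicate that the list it queried was still empty at the time of the assertion, which would be consistent with a timing issue rather than a wrong index calculation. That connection is an assumption and is not confirmed by the logs quoted here.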