[ https://issues.apache.org/jira/browse/FLINK-36279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl updated FLINK-36279:
----------------------------------
    Priority: Blocker  (was: Major)

> AdaptiveScheduler#hasDesiredResources doesn't rely on all available slots
> -------------------------------------------------------------------------
>
>                 Key: FLINK-36279
>                 URL: https://issues.apache.org/jira/browse/FLINK-36279
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 2.0-preview
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Blocker
>              Labels: pull-request-available
>         Attachments: FLINK-36279-FLINK-36014-pr.success.log,
>                      FLINK-36279.20240914.6.success.log,
>                      FLINK-36279.fixed.success.log
>
>
> FLINK-36014 aligned the triggering of the execution graph creation in
> {{WaitingForResources}} state with the triggering of rescaling in
> {{Executing}} state. Before that change, only {{WaitingForResources}} relied
> on this method. Relying on free slots was good enough because no slots are
> allocated yet in {{WaitingForResources}} state.
> Using this method for {{Executing}} state as well changes this premise: while
> the slot availability is checked, the job holds allocated slots that would
> become available after the restart. Hence, the slot availability check should
> also consider these currently allocated slots. This will not break the
> premise for the {{WaitingForResources}} state.
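
To illustrate the changed premise described above, here is a minimal Java sketch. It is not Flink's implementation: the class `SlotAvailabilitySketch`, both method signatures, and the plain integer slot counts are assumptions made for illustration only. The real check in {{AdaptiveScheduler}} presumably operates on richer slot and resource types.

```java
// Hypothetical sketch, not Flink code: slot counts are plain ints here.
final class SlotAvailabilitySketch {

    // Behavior as described for the free-slots-only check:
    // slots currently held by the running job are ignored.
    static boolean hasDesiredResourcesFreeOnly(int desiredSlots, int freeSlots) {
        return freeSlots >= desiredSlots;
    }

    // Behavior proposed in the description: slots the job already holds
    // count as well, because they become available again after the restart.
    static boolean hasDesiredResources(int desiredSlots, int freeSlots, int allocatedSlots) {
        return freeSlots + allocatedSlots >= desiredSlots;
    }

    public static void main(String[] args) {
        // Executing: the job holds 2 slots, 2 more are free, 4 are desired.
        System.out.println("free only: " + hasDesiredResourcesFreeOnly(4, 2));
        System.out.println("free + allocated: " + hasDesiredResources(4, 2, 2));

        // WaitingForResources: nothing is allocated, so both checks agree.
        System.out.println("waiting, free only: " + hasDesiredResourcesFreeOnly(4, 4));
        System.out.println("waiting, free + allocated: " + hasDesiredResources(4, 4, 0));
    }
}
```

With zero allocated slots the second check reduces to the first one, which matches the statement that the premise for {{WaitingForResources}} stays intact.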
> {{RescaleOnCheckpointITCase}} fails because of that issue:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62105&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=11287
> {code}
> Sep 13 17:16:55 "ForkJoinPool-1-worker-25" #28 daemon prio=5 os_prio=0 tid=0x00007f973f0c2800 nid=0x31a1 waiting on condition [0x00007f97089fc000]
> Sep 13 17:16:55    java.lang.Thread.State: TIMED_WAITING (sleeping)
> Sep 13 17:16:55 	at java.lang.Thread.sleep(Native Method)
> Sep 13 17:16:55 	at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:152)
> Sep 13 17:16:55 	at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> Sep 13 17:16:55 	at org.apache.flink.test.scheduling.UpdateJobResourceRequirementsITCase.waitForRunningTasks(UpdateJobResourceRequirementsITCase.java:219)
> Sep 13 17:16:55 	at org.apache.flink.test.scheduling.RescaleOnCheckpointITCase.testRescaleOnCheckpoint(RescaleOnCheckpointITCase.java:139)
> Sep 13 17:16:55 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Sep 13 17:16:55 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [...]
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)