[ https://issues.apache.org/jira/browse/FLINK-36279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881982#comment-17881982 ]
Matthias Pohl edited comment on FLINK-36279 at 9/16/24 9:36 AM:
----------------------------------------------------------------

The issue is caused by the alignment of the desired and sufficient resources definitions in FLINK-36014. The desired resources are still calculated based on the free slots, which makes sense for the {{WaitingForResources}} state but not for {{Executing}} (because slots are still allocated while the job is running).

The open question is why this wasn't revealed by the {{RescaleOnCheckpointITCase}} within the [FLINK-36014 PR|https://github.com/apache/flink/pull/25307] CI run.


was (Author: mapohl):
The issue is caused by the alignment of the desired and sufficient resources definitions in FLINK-36014. The desired resources are still calculated based on the free slots, which makes sense for the {{WaitingForResources}} state but not for {{Executing}} (because slots are still allocated while the job is running).

> RescaleOnCheckpointITCase.testRescaleOnCheckpoint fails
> -------------------------------------------------------
>
>                 Key: FLINK-36279
>                 URL: https://issues.apache.org/jira/browse/FLINK-36279
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 2.0-preview
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Major
>              Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62105&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=11287
> {code}
> Sep 13 17:16:55 "ForkJoinPool-1-worker-25" #28 daemon prio=5 os_prio=0 tid=0x00007f973f0c2800 nid=0x31a1 waiting on condition [0x00007f97089fc000]
> Sep 13 17:16:55    java.lang.Thread.State: TIMED_WAITING (sleeping)
> Sep 13 17:16:55 	at java.lang.Thread.sleep(Native Method)
> Sep 13 17:16:55 	at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:152)
> Sep 13 17:16:55 	at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> Sep 13 17:16:55 	at org.apache.flink.test.scheduling.UpdateJobResourceRequirementsITCase.waitForRunningTasks(UpdateJobResourceRequirementsITCase.java:219)
> Sep 13 17:16:55 	at org.apache.flink.test.scheduling.RescaleOnCheckpointITCase.testRescaleOnCheckpoint(RescaleOnCheckpointITCase.java:139)
> Sep 13 17:16:55 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Sep 13 17:16:55 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [...]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
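
The mismatch described in the comment can be illustrated with a small, self-contained sketch. This is not Flink's scheduler code: the class, the method names, and the slot numbers are all hypothetical, and the second check is only one possible reading of how slots held by a running job might be taken into account.

{code:java}
// Hypothetical illustration only; none of these names exist in Flink.
public class DesiredResourcesSketch {

    // Check based on free slots only. This is reasonable while waiting for
    // resources, when the job does not yet hold any slots.
    static boolean hasDesiredResourcesFreeOnly(int freeSlots, int desiredSlots) {
        return freeSlots >= desiredSlots;
    }

    // Assumed alternative for a running job: slots already allocated to the
    // job count towards the desired total as well.
    static boolean hasDesiredResourcesIncludingAllocated(
            int freeSlots, int allocatedSlots, int desiredSlots) {
        return freeSlots + allocatedSlots >= desiredSlots;
    }

    public static void main(String[] args) {
        int allocatedSlots = 2; // slots held by the running job (made-up value)
        int freeSlots = 2;      // slots currently unused (made-up value)
        int desiredSlots = 4;   // target after a requirements update (made-up value)

        // Prints false: the free-slot check ignores the two slots in use.
        System.out.println(hasDesiredResourcesFreeOnly(freeSlots, desiredSlots));

        // Prints true: free plus allocated slots reach the desired total.
        System.out.println(
                hasDesiredResourcesIncludingAllocated(freeSlots, allocatedSlots, desiredSlots));
    }
}
{code}

With these made-up numbers, a check that looks at free slots alone would never report the desired resources as available while the job is running, which would be consistent with the test waiting indefinitely in {{waitForRunningTasks}}.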