[ 
https://issues.apache.org/jira/browse/FLINK-36279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881982#comment-17881982
 ] 

Matthias Pohl edited comment on FLINK-36279 at 9/16/24 9:37 AM:
----------------------------------------------------------------

The issue is caused by the alignment of the desired and sufficient resources 
definitions in FLINK-36014. The desired resources are still calculated based on 
the free slots, which makes sense for the {{WaitingForResources}} state but not 
for {{Executing}} (because slots are still allocated while the job is running).
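
To illustrate the difference between the two states, here is a minimal sketch. 
It is not the actual scheduler code: the class, the method names and the slot 
numbers are all hypothetical and only model the counting described above, 
assuming that the desired resources are met once the counted slots reach the 
desired parallelism.

```java
/** Hypothetical model of the slot counting; not Flink's actual scheduler code. */
final class DesiredResourcesSketch {

    /** While waiting for resources, the job holds no slots, so only free slots count. */
    static int availableWhileWaiting(int freeSlots) {
        return freeSlots;
    }

    /** While executing, the slots already allocated to the job have to be counted too. */
    static int availableWhileExecuting(int freeSlots, int allocatedSlots) {
        return freeSlots + allocatedSlots;
    }

    /** Assumption: desired resources are met once enough slots are counted. */
    static boolean hasDesiredResources(int availableSlots, int desiredParallelism) {
        return availableSlots >= desiredParallelism;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: the job runs on 2 slots, 2 further slots
        // are free, and the desired parallelism is 4.
        int freeSlots = 2;
        int allocatedSlots = 2;
        int desiredParallelism = 4;

        // Counting only the free slots, as in the waiting state.
        System.out.println(
                hasDesiredResources(availableWhileWaiting(freeSlots), desiredParallelism));

        // Counting the free slots plus the slots the running job already holds.
        System.out.println(
                hasDesiredResources(
                        availableWhileExecuting(freeSlots, allocatedSlots),
                        desiredParallelism));
    }
}
```

With these numbers, the first check fails and the second one passes: if only the 
free slots are counted while the job is running, the desired resources are 
never considered available even though the cluster has enough slots overall.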

The open question is why this wasn't revealed by the 
{{RescaleOnCheckpointITCase}} in the CI run of the [FLINK-36014 
PR|https://github.com/apache/flink/pull/25307]. The failure seems to be 
consistent for this test.


> RescaleOnCheckpointITCase.testRescaleOnCheckpoint fails
> -------------------------------------------------------
>
>                 Key: FLINK-36279
>                 URL: https://issues.apache.org/jira/browse/FLINK-36279
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 2.0-preview
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Major
>              Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62105&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=11287
> {code}
> Sep 13 17:16:55 "ForkJoinPool-1-worker-25" #28 daemon prio=5 os_prio=0 tid=0x00007f973f0c2800 nid=0x31a1 waiting on condition [0x00007f97089fc000]
> Sep 13 17:16:55    java.lang.Thread.State: TIMED_WAITING (sleeping)
> Sep 13 17:16:55       at java.lang.Thread.sleep(Native Method)
> Sep 13 17:16:55       at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:152)
> Sep 13 17:16:55       at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> Sep 13 17:16:55       at org.apache.flink.test.scheduling.UpdateJobResourceRequirementsITCase.waitForRunningTasks(UpdateJobResourceRequirementsITCase.java:219)
> Sep 13 17:16:55       at org.apache.flink.test.scheduling.RescaleOnCheckpointITCase.testRescaleOnCheckpoint(RescaleOnCheckpointITCase.java:139)
> Sep 13 17:16:55       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Sep 13 17:16:55       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [...]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)