[ https://issues.apache.org/jira/browse/FLINK-36279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881996#comment-17881996 ]
Matthias Pohl commented on FLINK-36279:
---------------------------------------

Actually, I was able to extract the logs for the successful run. I attached this one (FLINK-36279.20240914.6.success.log) and a successful local test run with the fix (FLINK-36279.fixed.success.log) to FLINK-36279. Unfortunately, we don't have the extensive logs which I just now added as part of the FLINK-36279 change.

The logs show that the tasks reach FINISHED state shortly after the checkpoint was created:
{code}
[...]
10:26:48,534 [jobmanager-io-thread-6] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed checkpoint 1 for job 0e4687109c92b1c63eee2c303d96f7c8 (1076 bytes, checkpointDuration=192 ms, finalizationTime=32 ms).
10:26:48,540 [SourceCoordinator-Source: Sequence Source] INFO org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Marking checkpoint 1 as completed for source Source: Sequence Source.
10:27:13,843 [SourceCoordinator-Source: Sequence Source] INFO org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source Source: Sequence Source received split request from parallel task 0 (#0)
10:27:13,859 [Source: Sequence Source -> Sink: Writer (1/4)#0] INFO org.apache.flink.runtime.taskmanager.Task [] - Source: Sequence Source -> Sink: Writer (1/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_0_0) switched from RUNNING to FINISHED.
10:27:13,860 [Source: Sequence Source -> Sink: Writer (1/4)#0] INFO org.apache.flink.runtime.taskmanager.Task [] - Freeing task resources for Source: Sequence Source -> Sink: Writer (1/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_0_0).
10:27:13,867 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Un-registering task and sending final execution state FINISHED to JobManager for task Source: Sequence Source -> Sink: Writer (1/4)#0 cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_0_0.
10:27:13,874 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Sequence Source -> Sink: Writer (1/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_0_0) switched from RUNNING to FINISHED.
10:27:13,974 [flink-pekko.actor.default-dispatcher-9] INFO org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool [] - Releasing slot [f2402b13ebfae6efeca75db27e501343].
10:27:13,974 [flink-pekko.actor.default-dispatcher-5] INFO org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot TaskSlot(index:5, state:ACTIVE, resource profile: ResourceProfile{taskHeapMemory=256.000gb (274877906944 bytes), taskOffHeapMemory=256.000gb (274877906944 bytes), managedMemory=20.000mb (20971520 bytes), networkMemory=16.000mb (16777216 bytes)}, allocationId: f2402b13ebfae6efeca75db27e501343, jobId: 0e4687109c92b1c63eee2c303d96f7c8).
10:27:13,977 [flink-pekko.actor.default-dispatcher-9] INFO org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer [] - Freeing slot f2402b13ebfae6efeca75db27e501343.
10:27:14,026 [SourceCoordinator-Source: Sequence Source] INFO org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source Source: Sequence Source received split request from parallel task 3 (#0)
10:27:14,030 [Source: Sequence Source -> Sink: Writer (4/4)#0] INFO org.apache.flink.runtime.taskmanager.Task [] - Source: Sequence Source -> Sink: Writer (4/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_3_0) switched from RUNNING to FINISHED.
10:27:14,031 [Source: Sequence Source -> Sink: Writer (4/4)#0] INFO org.apache.flink.runtime.taskmanager.Task [] - Freeing task resources for Source: Sequence Source -> Sink: Writer (4/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_3_0).
10:27:14,033 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Un-registering task and sending final execution state FINISHED to JobManager for task Source: Sequence Source -> Sink: Writer (4/4)#0 cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_3_0.
10:27:14,041 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Sequence Source -> Sink: Writer (4/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_3_0) switched from RUNNING to FINISHED.
10:27:14,055 [flink-pekko.actor.default-dispatcher-5] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job Flink Streaming Job (0e4687109c92b1c63eee2c303d96f7c8) switched from state RUNNING to CANCELLING.
10:27:14,056 [flink-pekko.actor.default-dispatcher-5] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_0_0.
10:27:14,059 [flink-pekko.actor.default-dispatcher-5] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Sequence Source -> Sink: Writer (2/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0) switched from RUNNING to CANCELING.
10:27:14,061 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.taskmanager.Task [] - Attempting to cancel task Source: Sequence Source -> Sink: Writer (2/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0).
10:27:14,063 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.taskmanager.Task [] - Source: Sequence Source -> Sink: Writer (2/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0) switched from RUNNING to CANCELING.
10:27:14,064 [flink-pekko.actor.default-dispatcher-5] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Sequence Source -> Sink: Writer (3/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0) switched from RUNNING to CANCELING.
10:27:14,067 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.taskmanager.Task [] - Triggering cancellation of task code Source: Sequence Source -> Sink: Writer (2/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0).
10:27:14,069 [flink-pekko.actor.default-dispatcher-5] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_3_0.
10:27:14,078 [Source: Sequence Source -> Sink: Writer (2/4)#0] INFO org.apache.flink.runtime.taskmanager.Task [] - Source: Sequence Source -> Sink: Writer (2/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0) switched from CANCELING to CANCELED.
10:27:14,078 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.taskmanager.Task [] - Attempting to cancel task Source: Sequence Source -> Sink: Writer (3/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0).
10:27:14,081 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.taskmanager.Task [] - Source: Sequence Source -> Sink: Writer (3/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0) switched from RUNNING to CANCELING.
10:27:14,078 [Source: Sequence Source -> Sink: Writer (2/4)#0] INFO org.apache.flink.runtime.taskmanager.Task [] - Freeing task resources for Source: Sequence Source -> Sink: Writer (2/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0).
10:27:14,081 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.taskmanager.Task [] - Triggering cancellation of task code Source: Sequence Source -> Sink: Writer (3/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0).
10:27:14,091 [Source: Sequence Source -> Sink: Writer (3/4)#0] INFO org.apache.flink.runtime.taskmanager.Task [] - Source: Sequence Source -> Sink: Writer (3/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0) switched from CANCELING to CANCELED.
10:27:14,091 [Source: Sequence Source -> Sink: Writer (3/4)#0] INFO org.apache.flink.runtime.taskmanager.Task [] - Freeing task resources for Source: Sequence Source -> Sink: Writer (3/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0).
10:27:14,092 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Un-registering task and sending final execution state CANCELED to JobManager for task Source: Sequence Source -> Sink: Writer (2/4)#0 cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0.
10:27:14,093 [flink-pekko.actor.default-dispatcher-9] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Sequence Source -> Sink: Writer (2/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0) switched from CANCELING to CANCELED.
10:27:14,095 [flink-pekko.actor.default-dispatcher-8] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Un-registering task and sending final execution state CANCELED to JobManager for task Source: Sequence Source -> Sink: Writer (3/4)#0 cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0.
10:27:14,096 [flink-pekko.actor.default-dispatcher-9] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Sequence Source -> Sink: Writer (3/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0) switched from CANCELING to CANCELED.
10:27:14,098 [flink-pekko.actor.default-dispatcher-9] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job Flink Streaming Job (0e4687109c92b1c63eee2c303d96f7c8) switched from state CANCELLING to CANCELED.
10:27:14,099 [flink-pekko.actor.default-dispatcher-9] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Stopping checkpoint coordinator for job 0e4687109c92b1c63eee2c303d96f7c8.
10:27:14,108 [Thread-9] INFO org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Closing SourceCoordinator for source Source: Sequence Source.
10:27:14,108 [Thread-9] INFO org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source coordinator for source Source: Sequence Source closed.
{code}

> RescaleOnCheckpointITCase.testRescaleOnCheckpoint fails
> -------------------------------------------------------
>
> Key: FLINK-36279
> URL: https://issues.apache.org/jira/browse/FLINK-36279
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 2.0-preview
> Reporter: Matthias Pohl
> Assignee: Matthias Pohl
> Priority: Major
> Labels: pull-request-available, test-stability
> Attachments: FLINK-36279.20240914.6.success.log, FLINK-36279.fixed.success.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62105&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=11287
> {code}
> Sep 13 17:16:55 "ForkJoinPool-1-worker-25" #28 daemon prio=5 os_prio=0 tid=0x00007f973f0c2800 nid=0x31a1 waiting on condition [0x00007f97089fc000]
> Sep 13 17:16:55 java.lang.Thread.State: TIMED_WAITING (sleeping)
> Sep 13 17:16:55 at java.lang.Thread.sleep(Native Method)
> Sep 13 17:16:55 at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:152)
> Sep 13 17:16:55 at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> Sep 13 17:16:55 at org.apache.flink.test.scheduling.UpdateJobResourceRequirementsITCase.waitForRunningTasks(UpdateJobResourceRequirementsITCase.java:219)
> Sep 13 17:16:55 at org.apache.flink.test.scheduling.RescaleOnCheckpointITCase.testRescaleOnCheckpoint(RescaleOnCheckpointITCase.java:139)
> Sep 13 17:16:55 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Sep 13 17:16:55 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [...]
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)