[ https://issues.apache.org/jira/browse/FLINK-36279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881996#comment-17881996 ]

Matthias Pohl commented on FLINK-36279:
---------------------------------------

Actually, I was able to extract the logs of the successful run. I attached them
(FLINK-36279.20240914.6.success.log) to FLINK-36279, together with the logs of a
successful local test run that includes the fix (FLINK-36279.fixed.success.log).

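Unfortunately, these logs don't contain the more detailed logging that I just
added as part of the FLINK-36279 change. They do show, however, that source
subtasks 1/4 and 4/4 reach the FINISHED state about 25 seconds after checkpoint 1
completed. After that, the job is cancelled and subtasks 2/4 and 3/4 switch to
CANCELED: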
{code}
[...]
10:26:48,534 [jobmanager-io-thread-6] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed checkpoint 1 for job 0e4687109c92b1c63eee2c303d96f7c8 (1076 bytes, checkpointDuration=192 ms, finalizationTime=32 ms).
10:26:48,540 [SourceCoordinator-Source: Sequence Source] INFO  org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Marking checkpoint 1 as completed for source Source: Sequence Source.
10:27:13,843 [SourceCoordinator-Source: Sequence Source] INFO  org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source Source: Sequence Source received split request from parallel task 0 (#0)
10:27:13,859 [Source: Sequence Source -> Sink: Writer (1/4)#0] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Source: Sequence Source -> Sink: Writer (1/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_0_0) switched from RUNNING to FINISHED.
10:27:13,860 [Source: Sequence Source -> Sink: Writer (1/4)#0] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Freeing task resources for Source: Sequence Source -> Sink: Writer (1/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_0_0).
10:27:13,867 [flink-pekko.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Un-registering task and sending final execution state FINISHED to JobManager for task Source: Sequence Source -> Sink: Writer (1/4)#0 cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_0_0.
10:27:13,874 [flink-pekko.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Sequence Source -> Sink: Writer (1/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_0_0) switched from RUNNING to FINISHED.
10:27:13,974 [flink-pekko.actor.default-dispatcher-9] INFO  org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool [] - Releasing slot [f2402b13ebfae6efeca75db27e501343].
10:27:13,974 [flink-pekko.actor.default-dispatcher-5] INFO  org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot TaskSlot(index:5, state:ACTIVE, resource profile: ResourceProfile{taskHeapMemory=256.000gb (274877906944 bytes), taskOffHeapMemory=256.000gb (274877906944 bytes), managedMemory=20.000mb (20971520 bytes), networkMemory=16.000mb (16777216 bytes)}, allocationId: f2402b13ebfae6efeca75db27e501343, jobId: 0e4687109c92b1c63eee2c303d96f7c8).
10:27:13,977 [flink-pekko.actor.default-dispatcher-9] INFO  org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer [] - Freeing slot f2402b13ebfae6efeca75db27e501343.
10:27:14,026 [SourceCoordinator-Source: Sequence Source] INFO  org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source Source: Sequence Source received split request from parallel task 3 (#0)
10:27:14,030 [Source: Sequence Source -> Sink: Writer (4/4)#0] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Source: Sequence Source -> Sink: Writer (4/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_3_0) switched from RUNNING to FINISHED.
10:27:14,031 [Source: Sequence Source -> Sink: Writer (4/4)#0] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Freeing task resources for Source: Sequence Source -> Sink: Writer (4/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_3_0).
10:27:14,033 [flink-pekko.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Un-registering task and sending final execution state FINISHED to JobManager for task Source: Sequence Source -> Sink: Writer (4/4)#0 cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_3_0.
10:27:14,041 [flink-pekko.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Sequence Source -> Sink: Writer (4/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_3_0) switched from RUNNING to FINISHED.
10:27:14,055 [flink-pekko.actor.default-dispatcher-5] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job Flink Streaming Job (0e4687109c92b1c63eee2c303d96f7c8) switched from state RUNNING to CANCELLING.
10:27:14,056 [flink-pekko.actor.default-dispatcher-5] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding the results produced by task execution cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_0_0.
10:27:14,059 [flink-pekko.actor.default-dispatcher-5] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Sequence Source -> Sink: Writer (2/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0) switched from RUNNING to CANCELING.
10:27:14,061 [flink-pekko.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Attempting to cancel task Source: Sequence Source -> Sink: Writer (2/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0).
10:27:14,063 [flink-pekko.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Source: Sequence Source -> Sink: Writer (2/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0) switched from RUNNING to CANCELING.
10:27:14,064 [flink-pekko.actor.default-dispatcher-5] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Sequence Source -> Sink: Writer (3/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0) switched from RUNNING to CANCELING.
10:27:14,067 [flink-pekko.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Triggering cancellation of task code Source: Sequence Source -> Sink: Writer (2/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0).
10:27:14,069 [flink-pekko.actor.default-dispatcher-5] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding the results produced by task execution cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_3_0.
10:27:14,078 [Source: Sequence Source -> Sink: Writer (2/4)#0] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Source: Sequence Source -> Sink: Writer (2/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0) switched from CANCELING to CANCELED.
10:27:14,078 [flink-pekko.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Attempting to cancel task Source: Sequence Source -> Sink: Writer (3/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0).
10:27:14,081 [flink-pekko.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Source: Sequence Source -> Sink: Writer (3/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0) switched from RUNNING to CANCELING.
10:27:14,078 [Source: Sequence Source -> Sink: Writer (2/4)#0] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Freeing task resources for Source: Sequence Source -> Sink: Writer (2/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0).
10:27:14,081 [flink-pekko.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Triggering cancellation of task code Source: Sequence Source -> Sink: Writer (3/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0).
10:27:14,091 [Source: Sequence Source -> Sink: Writer (3/4)#0] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Source: Sequence Source -> Sink: Writer (3/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0) switched from CANCELING to CANCELED.
10:27:14,091 [Source: Sequence Source -> Sink: Writer (3/4)#0] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Freeing task resources for Source: Sequence Source -> Sink: Writer (3/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0).
10:27:14,092 [flink-pekko.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Un-registering task and sending final execution state CANCELED to JobManager for task Source: Sequence Source -> Sink: Writer (2/4)#0 cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0.
10:27:14,093 [flink-pekko.actor.default-dispatcher-9] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Sequence Source -> Sink: Writer (2/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0) switched from CANCELING to CANCELED.
10:27:14,095 [flink-pekko.actor.default-dispatcher-8] INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Un-registering task and sending final execution state CANCELED to JobManager for task Source: Sequence Source -> Sink: Writer (3/4)#0 cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0.
10:27:14,096 [flink-pekko.actor.default-dispatcher-9] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Sequence Source -> Sink: Writer (3/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_2_0) switched from CANCELING to CANCELED.
10:27:14,098 [flink-pekko.actor.default-dispatcher-9] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job Flink Streaming Job (0e4687109c92b1c63eee2c303d96f7c8) switched from state CANCELLING to CANCELED.
10:27:14,099 [flink-pekko.actor.default-dispatcher-9] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Stopping checkpoint coordinator for job 0e4687109c92b1c63eee2c303d96f7c8.
10:27:14,108 [            Thread-9] INFO  org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Closing SourceCoordinator for source Source: Sequence Source.
10:27:14,108 [            Thread-9] INFO  org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source coordinator for source Source: Sequence Source closed.
{code}
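The timing in the excerpt can be checked mechanically once each entry is joined back onto a single line. The Java sketch below is a hypothetical helper: `LogTimeline` and `LogTimelineDemo` are made-up names, not Flink classes. It assumes that every entry starts with an `HH:mm:ss,SSS` timestamp and that per-task transitions are printed as `(<execution attempt id>) switched from X to Y.`, as they are above. The helper measures the gap between checkpoint 1 completing and the first subtask switching to FINISHED. It also records the last state reported for each execution attempt.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading unwrapped Flink log excerpts (one entry per line).
final class LogTimeline {
    // Each entry starts with an "HH:mm:ss,SSS" timestamp (12 characters).
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // "(<execution attempt id>) switched from <FROM> to <TO>." as printed for single tasks;
    // job-level lines ("switched from state ...") intentionally do not match.
    private static final Pattern SWITCH =
            Pattern.compile("\\((\\w+)\\) switched from (\\w+) to (\\w+)\\.");

    static LocalTime timestamp(String entry) {
        return LocalTime.parse(entry.substring(0, 12), TIME);
    }

    /** Time from the first "Completed checkpoint" entry to the next task switching to FINISHED. */
    static Duration checkpointToFirstFinished(List<String> entries) {
        LocalTime checkpoint = null;
        for (String entry : entries) {
            if (checkpoint == null && entry.contains("Completed checkpoint")) {
                checkpoint = timestamp(entry);
                continue;
            }
            Matcher m = SWITCH.matcher(entry);
            if (checkpoint != null && m.find() && m.group(3).equals("FINISHED")) {
                return Duration.between(checkpoint, timestamp(entry));
            }
        }
        throw new IllegalArgumentException("no FINISHED transition after a completed checkpoint");
    }

    /** Last reported state per execution attempt, in log order (timestamps may interleave). */
    static Map<String, String> lastStates(List<String> entries) {
        Map<String, String> states = new LinkedHashMap<>();
        for (String entry : entries) {
            Matcher m = SWITCH.matcher(entry);
            if (m.find()) {
                states.put(m.group(1), m.group(3));
            }
        }
        return states;
    }
}

final class LogTimelineDemo {
    // Four entries copied from the excerpt above, each joined back onto a single line.
    static final List<String> ENTRIES = List.of(
            "10:26:48,534 [jobmanager-io-thread-6] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed checkpoint 1 for job 0e4687109c92b1c63eee2c303d96f7c8 (1076 bytes, checkpointDuration=192 ms, finalizationTime=32 ms).",
            "10:27:13,859 [Source: Sequence Source -> Sink: Writer (1/4)#0] INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Source: Sequence Source -> Sink: Writer (1/4)#0 (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_0_0) switched from RUNNING to FINISHED.",
            "10:27:14,059 [flink-pekko.actor.default-dispatcher-5] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Sequence Source -> Sink: Writer (2/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0) switched from RUNNING to CANCELING.",
            "10:27:14,093 [flink-pekko.actor.default-dispatcher-9] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Sequence Source -> Sink: Writer (2/4) (cdc6925e3f5acf2de95f2a5a813f07c7_cbc357ccb763df2852fee8c4fc7d55f2_1_0) switched from CANCELING to CANCELED.");

    public static void main(String[] args) {
        System.out.println("checkpoint 1 -> first FINISHED: "
                + LogTimeline.checkpointToFirstFinished(ENTRIES));
        LogTimeline.lastStates(ENTRIES).forEach((id, state) -> System.out.println(id + " -> " + state));
    }
}
```

The state map is keyed by execution attempt ID, so the Task and ExecutionGraph entries for the same attempt collapse into one. Run over the whole unwrapped excerpt, this approach would report subtasks 1/4 and 4/4 as FINISHED and subtasks 2/4 and 3/4 as CANCELED.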

> RescaleOnCheckpointITCase.testRescaleOnCheckpoint fails
> -------------------------------------------------------
>
>                 Key: FLINK-36279
>                 URL: https://issues.apache.org/jira/browse/FLINK-36279
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 2.0-preview
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Major
>              Labels: pull-request-available, test-stability
>         Attachments: FLINK-36279.20240914.6.success.log, FLINK-36279.fixed.success.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62105&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=11287
> {code}
> Sep 13 17:16:55 "ForkJoinPool-1-worker-25" #28 daemon prio=5 os_prio=0 tid=0x00007f973f0c2800 nid=0x31a1 waiting on condition [0x00007f97089fc000]
> Sep 13 17:16:55    java.lang.Thread.State: TIMED_WAITING (sleeping)
> Sep 13 17:16:55       at java.lang.Thread.sleep(Native Method)
> Sep 13 17:16:55       at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:152)
> Sep 13 17:16:55       at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> Sep 13 17:16:55       at org.apache.flink.test.scheduling.UpdateJobResourceRequirementsITCase.waitForRunningTasks(UpdateJobResourceRequirementsITCase.java:219)
> Sep 13 17:16:55       at org.apache.flink.test.scheduling.RescaleOnCheckpointITCase.testRescaleOnCheckpoint(RescaleOnCheckpointITCase.java:139)
> Sep 13 17:16:55       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Sep 13 17:16:55       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [...]
> {code}
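The stack trace shows the test thread sleeping inside `CommonTestUtils.waitUntilCondition`, which is called from `waitForRunningTasks`. In other words, the thread is in a polling loop that only returns once its condition holds. The following is a minimal sketch of how such a loop can stall, and it rests on two assumptions. First, `Polling.waitUntil` is a hypothetical, deadline-bounded helper and is not Flink's `CommonTestUtils` API. Second, the polled condition is assumed to be "a given number of subtasks is RUNNING". Under that assumption, once bounded source subtasks have already switched to FINISHED (as subtasks 1/4 and 4/4 do in the log excerpt of the comment above), the condition can no longer become true.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical deadline-bounded polling helper for illustration only;
// it is NOT the implementation of Flink's CommonTestUtils.waitUntilCondition.
final class Polling {
    /** Re-checks the condition every interval; returns false once the timeout elapses. */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up instead of sleeping forever
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}

final class PollingDemo {
    public static void main(String[] args) {
        // Hypothetical counter standing in for "subtasks currently in RUNNING state".
        AtomicInteger runningSubtasks = new AtomicInteger(4);
        // Bounded source subtasks that run out of data leave RUNNING (here: two of four).
        runningSubtasks.addAndGet(-2);

        // A wait for all four subtasks to be RUNNING can no longer succeed.
        boolean allRunning = Polling.waitUntil(
                () -> runningSubtasks.get() == 4, Duration.ofMillis(200), Duration.ofMillis(20));
        System.out.println("all 4 subtasks RUNNING: " + allRunning);
    }
}
```

With a deadline, a condition that can never hold shows up as a visible timeout rather than a silent hang. The trade-off is that the timeout has to be generous enough not to flake on slow CI machines.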



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
