[ https://issues.apache.org/jira/browse/FLINK-21644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297798#comment-17297798 ]
Guowei Ma commented on FLINK-21644:
-----------------------------------

Another case: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14233&view=logs&j=6caf31d6-847a-526e-9624-468e053467d6&t=7d4f7375-52df-5ce0-457f-b2ffbb2289a4

The `stop-with-savepoint` command failed because it hit its 60s timeout. The following log shows that checkpoint 4 took 105s (105168 ms) to complete, well beyond that 60s timeout. I could not find any clue in the TaskExecutor's log.

{code:java}
2021-03-06 21:49:29,239 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed checkpoint 4 for job 57d2077988c68f83c07f2d2e3a18f2de (415937 bytes in 105168 ms).
2021-03-06 21:49:29,296 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: EventSource -> Timestamps/Watermarks (4/4) (1d6bd49d343190991cf9564398c87e0c) switched from RUNNING to FINISHED.
2021-03-06 21:49:29,303 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: EventSource -> Timestamps/Watermarks (2/4) (3dbae2667f0aae5b219d5d973b5ee309) switched from RUNNING to FINISHED.
2021-03-06 21:49:29,304 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: EventSource -> Timestamps/Watermarks (1/4) (817466ba51b474f1c2d5919e70f21947) switched from RUNNING to FINISHED.
2021-03-06 21:49:29,304 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: EventSource -> Timestamps/Watermarks (3/4) (5462faf621624c136c6b84b7218f1b6d) switched from RUNNING to FINISHED.
2021-03-06 21:49:29,630 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
{code}

> Resuming Savepoint (rocks, scale up, heap timers) end-to-end test failed
> ------------------------------------------------------------------------
>
> Key: FLINK-21644
> URL: https://issues.apache.org/jira/browse/FLINK-21644
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.11.3
> Reporter: Guowei Ma
> Priority: Major
> Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14213&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=2b7514ee-e706-5046-657b-3430666e7bd9
> The test failed because some operators exited slowly.
> The "EventSource -> Timestamps/Watermarks" subtasks exited very quickly,
> but "ArtificalKeyedStateMapper_Kryo_and_Custom_Stateful" took another
> 34s to exit.
> {code:java}
> 2021-03-05 21:41:15,327 INFO org.apache.flink.runtime.taskmanager.Task [] - Source: EventSource -> Timestamps/Watermarks (2/2) (87b0121ed823482e9f5718e99793ee5c) switched from RUNNING to FINISHED.
> 2021-03-05 21:41:15,327 INFO org.apache.flink.runtime.taskmanager.Task [] - Freeing task resources for Source: EventSource -> Timestamps/Watermarks (2/2) (87b0121ed823482e9f5718e99793ee5c).
> 2021-03-05 21:41:15,332 INFO org.apache.flink.runtime.taskmanager.Task [] - Source: EventSource -> Timestamps/Watermarks (1/2) (4ede785fe0f1c4c798030ac748da0a95) switched from RUNNING to FINISHED.
> 2021-03-05 21:41:15,332 INFO org.apache.flink.runtime.taskmanager.Task [] - Freeing task resources for Source: EventSource -> Timestamps/Watermarks (1/2) (4ede785fe0f1c4c798030ac748da0a95).
> 2021-03-05 21:41:15,336 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Un-registering task and sending final execution state FINISHED to JobManager for task Source: EventSource -> Timestamps/Watermarks (2/2) 87b0121ed823482e9f5718e99793ee5c.
> 2021-03-05 21:41:15,338 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Un-registering task and sending final execution state FINISHED to JobManager for task Source: EventSource -> Timestamps/Watermarks (1/2) 4ede785fe0f1c4c798030ac748da0a95.
> 2021-03-05 21:41:16,294 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:17,298 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:18,301 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:19,304 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:20,307 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:21,310 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:22,313 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:23,315 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:24,318 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:25,321 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:26,324 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:27,326 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:28,329 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:29,332 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:30,335 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:31,337 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:32,340 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:33,343 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:34,345 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:35,348 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:36,351 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:37,353 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:38,356 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:39,359 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:40,362 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:41,365 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:42,368 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:43,371 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:44,373 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:45,376 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:46,379 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:47,382 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:48,385 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:49,388 INFO org.apache.flink.metrics.slf4j.Slf4jReporter [] -
> 2021-03-05 21:41:49,601 INFO org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend [] - Closed RocksDB State Backend. Cleaning up RocksDB working directory /tmp/flink-io-83d8d11e-f3a4-433e-81f3-39075eca8c3d/job_7445670e103edd0eb216e93dd7ab9255_op_StreamMap_5271c210329e73bd743f3227edfb3b71__1_2__uuid_3512e21a-4158-4d7e-bbd8-d0909db46626.
> 2021-03-05 21:41:49,607 INFO org.apache.flink.runtime.taskmanager.Task [] - ArtificalKeyedStateMapper_Kryo_and_Custom_Stateful (1/2) (418bee0708b657e5ae1d876d6fb7320a) switched from RUNNING to FINISHED.
> 2021-03-05 21:41:49,608 INFO org.apache.flink.runtime.taskmanager.Task [] - Freeing task resources for ArtificalKeyedStateMapper_Kryo_and_Custom_Stateful (1/2) (418bee0708b657e5ae1d876d6fb7320a).
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
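The timings quoted in this issue (the mapper finishing about 34s after the sources, and checkpoint 4 taking 105168 ms against a 60s timeout) can be checked directly from the log timestamps. The sketch below is a hypothetical helper (`LogGap`, `gapMillis` and `exceeds` are illustrative names, not part of Flink or its end-to-end test scripts). It assumes the `yyyy-MM-dd HH:mm:ss,SSS` timestamp layout seen in the logs above, measures the lag from the last source subtask that finished (21:41:15,332) to `ArtificalKeyedStateMapper_Kryo_and_Custom_Stateful (1/2)` finishing, and takes the 60s `stop-with-savepoint` timeout from the comment.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Flink: measures gaps between log lines.
public class LogGap {

    // Timestamp layout of the log lines quoted above, e.g. "2021-03-05 21:41:15,327".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps.
    static long gapMillis(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, LOG_TS);
        LocalDateTime to = LocalDateTime.parse(end, LOG_TS);
        return Duration.between(from, to).toMillis();
    }

    // True if an operation that took tookMillis ran past the given timeout.
    static boolean exceeds(long tookMillis, Duration timeout) {
        return tookMillis > timeout.toMillis();
    }

    public static void main(String[] args) {
        // Last source subtask FINISHED (1/2) -> ArtificalKeyedStateMapper (1/2) FINISHED.
        long mapperLag = gapMillis("2021-03-05 21:41:15,332", "2021-03-05 21:41:49,607");
        System.out.println("Mapper finished " + mapperLag + " ms after the sources");
        // prints "Mapper finished 34275 ms after the sources"

        // Duration reported for checkpoint 4 vs. the assumed 60s stop-with-savepoint timeout.
        long checkpointMillis = 105_168;
        Duration timeout = Duration.ofSeconds(60);
        System.out.println("Checkpoint 4 overran the timeout by "
                + (checkpointMillis - timeout.toMillis()) + " ms: "
                + exceeds(checkpointMillis, timeout));
        // prints "Checkpoint 4 overran the timeout by 45168 ms: true"
    }
}
```

With Java 11 or later the file can be run directly with `java LogGap.java`. Measuring from the earlier source subtask (21:41:15,327) instead gives 34280 ms; either way the gap matches the "another 34s" reported above.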