[ https://issues.apache.org/jira/browse/FLINK-24161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yangze Guo updated FLINK-24161:
-------------------------------
Description:
When stopping the job with a savepoint while a task is finishing, the stop action times out.

Testing job: https://github.com/KarmaGYZ/flink/blob/test-147/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java

Flink conf:
{code:bash}
state.savepoints.dir: /tmp/flink-savepoints
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: file:///tmp/flink-ckp/
execution.checkpointing.aligned-checkpoint-timeout: 30 s
execution.checkpointing.interval: 5 s
taskmanager.numberOfTaskSlots: 2
execution.checkpointing.checkpoints-after-tasks-finish.enabled: true
{code}

How to reproduce:
{code:bash}
bin/flink run -d -p 4 examples/streaming/WordCount.jar
# while one task is finishing
bin/flink stop $JOB_ID
{code}

Client log:
{code:bash}
------------------------------------------------------------
The program finished with the following exception:

org.apache.flink.util.FlinkException: Could not stop with a savepoint job "e139a2eba7f8dc0b07fab65e84421ee4".
	at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:581)
	at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
	at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:569)
	at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1069)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
	at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
	at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:579)
	... 6 more
{code}

was:
When stopping the job with a savepoint while a task is finishing, the stop action times out.

Testing job: https://github.com/KarmaGYZ/flink/blob/test-147/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java

Flink conf:
{code:bash}
state.savepoints.dir: /tmp/flink-savepoints
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: file:///tmp/flink-ckp/
execution.checkpointing.aligned-checkpoint-timeout: 30 s
execution.checkpointing.interval: 5 s
taskmanager.numberOfTaskSlots: 2
{code}

How to reproduce:
{code:bash}
bin/flink run -d -p 4 examples/streaming/WordCount.jar
# while one task is finishing
bin/flink stop $JOB_ID
{code}

Client log:
{code:bash}
------------------------------------------------------------
The program finished with the following exception:

org.apache.flink.util.FlinkException: Could not stop with a savepoint job "e139a2eba7f8dc0b07fab65e84421ee4".
	at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:581)
	at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
	at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:569)
	at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1069)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
	at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
	at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:579)
	... 6 more
{code}

> Can not stop the job with savepoint while a task is finishing
> -------------------------------------------------------------
>
>                 Key: FLINK-24161
>                 URL: https://issues.apache.org/jira/browse/FLINK-24161
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.0
>            Reporter: Yangze Guo
>            Priority: Blocker
>             Fix For: 1.14.0
>
>         Attachments: flink-yangze-standalonesession-0-IT-C02YV0L8LVDL.local.log,
> flink-yangze-taskexecutor-0-IT-C02YV0L8LVDL.local.log,
> flink-yangze-taskexecutor-1-IT-C02YV0L8LVDL.local.log
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
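The reproduction steps in the description can be scripted roughly as follows. This is a minimal POSIX shell sketch, not part of the original report, and it rests on several assumptions: it runs from the Flink distribution root (as the `bin/flink` commands above imply), `bin/flink run -d` prints a line of the form `Job has been submitted with JobID <id>`, and a fixed delay is an acceptable stand-in for "while one task is finishing" (the right delay is a guess and depends on the job). The variables `RUN_REPRO`, `STOP_DELAY` and `FLINK_CONF`, and the helper names, are hypothetical and introduced only for this sketch.

```shell
#!/bin/sh
# Hypothetical helper for reproducing FLINK-24161; not part of Flink itself.
# Assumes the working directory is the Flink distribution root.

# Print the value of KEY from a "key: value" style flink-conf.yaml.
# Prints nothing if the key is absent; if it repeats, prints the last match.
conf_value() {
    # Dots in Flink keys are regex metacharacters, so escape them for sed.
    key_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*$key_re:[[:space:]]*//p" "$2" \
        | sed 's/[[:space:]]*$//' \
        | tail -n 1
}

# Read CLI output on stdin and print the first 32-hex-digit job ID found on a
# line like "Job has been submitted with JobID <id>" (assumed output format).
extract_job_id() {
    sed -n 's/.*Job has been submitted with JobID \([0-9a-f]\{32\}\).*/\1/p' \
        | head -n 1
}

# Submit the WordCount job detached, wait, then stop it with a savepoint.
reproduce() {
    conf=${FLINK_CONF:-conf/flink-conf.yaml}
    flag=execution.checkpointing.checkpoints-after-tasks-finish.enabled
    # The updated description adds this flag, so warn if it is not enabled.
    if [ "$(conf_value "$flag" "$conf")" != "true" ]; then
        echo "warning: $flag is not set to true in $conf" >&2
    fi

    job_id=$(bin/flink run -d -p 4 examples/streaming/WordCount.jar | extract_job_id)
    if [ -z "$job_id" ]; then
        echo "could not determine the job ID" >&2
        return 1
    fi

    # A fixed delay only approximates "while one task is finishing"; tune it.
    sleep "${STOP_DELAY:-5}"
    bin/flink stop "$job_id"
}

# Only touch a real cluster when explicitly requested (hypothetical guard).
if [ "${RUN_REPRO:-0}" = 1 ]; then
    reproduce
fi
```

A possible invocation, saving the script under a hypothetical name such as `repro.sh` in the Flink directory, is `RUN_REPRO=1 STOP_DELAY=3 sh repro.sh`. The parsing helpers have no side effects, so they can be checked without a cluster; only `reproduce` talks to Flink.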