Hello everyone,

We recently upgraded Flink from 1.9.1 to 1.11.0 and found one strange behavior: when we stop a job with a savepoint, we get the timeout error below. I checked the Flink web console, and the savepoint is created in S3 within 1 second. The job is fairly simple, so 1 second for savepoint generation is expected. We use a Kubernetes deployment. I clocked it, and the command returns this error after about 60 seconds. Afterwards, the job hangs (it still shows as running, but it is not actually doing anything), and I need to run another command to cancel it. Does anyone have an idea what's going on here? BTW, "flink stop" worked for us before in 1.9.1.
flink@flink-jobmanager-fcf5d84c5-sz4wk:~$ flink stop 88d9b46f59d131428e2a18c9c7b3aa3f
Suspending job "88d9b46f59d131428e2a18c9c7b3aa3f" with a savepoint.

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.util.FlinkException: Could not stop with a savepoint job "88d9b46f59d131428e2a18c9c7b3aa3f".
    at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:495)
    at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:864)
    at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:487)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:931)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
    at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:493)
    ... 9 more
flink@flink-jobmanager-fcf5d84c5-sz4wk:~$

Thanks in advance,
Ivan