Hi Ivan, thanks a lot for your message. Can you post the JobManager log here as well? It might contain additional information on the reason for the timeout.
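A side note on the trace quoted below: the `TimeoutException` is thrown by a timed `CompletableFuture.get` inside `CliFrontend`, so it is the client that stops waiting. In plain Java, a timed `get` that expires does not cancel the future it was waiting on. The sketch below only illustrates that general behavior. `TimedGetSketch`, `slowOperation`, and `timesOut` are made-up names and none of this is Flink code, so it says nothing definite about why the job stays in the running state; the JobManager log is still needed for that.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedGetSketch {

    /** Hypothetical stand-in for a slow remote operation; this is not Flink code. */
    public static CompletableFuture<String> slowOperation(long millis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "done";
        });
    }

    /** Returns true if the caller gave up waiting before the future completed. */
    public static boolean timesOut(CompletableFuture<String> future, long waitMillis)
            throws InterruptedException, ExecutionException {
        try {
            future.get(waitMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            // Only the wait ended; the future itself keeps running.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> future = slowOperation(500);
        System.out.println("caller timed out: " + timesOut(future, 50));
        System.out.println("future cancelled: " + future.isCancelled());
        // join() blocks until the background work finishes on its own.
        System.out.println("eventual result: " + future.join());
    }
}
```

If the same pattern applies here, the savepoint appearing in S3 while the CLI still reports a timeout would not be contradictory, because the client-side wait and the server-side operation end independently.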
On Fri, Jul 24, 2020 at 4:03 AM Ivan Yang <ivanygy...@gmail.com> wrote:

> Hello everyone,
>
> We recently upgraded Flink from 1.9.1 to 1.11.0 and found one strange behavior: when we stop a job with a savepoint, we get the following timeout error.
>
> I checked the Flink web console, and the savepoint is created in S3 within 1 second. The job is fairly simple, so 1 second for savepoint generation is expected. We use a Kubernetes deployment. I timed it: the command returns this error after about 60 seconds. Afterwards, the job hangs (it still shows as running but is not actually doing anything), and I need to run another command to cancel it. Does anyone have an idea what's going on here? BTW, “flink stop” worked for us in 1.9.1 before.
>
> flink@flink-jobmanager-fcf5d84c5-sz4wk:~$ flink stop 88d9b46f59d131428e2a18c9c7b3aa3f
> Suspending job "88d9b46f59d131428e2a18c9c7b3aa3f" with a savepoint.
>
> ------------------------------------------------------------
>  The program finished with the following exception:
>
> org.apache.flink.util.FlinkException: Could not stop with a savepoint job "88d9b46f59d131428e2a18c9c7b3aa3f".
>     at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:495)
>     at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:864)
>     at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:487)
>     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:931)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>     at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
> Caused by: java.util.concurrent.TimeoutException
>     at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
>     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
>     at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:493)
>     ... 9 more
> flink@flink-jobmanager-fcf5d84c5-sz4wk:~$
>
> Thanks in advance,
> Ivan