Hello everyone,

We recently upgraded Flink from 1.9.1 to 1.11.0 and found a strange behavior:
when we stop a job with a savepoint, we get the timeout error below.
According to the Flink web console, the savepoint is created in S3 within 1
second. The job is fairly simple, so 1 second for savepoint generation is
expected. We use a Kubernetes deployment. I timed it, and the command takes
about 60 seconds to return this error. Afterwards the job hangs (it still
shows as running, but is not actually doing anything), and I need to run
another command to cancel it. Does anyone have an idea what is going on here?
BTW, "flink stop" worked for us in 1.9.1 before the upgrade.



flink@flink-jobmanager-fcf5d84c5-sz4wk:~$ flink stop 88d9b46f59d131428e2a18c9c7b3aa3f
Suspending job "88d9b46f59d131428e2a18c9c7b3aa3f" with a savepoint.

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.util.FlinkException: Could not stop with a savepoint job "88d9b46f59d131428e2a18c9c7b3aa3f".
        at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:495)
        at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:864)
        at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:487)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:931)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
        at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
Caused by: java.util.concurrent.TimeoutException
        at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
        at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:493)
        ... 9 more
flink@flink-jobmanager-fcf5d84c5-sz4wk:~$ 
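
As far as I can tell from the trace, the exception is raised on the client
side in CompletableFuture.timedGet, i.e. in a bounded wait for the result of
the stop request. Below is a minimal sketch of that pattern in plain Java. It
is not Flink's actual code, and the class and method names
(ClientTimeoutSketch, waitTimesOut) are made up for illustration. It only
shows that a timed get can give up on the caller's side while the awaited
operation itself stays pending:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; this is not how Flink is implemented.
public class ClientTimeoutSketch {

    // Waits up to timeoutMillis for the future to complete.
    // Returns true if the wait timed out, false if a result arrived in time.
    static boolean waitTimesOut(CompletableFuture<String> future, long timeoutMillis)
            throws Exception {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            // The caller gives up here; the future itself is neither
            // cancelled nor completed by the timeout.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a stop request whose acknowledgement never arrives.
        CompletableFuture<String> stopResult = new CompletableFuture<>();

        boolean timedOut = waitTimesOut(stopResult, 100);

        System.out.println("timed out: " + timedOut
                + ", still pending: " + !stopResult.isDone());
    }
}
```

In this sketch the timeout says nothing about whether the work behind the
future succeeded, which would be consistent with the savepoint showing up in
S3 even though the command fails.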


Thanks in advance,
Ivan
