Today, I kept receiving a timeout exception when stopping my job with a savepoint. This happened with Flink version 1.12.2 running on EMR.
I had to fall back to the deprecated cancel-with-savepoint feature instead. In fact, stopping with a savepoint, creating a savepoint, and cancelling with a savepoint all gave me the timeout exception. However, cancel with savepoint did start creating a savepoint on the cluster. The stop command finished with the following exception:

    org.apache.flink.util.FlinkException: Could not stop with a savepoint job "5d6100984035db9541e9f08ecbd311bf".
        at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:585)
        at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1006)
        at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:573)
        at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1073)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1136)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1136)
    Caused by: java.util.concurrent.TimeoutException
        at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
        at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:583)
        ... 9 more
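The trace shows the TimeoutException coming out of CompletableFuture.get inside CliFrontend, that is, from the client waiting on a future. One possible reading is that only the client's wait gave up, which would be consistent with the savepoint still being started on the cluster. The sketch below is not Flink code. It is a minimal, self-contained Java illustration of that general behaviour of a timed get, and the class name ClientTimeoutDemo, the method awaitOrNull, and the sleep durations are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ClientTimeoutDemo {

    /**
     * Waits for the result for at most timeoutMillis.
     * Returns null if the wait timed out (hypothetical helper, not a Flink API).
     */
    public static String awaitOrNull(CompletableFuture<String> future, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Only the wait gave up; a timed get() does not cancel or complete the future.
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a slow operation running elsewhere (e.g. a savepoint).
        CompletableFuture<String> slowOperation = CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "savepoint-done";
        });

        // The "client" stops waiting after 50 ms, well before the work finishes.
        String early = awaitOrNull(slowOperation, 50);
        System.out.println("result after short wait: " + early);
        System.out.println("operation already done: " + slowOperation.isDone());

        // Waiting without a timeout still yields the result of the same operation.
        System.out.println("result after full wait: " + slowOperation.get());
    }
}
```

If the same thing is happening here, it may be worth checking the cluster (job manager log or savepoint directory) for a completed savepoint even when the CLI reports a timeout.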