Hey guys,

We just built a brand new Flink 1.4.0 cluster with HA, and everything seems to be working fine, but we are getting some errors with savepoints.

For example, I have a running job:

    ------------------ Running/Restarting Jobs -------------------
    25.07.2018 11:55:18 : e5280bad25a7f19122f98483f94aba26 : Mr Banks (RUNNING)
    --------------------------------------------------------------

If I try to create a savepoint with

    flink savepoint e5280bad25a7f19122f98483f94aba26

the command just hangs and never returns (I waited about 10 minutes with no response).

Then I tried to cancel with a savepoint:

    flink cancel e5280bad25a7f19122f98483f94aba26 -s

and I got a

    java.util.concurrent.TimeoutException: Futures timed out after [60000 milliseconds]

I checked the JobManager logs, but I can't see any problems there. I also checked the Hadoop logs for errors (suspecting the problem might be in the underlying system), but it seems it created the nodes properly; at least, there are no errors there either.

Is there anything else I should check?

PS: My state is not that big (my napkin calculations say it's less than 1 GB), so it doesn't seem to be a problem of the state taking too long to save.

--
*Julio Biason*, Software Engineer
*AZION* | Deliver. Accelerate. Protect.
Office: +55 51 3083 8101 | Mobile: +55 51 *99907 0554*