Hi,

AFAIK, the features discussed in the threads you mentioned are not yet implemented, so there is currently no way to avoid job restarts during a ZK rolling restart. I'm pulling in Till, as he might know better.

Regards,
Roman

On Fri, Oct 16, 2020 at 7:45 PM Kenzyme <k...@kenzymele.com> wrote:

> Hi,
>
> Related to
> https://mail-archives.apache.org/mod_mbox/flink-dev/201709.mbox/%3CCA+faj9yvPyzmmLoEWAMPgXDP6kx+0oed1Z5k4s3K9sgiCFyb=w...@mail.gmail.com%3E
> and https://issues.apache.org/jira/browse/FLINK-10052, I was wondering if
> there's a way to prevent Flink instances from failing while doing a rolling
> restart on ZK followers while still keeping the quorum?
>
> This is what was shown in Flink logs while restarting ZK :
> ZooKeeper connection SUSPENDING. Changes to the submitted job graphs are
> not monitored (temporarily).
>
> I was able to reproduce this twice with a quorum of 5 ZK nodes while doing
> some ZK maintenance.
>
> Thanks!
>
> Kenzyme Le