Hi Roman,

Thank you for your reply.
I'm not 100% sure whether the features discussed in those threads would fix the issue, but they seemed related in some way. Basically, I expected Flink to behave the way Kafka does, i.e. Kafka services continue without disruption as long as the ZK quorum is maintained during rolling updates.

Best,
Kenzyme Le

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, October 19th, 2020 at 4:38 PM, Khachatryan Roman <khachatryan.ro...@gmail.com> wrote:

> Hi,
>
> AFAIK, the features discussed in the threads you mentioned are not yet
> implemented. So there is no way to avoid Job restarts in case of ZK rolling
> restarts.
> I'm pulling in Till as he might know better.
>
> Regards,
> Roman
>
> On Fri, Oct 16, 2020 at 7:45 PM Kenzyme <k...@kenzymele.com> wrote:
>
>> Hi,
>>
>> Related to
>> https://mail-archives.apache.org/mod_mbox/flink-dev/201709.mbox/%3CCA+faj9yvPyzmmLoEWAMPgXDP6kx+0oed1Z5k4s3K9sgiCFyb=w...@mail.gmail.com%3E
>> and https://issues.apache.org/jira/browse/FLINK-10052, I was wondering if
>> there's a way to prevent Flink instances from failing while doing a rolling
>> restart on ZK followers while still keeping the quorum?
>>
>> This is what was shown in Flink logs while restarting ZK:
>> ZooKeeper connection SUSPENDING. Changes to the submitted job graphs are not
>> monitored (temporarily).
>>
>> I was able to reproduce this twice with a quorum of 5 ZK nodes while doing
>> some ZK maintenance.
>>
>> Thanks!
>>
>> Kenzyme Le