Hi Jun,

Thanks for the reply.

1) The reason I'm asking is that I wonder whether it would be better to focus development effort on taking ownership of the existing PR (https://github.com/apache/zookeeper/pull/150) that fixes ZOOKEEPER-2184, rebasing it, and getting it merged into the ZK codebase soon. I feel this KIP might introduce a setting that could be deprecated shortly afterwards, confusing end users with one more knob to turn.

3) I'm not sure I fully understand, sorry for the beginner's question: if the default timeout is infinite, then this won't change how Kafka works today, will it? (Unless I'm missing something, sorry.) And if it is not set to infinite, don't we introduce the risk of a whole cluster shutting down at once?

Thanks,
Stephane

On 31/10/17, 1:00 pm, "Jun Rao" <j...@confluent.io> wrote:

Hi, Stephane,

Thanks for the reply.

1) Fixing the issue in ZK will be ideal. Not sure when it will happen though. Once it's fixed, we can probably deprecate this config.

2) That could be useful. Is there a java api to do that at runtime? Also, invalidating DNS cache doesn't always fix the issue of unresolved host. In some of the cases, human intervention is needed.

3) The default timeout is infinite though.

Jun

On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <steph...@simplemachines.com.au> wrote:

> Hi Jun,
>
> I think this is very helpful. Restarting Kafka brokers in case of zookeeper
> host change is not a well known operation.
>
> Few questions:
> 1) would it not be worth fixing the problem at the source? This has been
> stuck for a while though, maybe a little push would help:
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-2184
>
> 2) upon recreating the zookeeper object, is it not possible to invalidate
> the DNS cache so that it resolves the new hostname?
>
> 3) could the cluster be down in this situation: one migrates an entire
> zookeeper cluster to new machines (one by one). The quorum is still alive
> without downtime, but now every broker in a cluster can't resolve zookeeper
> at the same time. They all shut down at the same time after the new
> time-out setting.
>
> Thanks!
> Stéphane
>
> On 28 Oct. 2017 9:42 am, "Jun Rao" <j...@confluent.io> wrote:
>
> > Hi, Everyone,
> >
> > We created "KIP-217: Expose a timeout to allow an expired ZK session to
> > be re-created".
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+to+be+re-created
> >
> > Please take a look and provide your feedback.
> >
> > Thanks,
> >
> > Jun