Hi, Stephane,

3) The difference is that currently there is no retry when re-creating the
ZooKeeper object after a ZK session expires. So, if the re-creation of
ZooKeeper fails, the broker just logs the error and the ZooKeeper object
is never created again. With this KIP, we will keep retrying the
creation of the ZooKeeper object until it succeeds.
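
To illustrate the retry behavior described above, here is a minimal sketch.
It is not the actual Kafka code: the class name RetryUntilSuccess, the
method createWithRetry, and the backoffMs parameter are hypothetical names
chosen for the example, and the factory stands in for whatever call
re-creates the ZooKeeper object.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not Kafka's actual code: retries a factory until it succeeds.
final class RetryUntilSuccess {

    // Calls `factory` repeatedly until it returns without throwing,
    // sleeping `backoffMs` milliseconds between failed attempts.
    static <T> T createWithRetry(Callable<T> factory, long backoffMs) {
        while (true) {
            try {
                // In the broker this could wrap something like
                // () -> new ZooKeeper(connectString, sessionTimeoutMs, watcher)
                return factory.call();
            } catch (Exception e) {
                // The pre-KIP behavior stops after logging; here we log and try again.
                System.err.println("Re-creation failed, will retry: " + e);
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and give up instead of spinning.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while retrying", ie);
                }
            }
        }
    }
}
```

With no upper bound on the number of attempts, this corresponds to the
infinite default timeout; a finite timeout would add a deadline check
inside the loop.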

Thanks,

Jun

On Tue, Oct 31, 2017 at 3:28 PM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Hi Jun,
>
> Thanks for the reply.
>
> 1) The reason I'm asking is that I wonder whether it would be worth
> focusing development effort on taking ownership of the existing PR (
> https://github.com/apache/zookeeper/pull/150) to fix ZOOKEEPER-2184,
> rebasing it, and getting it merged into the ZK codebase soon. I feel this
> KIP might introduce a setting that could be deprecated shortly and confuse
> the end user a bit further with one more knob to turn.
>
> 3) I'm not sure I fully understand, sorry for the beginner's question:
> if the default timeout is infinite, then it won't change anything about how
> Kafka works today, will it? (Unless I'm missing something, sorry.) If it is
> not set to infinite, then don't we introduce the risk of a whole cluster
> shutting down at once?
>
> Thanks,
> Stephane
>
> On 31/10/17, 1:00 pm, "Jun Rao" <j...@confluent.io> wrote:
>
>     Hi, Stephane,
>
>     Thanks for the reply.
>
>     1) Fixing the issue in ZK would be ideal. I'm not sure when that will
>     happen, though. Once it's fixed, we can probably deprecate this config.
>
>     2) That could be useful. Is there a Java API to do that at runtime?
>     Also, invalidating the DNS cache doesn't always fix the issue of an
>     unresolved host. In some cases, human intervention is needed.
>
>     3) The default timeout is infinite though.
>
>     Jun
>
>
>     On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <
>     steph...@simplemachines.com.au> wrote:
>
>     > Hi Jun,
>     >
>     > I think this is very helpful. Restarting Kafka brokers in case of a
>     > ZooKeeper host change is not a well-known operation.
>     >
>     > A few questions:
>     > 1) Would it not be worth fixing the problem at the source? This has
>     > been stuck for a while, though; maybe a little push would help:
>     > https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-2184
>     >
>     > 2) Upon re-creating the ZooKeeper object, is it not possible to
>     > invalidate the DNS cache so that it resolves the new hostname?
>     >
>     > 3) Could the cluster go down in this situation: one migrates an
>     > entire ZooKeeper cluster to new machines (one by one). The quorum
>     > stays alive without downtime, but now every broker in the cluster
>     > fails to resolve ZooKeeper at the same time. They all shut down at
>     > the same time once the new timeout elapses.
>     >
>     > Thanks !
>     > Stéphane
>     >
>     > On 28 Oct. 2017 9:42 am, "Jun Rao" <j...@confluent.io> wrote:
>     >
>     > > Hi, Everyone,
>     > >
>     > > We created "KIP-217: Expose a timeout to allow an expired ZK
>     > > session to be re-created".
>     > >
>     > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+to+be+re-created
>     > >
>     > > Please take a look and provide your feedback.
>     > >
>     > > Thanks,
>     > >
>     > > Jun
>     > >
