Re: [DISCUSS] KIP-217: Expose a timeout to allow an expired ZK session to be re-created

Stephane Maarek Wed, 01 Nov 2017 19:55:37 -0700

Thanks Jun for the clarification

It sounds like this KIP is complementary to ZOOKEEPER-2184 and can move
forward without it. We should still push hard for ZOOKEEPER-2184 to go
through (I saw you commented on it earlier).


LGTM!

On 2 Nov. 2017 12:34 pm, "Jun Rao" <j...@confluent.io> wrote:

> Hi, Stephane,
>
> 3) The difference is that currently, there is no retry when re-creating the
> Zookeeper object when a ZK session expires. So, if the re-creation of
> Zookeeper fails, the broker just logs the error and the Zookeeper object
> will never be created again. With this KIP, we will keep retrying the
> creation of Zookeeper until success.
>
> Thanks,
>
> Jun
>
> On Tue, Oct 31, 2017 at 3:28 PM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
> > Hi Jun,
> >
> > Thanks for the reply.
> >
> > 1) The reason I'm asking about it is I wonder if it's not worth focusing
> > the development efforts on taking ownership of the existing PR (
> > https://github.com/apache/zookeeper/pull/150)  to fix ZOOKEEPER-2184,
> > rebase it and have it merged into the ZK codebase shortly.  I feel this KIP
> > might introduce a setting that could be deprecated shortly and confuse the
> > end user a bit further with one more knob to turn.
> >
> > 3) I'm not sure if I fully understand, sorry for the beginner's question:
> > if the default timeout is infinite, then it won't change anything to how
> > Kafka works from today, does it? (unless I'm missing something sorry). If
> > not set to infinite, then we introduce the risk of a whole cluster shutting
> > down at once?
> >
> > Thanks,
> > Stephane
> >
> > On 31/10/17, 1:00 pm, "Jun Rao" <j...@confluent.io> wrote:
> >
> >     Hi, Stephane,
> >
> >     Thanks for the reply.
> >
> >     1) Fixing the issue in ZK will be ideal. Not sure when it will happen
> >     though. Once it's fixed, we can probably deprecate this config.
> >
> >     2) That could be useful. Is there a java api to do that at runtime? Also,
> >     invalidating DNS cache doesn't always fix the issue of unresolved host. In
> >     some of the cases, human intervention is needed.
> >
> >     3) The default timeout is infinite though.
> >
> >     Jun
> >
> >
> >     On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <
> >     steph...@simplemachines.com.au> wrote:
> >
> >     > Hi Jun,
> >     >
> >     > I think this is very helpful. Restarting Kafka brokers in case of zookeeper
> >     > host change is not a well known operation.
> >     >
> >     > Few questions:
> >     > 1) would it not be worth fixing the problem at the source ? This has been
> >     > stuck for a while though, maybe a little push would help :
> >     > https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-2184
> >     >
> >     > 2) upon recreating the zookeeper object , is it not possible to invalidate
> >     > the DNS cache so that it resolves the new hostname?
> >     >
> >     > 3) could the cluster be down in this situation: one migrates an entire
> >     > zookeeper cluster to new machines (one by one). The quorum is still alive
> >     > without downtime, but now every broker in a cluster can't resolve zookeeper
> >     > at the same time. They all shut down at the same time after the new
> >     > time-out setting.
> >     >
> >     > Thanks !
> >     > Stéphane
> >     >
> >     > On 28 Oct. 2017 9:42 am, "Jun Rao" <j...@confluent.io> wrote:
> >     >
> >     > > Hi, Everyone,
> >     > >
> >     > > We created "KIP-217: Expose a timeout to allow an expired ZK session to be
> >     > > re-created".
> >     > >
> >     > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+to+be+re-created
> >     > >
> >     > > Please take a look and provide your feedback.
> >     > >
> >     > > Thanks,
> >     > >
> >     > > Jun
> >     > >
> >     >
> >
> >
> >
> >
>
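
For anyone skimming the thread, the behaviour Jun describes in point 3 (keep retrying the creation of the Zookeeper object until it succeeds, bounded by a timeout that defaults to infinite) could be sketched roughly as below. This is only an illustration under stated assumptions, not the KIP's or Kafka's actual code: the class name RetrySketch, the method createWithRetry, the backoffMs parameter, and the convention that a negative timeoutMs means "infinite" are all hypothetical, and the Callable merely stands in for constructing the real ZooKeeper client.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, NOT Kafka's implementation: retries a factory until it
// succeeds or an optional overall deadline has passed.
final class RetrySketch {

    /**
     * @param factory   creates the client (stands in for constructing the ZooKeeper object)
     * @param timeoutMs overall deadline in ms; a negative value means retry forever (assumed convention)
     * @param backoffMs pause between attempts in ms (assumed parameter)
     */
    static <T> T createWithRetry(Callable<T> factory, long timeoutMs, long backoffMs)
            throws Exception {
        final long start = System.nanoTime();
        while (true) {
            try {
                return factory.call();
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
                if (timeoutMs >= 0 && elapsedMs >= timeoutMs) {
                    // Deadline passed: give up and let the caller decide what to do.
                    throw e;
                }
                Thread.sleep(backoffMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulated factory: fails twice (e.g. unresolved host), then succeeds.
        String client = createWithRetry(() -> {
            if (++attempts[0] < 3) {
                throw new java.net.UnknownHostException("zk1.example");
            }
            return "connected";
        }, -1, 10);
        System.out.println(client + " after " + attempts[0] + " attempts");
    }
}
```

With an infinite timeout the loop never gives up, which differs from the current behaviour Jun describes, where a single failed re-creation is logged and never retried. With a finite timeout the caller sees the last exception once the deadline has passed, which is where Stephane's concern about every broker giving up at the same time would apply.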
