Re: [DISCUSS] KIP-217: Expose a timeout to allow an expired ZK session to be re-created

Stephane Maarek Thu, 02 Nov 2017 14:54:23 -0700

Hi Jun

I think this is a better option. Would that change still require a KIP,
given that it's not a change to the public API?


@Ted, it was marked as a blocker for 3.4.11 but they pushed it back. It seems
that the owner of the PR hasn't acted in over a year and I think someone
needs to take ownership of that. Additionally, this would be a change in
Kafka's ZooKeeper client dependency, so there would be no need to upgrade
your ZooKeeper quorum to benefit from the change.

Thanks
Stéphane


On 3 Nov. 2017 8:45 am, "Jun Rao" <[email protected]> wrote:

Stephane, Jeff,

Another option is to not expose the reconnect timeout config and just retry
the creation of Zookeeper forever. This is an improvement from the current
situation and if zookeeper-2184 is fixed in the future, we don't need to
deprecate the config.
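
A minimal sketch of the "retry forever" idea described above, not the actual Kafka implementation. The class name RetryUntilSuccess, the method retryForever and the fixed backoffMs delay are all hypothetical and chosen only for illustration. The factory stands in for whatever call creates the Zookeeper object; any failure is logged and the creation is attempted again, with no timeout config to expose.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Kafka or ZooKeeper.
public final class RetryUntilSuccess {
    private RetryUntilSuccess() {}

    /**
     * Calls the factory until it returns without throwing.
     * There is no upper bound on the number of attempts; only an
     * interrupt of the calling thread stops the loop.
     */
    public static <T> T retryForever(Callable<T> factory, long backoffMs)
            throws InterruptedException {
        while (true) {
            try {
                return factory.call();
            } catch (InterruptedException e) {
                // Let the caller (e.g. a shutdown path) stop the retries.
                throw e;
            } catch (Exception e) {
                // Log the failure and wait before the next attempt.
                System.err.println("Creation failed, retrying in "
                        + backoffMs + " ms: " + e);
                Thread.sleep(backoffMs);
            }
        }
    }
}
```

Assuming the ZooKeeper Java client is on the classpath, the factory could be something like () -> new ZooKeeper(connectString, sessionTimeoutMs, watcher), where connectString, sessionTimeoutMs and watcher are placeholders for the broker's own values.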

Thanks,

Jun

On Thu, Nov 2, 2017 at 9:02 AM, Ted Yu <[email protected]> wrote:

> ZOOKEEPER-2184 is scheduled for 3.4.12 whose release is unknown.
>
> I think adding the session recreation on Kafka side should benefit Kafka
> users, especially those who don't plan to move to 3.4.12+ in the near
> future.
>
> On Wed, Nov 1, 2017 at 6:34 PM, Jun Rao <[email protected]> wrote:
>
> > Hi, Stephane,
> >
> > 3) The difference is that currently, there is no retry when re-creating the
> > Zookeeper object when a ZK session expires. So, if the re-creation of
> > Zookeeper fails, the broker just logs the error and the Zookeeper object
> > will never be created again. With this KIP, we will keep retrying the
> > creation of Zookeeper until success.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Oct 31, 2017 at 3:28 PM, Stephane Maarek <
> > [email protected]> wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the reply.
> > >
> > > 1) The reason I'm asking about it is I wonder if it's not worth focusing
> > > the development efforts on taking ownership of the existing PR
> > > (https://github.com/apache/zookeeper/pull/150) to fix ZOOKEEPER-2184,
> > > rebase it and have it merged into the ZK codebase shortly.  I feel this KIP
> > > might introduce a setting that could be deprecated shortly and confuse the
> > > end user a bit further with one more knob to turn.
> > >
> > > 3) I'm not sure if I fully understand, sorry for the beginner's question:
> > > if the default timeout is infinite, then it won't change anything to how
> > > Kafka works from today, does it? (unless I'm missing something sorry). If
> > > not set to infinite, then we introduce the risk of a whole cluster shutting
> > > down at once?
> > >
> > > Thanks,
> > > Stephane
> > >
> > > On 31/10/17, 1:00 pm, "Jun Rao" <[email protected]> wrote:
> > >
> > >     Hi, Stephane,
> > >
> > >     Thanks for the reply.
> > >
> > >     1) Fixing the issue in ZK will be ideal. Not sure when it will happen
> > >     though. Once it's fixed, we can probably deprecate this config.
> > >
> > >     2) That could be useful. Is there a java api to do that at runtime? Also,
> > >     invalidating DNS cache doesn't always fix the issue of unresolved host. In
> > >     some of the cases, human intervention is needed.
> > >
> > >     3) The default timeout is infinite though.
> > >
> > >     Jun
> > >
> > >
> > >     On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <
> > >     [email protected]> wrote:
> > >
> > >     > Hi Jun,
> > >     >
> > >     > I think this is very helpful. Restarting Kafka brokers in case of zookeeper
> > >     > host change is not a well known operation.
> > >     >
> > >     > Few questions:
> > >     > 1) would it not be worth fixing the problem at the source ? This has been
> > >     > stuck for a while though, maybe a little push would help :
> > >     > https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-2184
> > >     >
> > >     > 2) upon recreating the zookeeper object , is it not possible to invalidate
> > >     > the DNS cache so that it resolves the new hostname?
> > >     >
> > >     > 3) could the cluster be down in this situation: one migrates an entire
> > >     > zookeeper cluster to new machines (one by one). The quorum is still alive
> > >     > without downtime, but now every broker in a cluster can't resolve zookeeper
> > >     > at the same time. They all shut down at the same time after the new
> > >     > time-out setting.
> > >     >
> > >     > Thanks !
> > >     > Stéphane
> > >     >
> > >     > On 28 Oct. 2017 9:42 am, "Jun Rao" <[email protected]> wrote:
> > >     >
> > >     > > Hi, Everyone,
> > >     > >
> > >     > > We created "KIP-217: Expose a timeout to allow an expired ZK session to be
> > >     > > re-created".
> > >     > >
> > >     > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+to+be+re-created
> > >     > >
> > >     > > Please take a look and provide your feedback.
> > >     > >
> > >     > > Thanks,
> > >     > >
> > >     > > Jun
> > >     > >
> > >     >
> > >
> > >
> > >
> > >
> >
>
