Re: [DISCUSS] KIP-217: Expose a timeout to allow an expired ZK session to be re-created

Jun Rao Mon, 06 Nov 2017 13:12:22 -0800

Ok. Based on the discussion, it seems that doing infinite re-creation is
better. I will cancel the KIP.


Thanks,

Jun

On Thu, Nov 2, 2017 at 6:14 PM, Jeff Widman <j...@jeffwidman.com> wrote:

> +1 for permanent retry under the covers (without an exposed/later
> deprecated config).
>
> That said, I understand the reality that sometimes we have to work around an
> unfixed issue in another project, so if you think it best to expose a config,
> then I have no objections. Mainly I wanted to make sure you'd tried to get
> upstream to fix as that is almost always a cleaner solution.
>
> > The above fact implies some reluctance from the zookeeper community to
> > fully solve the issue (maybe due to technical issues).
>
> @Ted - I spent some time a few months ago poking through issues on the ZK
> issue tracker, and it looked like there wasn't much activity on the project
> lately. So my guess is that it's less about problems with this particular
> solution, and more that the solution has just enough moving parts that no
> one with commit rights has had the time to review it. As a volunteer
> maintainer on a number of projects, I certainly empathize with them,
> although it would be nice to get some more committers onto the Zookeeper
> project who have the time to review some of these semi-abandoned PRs and
> either accept or reject them.
>
>
>
> On Thu, Nov 2, 2017 at 3:00 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Stephane:
> > bq. hasn't acted in over a year
> >
> > The above fact implies some reluctance from the zookeeper community to
> > fully solve the issue (maybe due to technical issues).
> > Anyway, we should plan on not relying on the fix to go through in the
> > near future.
> >
> > As for Jun's latest suggestion, I think we should add periodic logging
> > indicating the retry.
> >
> > A KIP is not needed if we go that route.
> >
> > Cheers
> >
> > On Thu, Nov 2, 2017 at 2:54 PM, Stephane Maarek <
> > steph...@simplemachines.com.au> wrote:
> >
> > > Hi Jun
> > >
> > > I think this is a better option. Would that change require a KIP then,
> > > as it's not a change in public API?
> > >
> > > @ted it was marked as a blocker for 3.4.11 but they pushed it back. It
> > > seems that the owner of the PR hasn't acted in over a year and I think
> > > someone needs to take ownership of that. Additionally, this would be a
> > > change in Kafka's zookeeper client dependency, so there is no need to
> > > update your zookeeper quorum to benefit from the change.
> > >
> > > Thanks
> > > Stéphane
> > >
> > >
> > > On 3 Nov. 2017 8:45 am, "Jun Rao" <j...@confluent.io> wrote:
> > >
> > > Stephane, Jeff,
> > >
> > > Another option is to not expose the reconnect timeout config and just
> > > retry the creation of Zookeeper forever. This is an improvement from the
> > > current situation and if zookeeper-2184 is fixed in the future, we don't
> > > need to deprecate the config.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Nov 2, 2017 at 9:02 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > ZOOKEEPER-2184 is scheduled for 3.4.12, whose release date is unknown.
> > > >
> > > > I think adding the session recreation on Kafka side should benefit
> > Kafka
> > > > users, especially those who don't plan to move to 3.4.12+ in the near
> > > > future.
> > > >
> > > > On Wed, Nov 1, 2017 at 6:34 PM, Jun Rao <j...@confluent.io> wrote:
> > > >
> > > > > Hi, Stephane,
> > > > >
> > > > > 3) The difference is that currently, there is no retry when
> > re-creating
> > > > the
> > > > > Zookeeper object when a ZK session expires. So, if the re-creation
> of
> > > > > Zookeeper fails, the broker just logs the error and the Zookeeper
> > > object
> > > > > will never be created again. With this KIP, we will keep retrying
> the
> > > > > creation of Zookeeper until success.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Oct 31, 2017 at 3:28 PM, Stephane Maarek <
> > > > > steph...@simplemachines.com.au> wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > 1) The reason I'm asking about it is I wonder if it's not worth
> > > > focusing
> > > > > > the development efforts on taking ownership of the existing PR (
> > > > > > https://github.com/apache/zookeeper/pull/150)  to fix
> > > ZOOKEEPER-2184,
> > > > > > rebase it and have it merged into the ZK codebase shortly.  I
> feel
> > > this
> > > > > KIP
> > > > > > might introduce a setting that could be deprecated shortly and
> > > confuse
> > > > > the
> > > > > > end user a bit further with one more knob to turn.
> > > > > >
> > > > > > 3) I'm not sure if I fully understand, sorry for the beginner's
> > > > question:
> > > > > > if the default timeout is infinite, then it won't change anything
> > to
> > > > how
> > > > > > Kafka works from today, does it? (unless I'm missing something
> > > sorry).
> > > > If
> > > > > > not set to infinite, then we introduce the risk of a whole
> cluster
> > > > > shutting
> > > > > > down at once?
> > > > > >
> > > > > > Thanks,
> > > > > > Stephane
> > > > > >
> > > > > > On 31/10/17, 1:00 pm, "Jun Rao" <j...@confluent.io> wrote:
> > > > > >
> > > > > >     Hi, Stephane,
> > > > > >
> > > > > >     Thanks for the reply.
> > > > > >
> > > > > >     1) Fixing the issue in ZK will be ideal. Not sure when it
> will
> > > > happen
> > > > > >     though. Once it's fixed, we can probably deprecate this
> config.
> > > > > >
> > > > > >     2) That could be useful. Is there a java api to do that at
> > > runtime?
> > > > > > Also,
> > > > > >     invalidating DNS cache doesn't always fix the issue of
> > unresolved
> > > > > > host. In
> > > > > >     some of the cases, human intervention is needed.
> > > > > >
> > > > > >     3) The default timeout is infinite though.
> > > > > >
> > > > > >     Jun
> > > > > >
> > > > > >
> > > > > >     On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <
> > > > > >     steph...@simplemachines.com.au> wrote:
> > > > > >
> > > > > >     > Hi Jun,
> > > > > >     >
> > > > > >     > I think this is very helpful. Restarting Kafka brokers in
> > case
> > > of
> > > > > > zookeeper
> > > > > >     > host change is not a well known operation.
> > > > > >     >
> > > > > >     > Few questions:
> > > > > >     > 1) would it not be worth fixing the problem at the source ?
> > > This
> > > > > has
> > > > > > been
> > > > > >     > stuck for a while though, maybe a little push would help :
> > > > > > >     > https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-2184
> > > > > >     >
> > > > > >     > 2) upon recreating the zookeeper object , is it not
> possible
> > to
> > > > > > invalidate
> > > > > >     > the DNS cache so that it resolves the new hostname?
> > > > > >     >
> > > > > >     > 3) could the cluster be down in this situation: one
> migrates
> > an
> > > > > > entire
> > > > > >     > zookeeper cluster to new machines (one by one). The quorum
> is
> > > > still
> > > > > > alive
> > > > > >     > without downtime, but now every broker in a cluster can't
> > > resolve
> > > > > > zookeeper
> > > > > >     > at the same time. They all shut down at the same time after
> > the
> > > > new
> > > > > >     > time-out setting.
> > > > > >     >
> > > > > >     > Thanks !
> > > > > >     > Stéphane
> > > > > >     >
> > > > > >     > On 28 Oct. 2017 9:42 am, "Jun Rao" <j...@confluent.io>
> wrote:
> > > > > >     >
> > > > > >     > > Hi, Everyone,
> > > > > >     > >
> > > > > >     > > We created "KIP-217: Expose a timeout to allow an expired
> > ZK
> > > > > > session to
> > > > > >     > be
> > > > > >     > > re-created".
> > > > > >     > >
> > > > > > >     > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+to+be+re-created
> > > > > >     > >
> > > > > >     > > Please take a look and provide your feedback.
> > > > > >     > >
> > > > > >     > > Thanks,
> > > > > >     > >
> > > > > >     > > Jun
> > > > > >     > >
> > > > > >     >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
>
> *Jeff Widman*
> jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
> <><
>
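
For readers following the thread: the approach it settles on (retrying the
creation of the ZooKeeper object indefinitely, with the periodic logging Ted
suggested) could be sketched roughly as below. This is only an illustration
and not the code that went into Kafka. The class name, method name, backoff
and logging interval are all hypothetical, and a generic Callable stands in
for the real ZooKeeper client constructor.

```java
import java.util.concurrent.Callable;

public class RetryForever {
    /**
     * Calls the factory until it succeeds. Sleeps backoffMs between attempts
     * and logs every logEvery-th failure so operators can see the retry.
     * All names and parameters here are hypothetical.
     */
    public static <T> T createWithRetry(Callable<T> factory, long backoffMs, int logEvery)
            throws InterruptedException {
        int failures = 0;
        while (true) {
            try {
                return factory.call();
            } catch (InterruptedException e) {
                throw e; // let a shutdown interrupt end the loop
            } catch (Exception e) {
                failures++;
                if (failures % logEvery == 0) {
                    System.err.println("Still retrying after " + failures + " failures: " + e);
                }
                Thread.sleep(backoffMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Stand-in for creating the ZooKeeper client: fails twice, as an
        // unresolved host might, then succeeds on the third attempt.
        String client = createWithRetry(() -> {
            if (++attempts[0] < 3) {
                throw new java.net.UnknownHostException("zk1.example.invalid");
            }
            return "connected";
        }, 10L, 1);
        System.out.println(client + " after " + attempts[0] + " attempts");
    }
}
```

Because there is no upper bound on the loop, no timeout needs to be exposed,
which is why a config (and its later deprecation) can be avoided.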
