Re: Question on hello-samza (Kafka startup and shutdown)

Chinmay Soman Fri, 20 Feb 2015 10:58:17 -0800

That might be it - yes! I do see infinite retries on the Zookeeper
connections. Thanks for pointing it out.
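
For anyone hitting the same thing: the wait-then-kill loop Chris suggests
further down the thread could be sketched roughly like this in Python. It is
only an illustration, not something that ships with hello-samza. The names
stop_gracefully and pid_alive and their parameters are made up for this
sketch, the 60s default just mirrors the number mentioned in the thread, and
finding the Kafka pid (e.g. via jps) is left out. It assumes a POSIX system.

```python
import os
import signal
import time


def pid_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_gracefully(pid, is_alive=pid_alive, timeout=60.0, poll_interval=0.5):
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL.

    Hypothetical helper: returns "not-running", "terminated" or "killed".
    `is_alive` is injectable so callers can supply their own liveness check.
    """
    if not is_alive(pid):
        return "not-running"
    try:
        os.kill(pid, signal.SIGTERM)  # same as kill -15: ask for a clean shutdown
    except ProcessLookupError:
        return "not-running"  # exited between the check and the signal

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "terminated"
        time.sleep(poll_interval)

    try:
        os.kill(pid, signal.SIGKILL)  # same as kill -9: last resort
    except ProcessLookupError:
        return "terminated"  # exited right at the deadline
    return "killed"
```

Called with the Kafka broker's pid before Zookeeper is stopped, this gives the
broker a chance to finish shutting down while Zookeeper is still reachable.
The return value says whether SIGKILL ended up being needed.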


On Fri, Feb 20, 2015 at 10:31 AM, Aditya Auradkar <
[email protected]> wrote:

> Hey Chinmay,
>
> I remember someone else having this issue with Kafka + Zookeeper. IIRC,
> the cause was ZkClient blocking indefinitely.
>
> You may find this useful.
> https://issues.apache.org/jira/browse/KAFKA-1907
> http://mail-archives.apache.org/mod_mbox/kafka-dev/201501.mbox/browser
>
> Aditya
>
> ________________________________________
> From: Chinmay Soman [[email protected]]
> Sent: Friday, February 20, 2015 10:15 AM
> To: [email protected]
> Subject: Re: Question on hello-samza (Kafka startup and shutdown)
>
> I haven't really figured it out. But just to clarify - I'm not starting and
> stopping within 5 seconds of each other - it's more like a couple of hours.
>
> The Kafka process is indeed running even after stop all: it seems to be
> waiting on Zookeeper (doing a lot of retries). If I bring up Zookeeper
> again - then the Kafka process shuts down cleanly :) But yes - in most
> cases I'm using SIGKILL and not SIGTERM to resolve this.
>
> This is not really an urgent issue - but was just curious - what's really
> happening ?
>
> On Fri, Feb 20, 2015 at 8:47 AM, Chris Riccomini <[email protected]>
> wrote:
>
> > Hey Chinmay,
> >
> > It seems controlled.shutdown.enable=true is the default. Chinmay, did you
> > figure this out? I haven't seen this before, but I don't usually
> > stop/start within 5s of each other.
> >
> > One thing that you might have a look at is whether the Kafka or ZK
> > processes are living past bin/grid stop all. I have seen procs (NM and
> > Kafka usually) continue to be alive after `stop all` is executed. I think
> > this is because the stop scripts SIGTERM and return immediately. This
> > allows procs to do a cleaner shutdown. But if you stop/start quickly, you
> > might get some weirdness there. Try jps'ing in between the stop/start,
> > and check to make sure there's nothing still alive (wait in a loop until
> > everything shuts down cleanly, and kill -9 if it takes more than 60s, or
> > something).
> >
> > Cheers,
> > Chris
> >
> > On Thu, Feb 19, 2015 at 2:01 PM, Neha Narkhede <[email protected]>
> > wrote:
> >
> > > Depending on the version of Kafka you're at, "controlled.shutdown.enable"
> > > should be set to true. If that's true and you always shut down the
> > > broker cleanly (kill -15, not kill -9) and there is more than one replica
> > > available, you should not see LeaderNotAvailable exceptions. If you
> > > kill the broker (kill -9) then Kafka does not get a chance to move the
> > > leaders away from the broker being shut down, and the leader re-election
> > > can take some time, leading to many LeaderNotAvailable exceptions.
> > >
> > > You can verify the replica availability as well as leader movement
> > > through the kafka-topics command before shutting down zookeeper.
> > >
> > > Thanks
> > > Neha
> > >
> > > On Thu, Feb 19, 2015 at 10:51 AM, Felix GV <[email protected]>
> > > wrote:
> > >
> > > > I'm not 100% sure, but I think this happens when ZK ephemeral znodes
> > > > have not had time to expire properly. When Kafka shuts down
> > > > gracefully, it should clean up its ephemeral nodes immediately
> > > > (presumably, but that is also an assumption... maybe it does have a
> > > > shortcoming in its graceful shutdown logic). If Kafka gets killed
> > > > improperly and bounced back up right away, it cannot assume leadership
> > > > properly because the ephemeral znodes of the previous run are still
> > > > there in ZK.
> > > >
> > > > I imagine Kafka could have some logic to deal with that better when
> > > > it gets fast-bounced... Alternatively, you may just have to wait a bit
> > > > before restarting Kafka after killing it.
> > > >
> > > > If anyone knows better, please correct me if I'm wrong.
> > > >
> > > > --
> > > >
> > > > Felix GV
> > > > Data Infrastructure Engineer
> > > > Distributed Data Systems
> > > > LinkedIn
> > > >
> > > > [email protected]
> > > > linkedin.com/in/felixgv
> > > >
> > > > ________________________________________
> > > > From: Chinmay Soman [[email protected]]
> > > > Sent: Thursday, February 19, 2015 10:44 AM
> > > > To: [email protected]
> > > > Subject: Question on hello-samza (Kafka startup and shutdown)
> > > >
> > > > Sending to a wider audience to know if anyone is also seeing this
> > > > issue.
> > > >
> > > > It seems Kafka gets in a weird state every time I do bin/grid stop all
> > > > (and then start all).
> > > >
> > > > I keep getting a LeaderNotAvailable exception on the producer side.
> > > > It seems this happens every time Kafka hasn't been shut down properly.
> > > > This issue goes away if I use the following sequence:
> > > >
> > > > * bin/grid stop kafka
> > > > * bin/grid stop zookeeper (after like 5 seconds).
> > > >
> > > > (and then start everything).
> > > >
> > > > Has anyone else seen this ?
> > > >
> > > > --
> > > > Thanks and regards
> > > >
> > > > Chinmay Soman
> > > >
> > >
> >
>
>
>
> --
> Thanks and regards
>
> Chinmay Soman
>



-- 
Thanks and regards

Chinmay Soman
