That might be it - yes! I do see infinite retries on the Zookeeper connections. Thanks for pointing it out.
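For anyone else hitting this, here is a rough sketch of the workaround Chris describes further down the thread (SIGTERM, wait in a loop, and only kill -9 if it takes more than 60s). It is not part of bin/grid; the names `is_alive` and `stop_and_wait` are made up for illustration, and it assumes a POSIX system and that you already have the pids (for example from the jps output).

```python
import os
import signal
import time


def is_alive(pid):
    """Return True if a process with this pid still exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_and_wait(pids, timeout=60.0, poll=1.0,
                  alive=is_alive, send=os.kill, sleep=time.sleep):
    """SIGTERM each pid, poll until all exit, SIGKILL stragglers after timeout.

    Returns the list of pids that had to be SIGKILLed.
    The alive/send/sleep hooks exist only so the loop can be tested.
    """
    def signal_quietly(pid, sig):
        try:
            send(pid, sig)
        except ProcessLookupError:
            pass  # the process exited between the check and the signal

    remaining = [p for p in pids if alive(p)]
    for pid in remaining:
        signal_quietly(pid, signal.SIGTERM)  # ask for a clean shutdown first

    waited = 0.0
    while remaining and waited < timeout:
        sleep(poll)
        waited += poll
        remaining = [p for p in remaining if alive(p)]

    for pid in remaining:
        signal_quietly(pid, signal.SIGKILL)  # last resort, same as kill -9
    return remaining
```

Note that this would not have helped in my case while Zookeeper was down, since the broker just keeps retrying until the timeout and then gets killed anyway; it only avoids restarting while an old process is still alive.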

On Fri, Feb 20, 2015 at 10:31 AM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:

> Hey Chinmay,
>
> I remember someone else having this issue with Kafka + Zookeeper. IIRC,
> the cause was ZkClient blocking indefinitely.
>
> You may find this useful.
> https://issues.apache.org/jira/browse/KAFKA-1907
> http://mail-archives.apache.org/mod_mbox/kafka-dev/201501.mbox/browser
>
> Aditya
>
> ________________________________________
> From: Chinmay Soman [chinmay.cere...@gmail.com]
> Sent: Friday, February 20, 2015 10:15 AM
> To: dev@samza.apache.org
> Subject: Re: Question on hello-samza (Kafka startup and shutdown)
>
> I haven't really figured it out. But just to clarify - I'm not
> starting/stopping within 5 seconds of each other - it's more like a
> couple of hours.
>
> The Kafka process is indeed running even after stop all: it seems to be
> waiting on Zookeeper (doing a lot of retries). If I bring up Zookeeper
> again - then the Kafka process shuts down cleanly :) But yes - in most
> cases I'm using SIGKILL and not SIGTERM to resolve this.
>
> This is not really an urgent issue - but was just curious - what's really
> happening?
>
> On Fri, Feb 20, 2015 at 8:47 AM, Chris Riccomini <criccom...@apache.org>
> wrote:
>
> > Hey Chinmay,
> >
> > It seems controlled.shutdown.enable=true is the default. Chinmay, did you
> > figure this out? I haven't seen this before, but I don't usually
> > stop/start within 5s of each other.
> >
> > One thing that you might have a look at is whether the Kafka or ZK
> > processes are living past bin/grid stop all. I have seen procs (NM and
> > Kafka usually) continue to be alive after `stop all` is executed. I think
> > this is because the stop scripts SIGTERM and return immediately. This
> > allows procs to do a cleaner shutdown. But if you stop/start quickly, you
> > might get some weirdness there.
> > Try jps'ing in between the stop/start, and
> > check to make sure there's nothing still alive (wait in a loop until
> > everything shuts down cleanly, and kill -9 if it takes more than 60s, or
> > something).
> >
> > Cheers,
> > Chris
> >
> > On Thu, Feb 19, 2015 at 2:01 PM, Neha Narkhede <neha.narkh...@gmail.com>
> > wrote:
> >
> > > Depending on the version of Kafka you're at, "controlled.shutdown.enable"
> > > should be set to true. If that's true and you always shut down the broker
> > > cleanly (kill -15, not kill -9) and there is more than 1 replica
> > > available, you should not see LeaderNotAvailable exceptions. If you kill
> > > the broker (kill -9) then Kafka does not get a chance to move the leaders
> > > away from the broker being shut down and the leader re-election can take
> > > some time, leading to many LeaderNotAvailable exceptions.
> > >
> > > You can verify the replica availability as well as leader movement
> > > through the kafka-topics command before shutting down zookeeper.
> > >
> > > Thanks
> > > Neha
> > >
> > > On Thu, Feb 19, 2015 at 10:51 AM, Felix GV
> > > <fville...@linkedin.com.invalid> wrote:
> > >
> > > > I'm not 100% sure, but I think this happens when ZK ephemeral znodes
> > > > have not had time to expire properly. When Kafka shuts down gracefully,
> > > > it should clean up its ephemeral nodes immediately (presumably, but
> > > > that is also an assumption... maybe it does have a shortcoming in its
> > > > graceful shutdown logic). If Kafka gets killed improperly and bounced
> > > > back up right away, it cannot assume leadership properly because the
> > > > ephemeral znodes of the previous run are still there in ZK.
> > > >
> > > > I imagine Kafka could have some logic to deal with that better when it
> > > > gets fast-bounced... Alternatively, you may just have to wait a bit
> > > > before restarting Kafka after killing it.
> > > >
> > > > If anyone knows better, please correct me if I'm wrong.
> > > >
> > > > --
> > > >
> > > > Felix GV
> > > > Data Infrastructure Engineer
> > > > Distributed Data Systems
> > > > LinkedIn
> > > >
> > > > f...@linkedin.com
> > > > linkedin.com/in/felixgv
> > > >
> > > > ________________________________________
> > > > From: Chinmay Soman [chinmay.cere...@gmail.com]
> > > > Sent: Thursday, February 19, 2015 10:44 AM
> > > > To: dev@samza.apache.org
> > > > Subject: Question on hello-samza (Kafka startup and shutdown)
> > > >
> > > > Sending to a wider audience to know if anyone is also seeing this
> > > > issue.
> > > >
> > > > It seems Kafka gets in a weird state every time I do bin/grid stop all
> > > > (and then start all).
> > > >
> > > > I keep getting a LeaderNotAvailable exception on the producer side. It
> > > > seems this happens every time Kafka hasn't been shut down properly.
> > > > This issue goes away if I use the following sequence:
> > > >
> > > > * bin/grid stop kafka
> > > > * bin/grid stop zookeeper (after like 5 seconds).
> > > >
> > > > (and then start everything).
> > > >
> > > > Has anyone else seen this?
> > > >
> > > > --
> > > > Thanks and regards
> > > >
> > > > Chinmay Soman
>
> --
> Thanks and regards
>
> Chinmay Soman

--
Thanks and regards

Chinmay Soman