I'm not 100% sure, but I think this happens when the broker's ephemeral znodes in ZooKeeper have not yet had time to expire. When Kafka shuts down gracefully, it should clean up its ephemeral znodes immediately (that is an assumption on my part; there may be a shortcoming in its graceful shutdown logic). If Kafka is killed abruptly and bounced back up right away, it cannot assume leadership properly because the ephemeral znodes from the previous run are still present in ZK, presumably until the old ZK session times out.
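To make the idea concrete, here is a minimal Python sketch of "wait until the old ephemeral znode is gone before restarting". It is only an illustration, not something Kafka or hello-samza ships: `wait_until_gone` and `znode_exists` are hypothetical names, and the existence check is passed in as a plain callable so the snippet runs without a ZooKeeper client.

```python
import time


def wait_until_gone(znode_exists, timeout_s=30.0, poll_s=0.5):
    """Poll until znode_exists() returns False or the timeout elapses.

    znode_exists: hypothetical zero-argument callable that returns True
    while the stale ephemeral znode is still present.
    Returns True if the znode disappeared, False if we gave up.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if not znode_exists():
            return True   # old registration is gone; safe to restart
        if time.monotonic() >= deadline:
            return False  # still there; the old session has not ended yet
        time.sleep(poll_s)


if __name__ == "__main__":
    # Simulated check: the znode is seen twice, then it is gone.
    answers = [True, True, False]
    print(wait_until_gone(lambda: answers.pop(0), timeout_s=5.0, poll_s=0.01))
```

With a real client, the callable would check the broker registration znode (on ZooKeeper-based Kafka that is /brokers/ids/<broker.id>). With kazoo, for example, it could be something like `lambda: zk.exists("/brokers/ids/0") is not None`, where broker id 0 is an assumption about the local setup.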
I imagine Kafka could have some logic to deal with that better when it gets fast-bounced... Alternatively, you may just have to wait a bit before restarting Kafka after killing it.

If anyone knows better, please correct me.

--
Felix GV
Data Infrastructure Engineer
Distributed Data Systems
LinkedIn

f...@linkedin.com
linkedin.com/in/felixgv

________________________________________
From: Chinmay Soman [chinmay.cere...@gmail.com]
Sent: Thursday, February 19, 2015 10:44 AM
To: dev@samza.apache.org
Subject: Question on hello-samza (Kafka startup and shutdown)

Sending to a wider audience to find out whether anyone else is seeing this issue.

It seems Kafka gets into a weird state every time I do bin/grid stop all (and then start all). I keep getting a LeaderNotAvailable exception on the producer side. It seems this happens every time Kafka hasn't been shut down properly.

This issue goes away if I use the following sequence:
* bin/grid stop kafka
* bin/grid stop zookeeper (after about 5 seconds)
(and then start everything).

Has anyone else seen this?

--
Thanks and regards

Chinmay Soman