From a user's perspective, ConsumerRebalanceFailedException is a bit cryptic; I think the other thread was asking for a more informative message, and for the consumer to be able to recover when a broker does come up (fixed in KAFKA-969).
Thanks,
Joel

On Tue, Jul 16, 2013 at 11:04 AM, Vaibhav Puranik <vpura...@gmail.com> wrote:
> Thank you Joel.
>
> In a different but related thread, somebody is asking to rename the
> exception to NoBrokerAvailableException. But given the description above,
> the exception seems to be named appropriately.
>
> Regards,
> Vaibhav
>
>
> On Tue, Jul 16, 2013 at 12:05 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
>> Yes - rebalance => consumers trying to coordinate through ZK.
>> Rebalances can happen when one or more of the following happen:
>> - a consumed topic partition appears or disappears - i.e., if a broker
>>   comes or goes.
>> - a consumer instance in the group comes or goes.
>> "Goes" could also be triggered by session expirations in ZooKeeper,
>> typically caused by client-side GC or flaky connections to ZooKeeper.
>>
>> On Mon, Jul 15, 2013 at 10:15 AM, Vaibhav Puranik <vpura...@gmail.com> wrote:
>> > Hi all,
>> >
>> > We have a small Kafka cluster (0.7.1, 3 nodes) in EC2. The load is about
>> > 200 million events per day, each a few kilobytes. We have a single-node
>> > ZooKeeper.
>> >
>> > Yesterday our Kafka clients suddenly started throwing the following
>> > exception:
>> > java.lang.RuntimeException: kafka.common.ConsumerRebalanceFailedException: CONSUMER_GROUP_NAME_ip-00-00-00-00.ec2.internal-1373821190828-5f78e9af can't rebalance after 4 retries
>> >     at com.gumgum.kafka.consumer.KafkaTemplate.executeWithBatch(KafkaTemplate.java:59)
>> >     at com.gumgum.storm.fileupload.GenericKafkaSpout.nextTuple(GenericKafkaSpout.java:73)
>> >     at backtype.storm.daemon.executor$fn__3968$fn__4009$fn__4010.invoke(executor.clj:433)
>> >     at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
>> >
>> > None of the Kafka clients (ConsumerConnector class) would start; they all
>> > failed with this exception.
>> >
>> > We tried restarting the clients, and restarting ZooKeeper as well. But in
>> > the end everything started working again when we restarted all of our
>> > Kafka brokers. We didn't lose any data because the producers (going
>> > directly to the brokers through a load balancer) were working fine.
>> >
>> > I tried googling this issue, and it looks like a lot of people have faced
>> > it, but I couldn't find anything concrete.
>> >
>> > Given this, I have two questions:
>> >
>> > It would be nice if you could tell me why this can happen or point me to a
>> > link where I can understand it better. What does consumer rebalancing
>> > mean? Does it mean the consumers are trying to coordinate among themselves
>> > using ZooKeeper?
>> >
>> > On a separate note, are there any JMX parameters I should be monitoring
>> > to make sure my Kafka cluster is healthy? How can I keep watch on my
>> > Kafka cluster?
>> >
>> > Regards,
>> > Vaibhav Puranik
>> > GumGum
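To make the rebalance description in this thread concrete, here is a minimal, hypothetical Java sketch of a ZooKeeper-coordinated consumer-group rebalance with a bounded number of retries. None of the names (RebalanceSketch, GroupView, tryRebalance, rebalance) are Kafka's API. The group state is held in memory as a stand-in for ZooKeeper, the assignment is a simple range-style split, and an attempt is assumed to fail whenever no partitions are visible (for example, when no broker is registered), roughly the no-broker case discussed at the top of the thread. When every attempt fails, the sketch throws an exception whose message mirrors the one in the stack trace.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of a ZooKeeper-coordinated consumer-group rebalance.
// None of these names are Kafka APIs; the group state lives in memory instead of ZooKeeper.
public class RebalanceSketch {

    // Thrown when every attempt fails; the message mirrors the one in the stack trace.
    static class RebalanceFailedException extends RuntimeException {
        RebalanceFailedException(String consumerId, int retries) {
            super(consumerId + " can't rebalance after " + retries + " retries");
        }
    }

    // Stand-in for what a consumer would read from ZooKeeper: live group members
    // and the topic partitions currently hosted by live brokers.
    static class GroupView {
        final List<String> consumerIds = new ArrayList<>();
        final List<String> partitions = new ArrayList<>();
    }

    // One attempt: range-style split of the sorted partitions over the sorted consumers.
    // Returns null (a failed attempt) when no partitions are visible, e.g. no broker registered.
    static Map<String, List<String>> tryRebalance(GroupView view) {
        if (view.partitions.isEmpty() || view.consumerIds.isEmpty()) {
            return null;
        }
        List<String> consumers = new ArrayList<>(view.consumerIds);
        List<String> parts = new ArrayList<>(view.partitions);
        Collections.sort(consumers);
        Collections.sort(parts);
        int perConsumer = parts.size() / consumers.size();
        int extra = parts.size() % consumers.size(); // the first `extra` consumers get one more
        Map<String, List<String>> assignment = new TreeMap<>();
        int start = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = perConsumer + (i < extra ? 1 : 0);
            assignment.put(consumers.get(i), new ArrayList<>(parts.subList(start, start + count)));
            start += count;
        }
        return assignment;
    }

    // In a real client this would run whenever a watched path changes: a broker or
    // partition appears or disappears, a consumer joins or leaves, or a session expires.
    static Map<String, List<String>> rebalance(String consumerId, GroupView view,
                                               int maxRetries, long backoffMs) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            Map<String, List<String>> result = tryRebalance(view);
            if (result != null) {
                return result;
            }
            try {
                Thread.sleep(backoffMs); // back off; a real client would re-read ZooKeeper next
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and give up early
                throw new RebalanceFailedException(consumerId, attempt);
            }
        }
        throw new RebalanceFailedException(consumerId, maxRetries);
    }

    public static void main(String[] args) {
        GroupView view = new GroupView();
        view.consumerIds.add("c1");
        view.consumerIds.add("c2");
        try {
            rebalance("c1", view, 4, 10); // no partitions yet, so all 4 attempts fail
        } catch (RebalanceFailedException e) {
            System.out.println(e.getMessage()); // c1 can't rebalance after 4 retries
        }
        view.partitions.add("topic-0");
        view.partitions.add("topic-1");
        view.partitions.add("topic-2");
        System.out.println(rebalance("c1", view, 4, 10)); // {c1=[topic-0, topic-1], c2=[topic-2]}
    }
}
```

In this model, bounding the retries is what turns "no broker visible yet" into a hard failure instead of a wait; per the reply at the top of the thread, KAFKA-969 addressed recovering once a broker does come up in the real client.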