Yes, I think these are two separate issues. F.
On 7/16/13 11:32 AM, "Joel Koshy" <jjkosh...@gmail.com> wrote:

>From a user's perspective, ConsumerRebalanceException is a bit cryptic
>- I think the other thread was to provide a more informative message
>and also be able to recover when a broker does come up (fixed in
>KAFKA-969).
>
>Thanks,
>
>Joel
>
>On Tue, Jul 16, 2013 at 11:04 AM, Vaibhav Puranik <vpura...@gmail.com>
>wrote:
>> Thank you Joel.
>>
>> In a different but related thread, somebody is asking to rename the
>> exception as NoBrokerAvailableException. But given the description above,
>> the exception seems to be named appropriately.
>>
>> Regards,
>> Vaibhav
>>
>>
>> On Tue, Jul 16, 2013 at 12:05 AM, Joel Koshy <jjkosh...@gmail.com>
>> wrote:
>>
>>> Yes - rebalance => consumers trying to coordinate through ZK.
>>> Rebalances can happen when one or more of the following happen:
>>> - a consumed topic partition appears or disappears - i.e., if a broker
>>>   comes or goes.
>>> - a consumer instance in the group comes or goes
>>> "goes" could also be triggered by session expirations in zookeeper -
>>> typically caused by client-side GC or flaky connections to zookeeper.
>>>
>>> On Mon, Jul 15, 2013 at 10:15 AM, Vaibhav Puranik <vpura...@gmail.com>
>>> wrote:
>>> > Hi all,
>>> >
>>> > We have a small Kafka cluster (0.7.1 - 3 nodes) in EC2. The load is
>>> > about 200 million events per day, each being a few kilobytes. We have
>>> > a single node zookeeper.
>>> >
>>> > Yesterday suddenly our Kafka clients started throwing the following
>>> > exception:
>>> > java.lang.RuntimeException: kafka.common.ConsumerRebalanceFailedException: CONSUMER_GROUP_NAME_ip-00-00-00-00.ec2.internal-1373821190828-5f78e9af can't rebalance after 4 retries
>>> >     at com.gumgum.kafka.consumer.KafkaTemplate.executeWithBatch(KafkaTemplate.java:59)
>>> >     at com.gumgum.storm.fileupload.GenericKafkaSpout.nextTuple(GenericKafkaSpout.java:73)
>>> >     at backtype.storm.daemon.executor$fn__3968$fn__4009$fn__4010.invoke(executor.clj:433)
>>> >     at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
>>> >
>>> > None of the Kafka clients (ConsumerConnector class) would start. They
>>> > would fail with the exception.
>>> >
>>> > We tried restarting the clients, and restarting the zookeeper as well.
>>> > But finally it all started working when we restarted all of our kafka
>>> > brokers. We didn't lose any data because producers (going directly to
>>> > the brokers through a load balancer) were working fine.
>>> >
>>> > I tried googling this issue and it looks like a lot of people have
>>> > faced it, but I couldn't get anything concrete.
>>> >
>>> > Given this, I have two questions:
>>> >
>>> > It will be nice if you can tell me why this can happen or point me to
>>> > a link where I can understand it better. What does Consumer
>>> > Rebalancing mean? Does that mean consumers are trying to coordinate
>>> > amongst themselves using Zookeeper?
>>> >
>>> > On a separate note, are there any JMX parameters I need to be
>>> > monitoring to make sure that my kafka cluster is healthy? How can I
>>> > keep watch on my kafka cluster?
>>> >
>>> > Regards,
>>> > Vaibhav Puranik
>>> > GumGum
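
As a rough illustration of the rebalance triggers described in the thread (a partition appearing or disappearing, or a consumer instance joining or leaving the group), here is a minimal Python sketch. It does not use Kafka or ZooKeeper at all: GroupView, rebalance_reasons, RebalanceFailed and rebalance_with_retries are hypothetical names made up for this example, and the retry loop only mimics the "can't rebalance after 4 retries" message from the stack trace, not the actual consumer implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupView:
    """Hypothetical snapshot of what a consumer sees registered for its group."""
    consumers: frozenset   # ids of live consumer instances in the group
    partitions: frozenset  # consumed topic partitions, e.g. "events-0"


def rebalance_reasons(old, new):
    """Return the reasons (possibly none) why a rebalance would be triggered."""
    reasons = []
    if old.partitions != new.partitions:
        reasons.append("partition set changed (a broker came or went)")
    if old.consumers != new.consumers:
        reasons.append("consumer membership changed (join, leave, or session expiry)")
    return reasons


class RebalanceFailed(RuntimeError):
    """Stand-in for the failure reported in the thread; not a Kafka class."""


def rebalance_with_retries(attempt, max_retries=4):
    """Call attempt() until it returns True; raise after max_retries failed tries."""
    for _ in range(max_retries):
        if attempt():
            return
    raise RebalanceFailed(f"can't rebalance after {max_retries} retries")
```

For example, comparing a view with consumers {"c1", "c2"} against a later view with only {"c1"} (say, after c2's session expired) yields a single membership-change reason, while an attempt callable that never succeeds ends in RebalanceFailed after four tries.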