What is the full stack trace? If you see "can't rebalance after 4 retries", then the likely problem is that the broker is down or not available.
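One quick way to test the "broker is down or not available" theory is to check, from the consumer host, whether the broker and ZooKeeper ports accept TCP connections at all. The sketch below is only that: the host names and ports are hypothetical placeholders, and a successful connect shows that the port is open, not that the broker is healthy.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Hypothetical endpoints (assumption): replace with your own brokers and ZooKeeper.
ENDPOINTS = [
    ("broker1.example.com", 9092),
    ("zk1.example.com", 2181),
]

if __name__ == "__main__":
    for host, port in ENDPOINTS:
        state = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

If a broker port is unreachable from the consumer host, that would be consistent with the rebalance failure; if everything is reachable, the full stack trace and the broker logs are the next place to look.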

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Fri, Nov 29, 2013 at 11:31 AM, Yu, Libo <libo...@citi.com> wrote:

> We found our consumer stopped working after this exception occurred.
> Can the consumer recover from such an exception?
>
> Regards,
>
> Libo
>
>
> -----Original Message-----
> From: Florin Trofin [mailto:ftro...@adobe.com]
> Sent: Tuesday, July 16, 2013 4:20 PM
> To: users@kafka.apache.org
> Subject: Re: ConsumerRebalanceFailedException
>
> Yes, I think these are two separate issues.
>
> F.
>
> On 7/16/13 11:32 AM, "Joel Koshy" <jjkosh...@gmail.com> wrote:
>
> >From a user's perspective, ConsumerRebalanceException is a bit cryptic
> >- I think the other thread was to provide a more informative message and
> >also be able to recover when a broker does come up (fixed in
> >KAFKA-969).
> >
> >Thanks,
> >
> >Joel
> >
> >On Tue, Jul 16, 2013 at 11:04 AM, Vaibhav Puranik <vpura...@gmail.com> wrote:
> >> Thank you Joel.
> >>
> >> In a different but related thread, somebody is asking to rename the
> >> exception to NoBrokerAvailableException. But given the description
> >> above, the exception seems to be named appropriately.
> >>
> >> Regards,
> >> Vaibhav
> >>
> >>
> >> On Tue, Jul 16, 2013 at 12:05 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
> >>
> >>> Yes - rebalance => consumers trying to coordinate through ZK.
> >>> Rebalances can happen when one or more of the following happen:
> >>> - a consumed topic partition appears or disappears - i.e., if a
> >>>   broker comes or goes.
> >>> - a consumer instance in the group comes or goes
> >>> "goes" could also be triggered by session expirations in zookeeper -
> >>> typically caused by client-side GC or flaky connections to zookeeper.
> >>>
> >>> On Mon, Jul 15, 2013 at 10:15 AM, Vaibhav Puranik <vpura...@gmail.com> wrote:
> >>> > Hi all,
> >>> >
> >>> > We have a small Kafka cluster (0.7.1 - 3 nodes) in EC2. The load is
> >>> > about 200 million events per day, each being a few kilobytes. We have
> >>> > a single-node zookeeper.
> >>> >
> >>> > Yesterday suddenly our Kafka clients started throwing the following
> >>> > exception:
> >>> >
> >>> > java.lang.RuntimeException: kafka.common.ConsumerRebalanceFailedException:
> >>> > CONSUMER_GROUP_NAME_ip-00-00-00-00.ec2.internal-1373821190828-5f78e9af
> >>> > can't rebalance after 4 retries
> >>> >   at com.gumgum.kafka.consumer.KafkaTemplate.executeWithBatch(KafkaTemplate.java:59)
> >>> >   at com.gumgum.storm.fileupload.GenericKafkaSpout.nextTuple(GenericKafkaSpout.java:73)
> >>> >   at backtype.storm.daemon.executor$fn__3968$fn__4009$fn__4010.invoke(executor.clj:433)
> >>> >   at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
> >>> >
> >>> > None of the Kafka clients (ConsumerConnector class) would start. They
> >>> > would fail with the exception.
> >>> >
> >>> > We tried restarting the clients, and restarting the zookeeper as well.
> >>> > But finally it all started working when we restarted all of our kafka
> >>> > brokers. We didn't lose any data because producers (going directly to
> >>> > the brokers through a load balancer) were working fine.
> >>> >
> >>> > I tried googling this issue and it looks like a lot of people have
> >>> > faced it, but I couldn't get anything concrete.
> >>> >
> >>> > Given this, I have two questions:
> >>> >
> >>> > It will be nice if you can tell me why this can happen or point me to
> >>> > a link where I can understand it better. What does Consumer
> >>> > Rebalancing mean? Does that mean consumers are trying to coordinate
> >>> > amongst themselves using Zookeeper?
> >>> >
> >>> > On a separate note, are there any JMX parameters I need to be
> >>> > monitoring to make sure that my kafka cluster is healthy? How can I
> >>> > keep watch on my kafka cluster?
> >>> >
> >>> > Regards,
> >>> > Vaibhav Puranik
> >>> > GumGum
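
On the quoted question of what consumer rebalancing means: each consumer in a group recomputes which partitions it owns whenever group membership or the partition list changes. The sketch below is a simplified, range-style illustration of that idea only. The function name and the sample consumer ids are made up for the example; it is not Kafka's actual code, and it leaves out the ZooKeeper coordination (ownership claims, retries, backoff) where the "can't rebalance after 4 retries" failure actually occurs.

```python
def assign_partitions(partitions, consumers):
    """Split sorted partitions into contiguous ranges, one range per consumer.

    Illustrative only: a simplified range-style assignment, not Kafka's code.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    # Every consumer gets per_consumer partitions; the first `extra`
    # consumers get one more so that all partitions are covered.
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    for pos, consumer in enumerate(consumers):
        start = per_consumer * pos + min(pos, extra)
        count = per_consumer + (1 if pos < extra else 0)
        assignment[consumer] = partitions[start:start + count]
    return assignment


if __name__ == "__main__":
    parts = [0, 1, 2, 3, 4]
    # Two consumers share five partitions.
    print(assign_partitions(parts, ["c1", "c2"]))
    # A third consumer joins the group: every member's range changes.
    print(assign_partitions(parts, ["c1", "c2", "c3"]))
```

Because every member derives its range from the same sorted lists, the group converges only once all consumers see the same membership and partition view, which is why a broker or ZooKeeper problem can leave a rebalance unable to complete.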