From a user's perspective, ConsumerRebalanceFailedException is a bit cryptic.
I think the other thread was about providing a more informative message
and also being able to recover when a broker does come up (fixed in
KAFKA-969).

Thanks,

Joel

On Tue, Jul 16, 2013 at 11:04 AM, Vaibhav Puranik <vpura...@gmail.com> wrote:
> Thank you Joel.
>
> In a different but related thread, somebody is asking to rename the
> exception to NoBrokerAvailableException. But given the description above,
> the exception seems to be named appropriately.
>
> Regards,
> Vaibhav
>
>
> On Tue, Jul 16, 2013 at 12:05 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
>> Yes - rebalance => consumers trying to coordinate through ZK.
>> Rebalances can happen when one or more of the following happen:
>> - a consumed topic partition appears or disappears - i.e., if a broker
>> comes or goes.
>> - a consumer instance in the group comes or goes
>> "goes" could also be triggered by session expirations in zookeeper -
>> typically caused by client-side GC or flaky connections to zookeeper.
>>
>> On Mon, Jul 15, 2013 at 10:15 AM, Vaibhav Puranik <vpura...@gmail.com>
>> wrote:
>> > Hi all,
>> >
>> > We have a small Kafka cluster (0.7.1 - 3 nodes) in EC2. The load is about
>> > 200 million events per day, each being a few kilobytes. We have a
>> > single-node ZooKeeper.
>> >
>> > Yesterday suddenly our Kafka clients started throwing the following
>> > exception:
>> > java.lang.RuntimeException: kafka.common.ConsumerRebalanceFailedException: CONSUMER_GROUP_NAME_ip-00-00-00-00.ec2.internal-1373821190828-5f78e9af can't rebalance after 4 retries
>> >     at com.gumgum.kafka.consumer.KafkaTemplate.executeWithBatch(KafkaTemplate.java:59)
>> >     at com.gumgum.storm.fileupload.GenericKafkaSpout.nextTuple(GenericKafkaSpout.java:73)
>> >     at backtype.storm.daemon.executor$fn__3968$fn__4009$fn__4010.invoke(executor.clj:433)
>> >     at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
>> >
>> > None of the Kafka clients (ConsumerConnector class) would start. They
>> > would fail with the exception.
>> >
>> > We tried restarting the clients and restarting ZooKeeper as well. But
>> > it all finally started working when we restarted all of our Kafka
>> > brokers. We didn't lose any data because producers (going directly to
>> > the brokers through a load balancer) were working fine.
>> >
>> > I tried googling this issue and it looks like a lot of people have faced
>> > it, but I couldn't find anything concrete.
>> >
>> > Given this, I have two questions:
>> >
>> > It would be nice if you could tell me why this can happen or point me to
>> > a link where I can understand it better. What does consumer rebalancing
>> > mean? Does that mean consumers are trying to coordinate amongst
>> > themselves using ZooKeeper?
>> >
>> > On a separate note, are there any JMX parameters I need to monitor to
>> > make sure that my Kafka cluster is healthy? How can I keep watch on my
>> > Kafka cluster?
>> >
>> > Regards,
>> > Vaibhav Puranik
>> > GumGum
>>
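
The "can't rebalance after 4 retries" message in the stack trace above suggests a bounded retry loop around each rebalance attempt. The following is only an illustrative sketch of that pattern, not Kafka's actual implementation: the class RebalanceRetrySketch, the method rebalanceWithRetries, the nested RebalanceFailedException, and the backoff value are all hypothetical names and numbers chosen for the example, and the rebalance attempt itself is stubbed out as a BooleanSupplier.

```java
import java.util.function.BooleanSupplier;

public class RebalanceRetrySketch {

    /** Hypothetical stand-in for kafka.common.ConsumerRebalanceFailedException. */
    static class RebalanceFailedException extends RuntimeException {
        RebalanceFailedException(String message) {
            super(message);
        }
    }

    /**
     * Tries the given rebalance attempt up to maxRetries times, sleeping
     * backoffMs between failed attempts. Returns the attempt number that
     * succeeded, or throws once every attempt has failed.
     */
    static int rebalanceWithRetries(String consumerId, int maxRetries,
                                    long backoffMs, BooleanSupplier attempt) {
        for (int i = 1; i <= maxRetries; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
            if (i < maxRetries) {
                try {
                    // Give ZooKeeper state (broker and consumer registrations) time to settle.
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while backing off", e);
                }
            }
        }
        throw new RebalanceFailedException(
                consumerId + " can't rebalance after " + maxRetries + " retries");
    }

    public static void main(String[] args) {
        // Assumed scenario: the first two attempts fail, the third succeeds.
        int[] calls = {0};
        int used = rebalanceWithRetries("group_host-1", 4, 10, () -> ++calls[0] == 3);
        System.out.println("succeeded on attempt " + used);

        // Assumed scenario: no attempt can ever succeed (e.g. no broker is registered).
        try {
            rebalanceWithRetries("group_host-1", 4, 10, () -> false);
        } catch (RebalanceFailedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, a condition that never clears (such as stale broker registrations) exhausts the retries on every client restart, which would be consistent with the clients only recovering after the brokers were restarted.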
