Ze'ev Eli Klapow created KAFKA-2329:
---------------------------------------

             Summary: Consumers balance fails when multiple consumers are 
started simultaneously.
                 Key: KAFKA-2329
                 URL: https://issues.apache.org/jira/browse/KAFKA-2329
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 0.8.2.1, 0.8.1.1
            Reporter: Ze'ev Eli Klapow
            Assignee: Neha Narkhede
             Fix For: 0.8.1.2


During consumer startup, a race condition can occur if multiple consumers are 
started (nearly) simultaneously.

If a second consumer is started while the first consumer is in the middle of 
{{zkClient.subscribeChildChanges}}, the first consumer will never see the 
registration of the second consumer, because the consumer registration node for 
the second consumer will be unwatched, and no new child will be registered 
later. As a result, the first consumer owns all partitions and never releases 
ownership, so the second consumer fails to rebalance.
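
The exact interleaving inside {{zkClient.subscribeChildChanges}} is not spelled 
out here. The toy model below only illustrates the general shape of such a 
missed notification, under the assumption that the second registration lands 
after the first consumer last read the child list but before its watch is 
installed. ToyRegistry and all of its methods are hypothetical stand-ins, not 
the ZkClient API.

```python
# Toy model only: ToyRegistry is a hypothetical stand-in, not the ZkClient API.
class ToyRegistry:
    def __init__(self):
        self.children = set()   # registered consumer ids
        self.watchers = []      # callbacks notified on child changes

    def register(self, consumer_id):
        self.children.add(consumer_id)
        # Only watchers that are already installed are notified.
        for callback in list(self.watchers):
            callback(set(self.children))

    def get_children(self):
        return set(self.children)

    def watch(self, callback):
        self.watchers.append(callback)


registry = ToyRegistry()
registry.register("c1")

# Step 1: c1 reads the current membership.
seen_by_c1 = registry.get_children()

# Step 2: c2 registers before c1's watch is installed (the assumed gap).
registry.register("c2")

# Step 3: c1 installs its watch. No further registration happens,
# so the callback never fires and c1's view stays stale.
registry.watch(lambda children: seen_by_c1.update(children))

print(sorted(seen_by_c1))               # c1's view of the group
print(sorted(registry.get_children()))  # actual group membership
```

In this model c1 keeps believing it is the only consumer, which matches the 
symptom described above: it never gives up any partitions.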

The attached patch solves this by using an "epoch" node which all consumers 
watch and update to trigger a rebalance. When a rebalance is triggered, we 
check the consumer registrations against a cached state to avoid unnecessary 
rebalances. For safety, we also periodically check the consumer registrations 
and rebalance. We have been using this patch in production at HubSpot for a 
while, and it has eliminated all rebalance issues.
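
A minimal in-memory sketch of the epoch idea follows. It is not the attached 
patch: ToyCoordinator and ToyConsumer are hypothetical names, ZooKeeper is 
replaced by plain Python objects, and the ordering in start() (watch the epoch, 
register, then bump the epoch) is an assumption made for illustration.

```python
# Sketch only: hypothetical classes, no real ZooKeeper or Kafka calls.
class ToyCoordinator:
    """In-memory stand-in for the shared ZooKeeper state."""

    def __init__(self):
        self.registrations = set()  # consumer registration nodes
        self.epoch = 0              # the shared "epoch" node
        self.epoch_watchers = []    # callbacks fired on every epoch change

    def register(self, consumer_id):
        self.registrations.add(consumer_id)

    def watch_epoch(self, callback):
        self.epoch_watchers.append(callback)

    def bump_epoch(self):
        self.epoch += 1
        for callback in list(self.epoch_watchers):
            callback()


class ToyConsumer:
    def __init__(self, consumer_id, coordinator):
        self.consumer_id = consumer_id
        self.coordinator = coordinator
        self.cached = frozenset()   # registrations seen at the last rebalance
        self.rebalances = 0

    def start(self):
        # Assumed ordering: watch first, then register, then bump the epoch
        # so that every watcher (including this consumer) re-checks.
        self.coordinator.watch_epoch(self.check)
        self.coordinator.register(self.consumer_id)
        self.coordinator.bump_epoch()

    def check(self):
        # Called on epoch changes; a periodic timer could call it as well.
        current = frozenset(self.coordinator.registrations)
        if current == self.cached:
            return                  # membership unchanged: skip the rebalance
        self.cached = current
        self.rebalances += 1        # a real consumer would reassign partitions


coordinator = ToyCoordinator()
c1 = ToyConsumer("c1", coordinator)
c2 = ToyConsumer("c2", coordinator)
c1.start()
c2.start()
coordinator.bump_epoch()  # spurious trigger: nobody rebalances
c1.check()                # stand-in for the periodic safety check

print(c1.rebalances, c2.rebalances, sorted(c1.cached))
```

Because every consumer watches the same node and every new consumer updates 
it, a registration cannot go unnoticed in this model, and the cached comparison 
keeps redundant epoch bumps from causing extra rebalances.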





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
