[ https://issues.apache.org/jira/browse/KAFKA-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang updated KAFKA-2329:
---------------------------------
    Status: In Progress (was: Patch Available)

> Consumers balance fails when multiple consumers are started simultaneously.
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-2329
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2329
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.8.2.1, 0.8.1.1
>            Reporter: Ze'ev Eli Klapow
>            Assignee: Ze'ev Eli Klapow
>              Labels: consumer, patch
>             Fix For: 0.8.1.2
>
>         Attachments: zookeeper-consumer-connector-epoch-node.patch
>
>
> During consumer startup, a race condition can occur if multiple consumers are
> started (nearly) simultaneously.
> If a second consumer is started while the first consumer is in the middle of
> {{zkClient.subscribeChildChanges}}, the first consumer will never see the
> registration of the second consumer. The second consumer's registration node
> is created while it is unwatched, and no further child is registered later to
> trigger the watch. As a result, the first consumer owns all partitions and
> never releases them, so the second consumer fails to rebalance.
> The attached patch solves this by using an "epoch" node that all consumers
> watch and update to trigger a rebalance. When a rebalance is triggered, we
> check the consumer registrations against a cached state to avoid unnecessary
> rebalances. For safety, we also periodically check the consumer registrations
> and rebalance. We have been using this patch in production at HubSpot for a
> while, and it has eliminated all rebalance issues.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
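The epoch-node fix described in the issue can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions. It is not the attached patch and not Kafka's consumer connector code.

- `FakeRegistry`, `EpochConsumer` and all of their methods are hypothetical.
- The registry stands in for ZooKeeper and is not the ZooKeeper or ZkClient API. Its callbacks stay registered, whereas real ZooKeeper watches fire once.
- Partition ownership is reduced to a simple range-style split.

The sketch shows the three ideas from the description: watching and bumping a shared epoch, skipping rebalances when the cached membership is unchanged, and a periodic safety check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the consumer registry and the shared "epoch" node.
// NOT the ZooKeeper/ZkClient API: real ZooKeeper watches fire once and must be re-set,
// whereas these callbacks stay registered to keep the sketch short.
class FakeRegistry {
    private final Set<String> consumers = new TreeSet<>();
    private final List<Runnable> epochWatchers = new ArrayList<>();
    private long epoch = 0;

    synchronized void register(String consumerId) { consumers.add(consumerId); }

    synchronized Set<String> consumers() { return new TreeSet<>(consumers); }

    synchronized void watchEpoch(Runnable watcher) { epochWatchers.add(watcher); }

    synchronized long epoch() { return epoch; }

    // Increment the epoch, then notify every watcher outside the lock.
    void bumpEpoch() {
        List<Runnable> toNotify;
        synchronized (this) {
            epoch++;
            toNotify = new ArrayList<>(epochWatchers);
        }
        for (Runnable watcher : toNotify) {
            watcher.run();
        }
    }
}

// Hypothetical consumer following the epoch-node approach.
class EpochConsumer {
    private final String id;
    private final FakeRegistry registry;
    private Set<String> cachedMembers = new TreeSet<>();
    private int rebalanceCount = 0;

    EpochConsumer(String id, FakeRegistry registry) {
        this.id = id;
        this.registry = registry;
    }

    // Register, then watch, then bump. The last bump of two racing consumers happens after
    // both have registered and started watching, so it makes each re-read full membership.
    void start() {
        register();
        watch();
        announce();
    }

    void register() { registry.register(id); }

    void watch() { registry.watchEpoch(this::checkAndMaybeRebalance); }

    void announce() { registry.bumpEpoch(); }

    // Would run on a timer in a real deployment; the interval is left open here.
    void periodicCheck() { checkAndMaybeRebalance(); }

    private synchronized void checkAndMaybeRebalance() {
        Set<String> current = registry.consumers();
        if (current.equals(cachedMembers)) {
            return; // membership unchanged since the last rebalance: skip it
        }
        cachedMembers = current;
        rebalanceCount++; // a real consumer would release and reclaim partitions here
    }

    synchronized Set<String> members() { return new TreeSet<>(cachedMembers); }

    synchronized int rebalanceCount() { return rebalanceCount; }

    // Range-style split of partitions 0..numPartitions-1 over the sorted member list.
    synchronized List<Integer> ownedPartitions(int numPartitions) {
        List<String> sorted = new ArrayList<>(cachedMembers); // TreeSet iterates in sorted order
        List<Integer> owned = new ArrayList<>();
        int idx = sorted.indexOf(id);
        if (idx < 0) {
            return owned;
        }
        int per = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size();
        int start = idx * per + Math.min(idx, extra);
        int count = per + (idx < extra ? 1 : 0);
        for (int p = start; p < start + count; p++) {
            owned.add(p);
        }
        return owned;
    }
}
```

Suppose consumer "b" registers, watches and bumps the epoch before consumer "a" has started watching. In this model, "a" still picks up "b" when "a" performs its own bump. The cached-membership comparison keeps "b" from rebalancing a second time on that bump. The periodic check is the fallback for notifications a real system might lose, such as events that land between one-shot watch re-registrations.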