Hello Dinesh, 1. A rebalance is triggered when the consumers is notified or the group member change / topic-partition change through ZK.
2. The cost of a rebalance is positively related to the #. consumers in the group and the #. of topics this group is consuming. The latency of the rebalance can be as high as tens of seconds when you have large number of consumers fetching from a large number of topics. 3. Rebalance algorithm is deterministic (range-based), and before it kicks in consumers will first commit their current offset and stop fetchers, hence when M1 is already fetched by some consumer C1 before rebalance it will not be re-send to another C2 after the rebalance. You can also read some faqs here: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-CanIpredicttheresultsoftheconsumerrebalance ? And in 0.9, we will release our new consumer client, which will reduce rebalance latency compared to the current consumer. https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design Guozhang On Wed, Nov 5, 2014 at 4:50 AM, dinesh kumar <dinesh...@gmail.com> wrote: > Hello, > > I am trying to come up with a design for consuming from Kafka. *I am using > 0.8.1.1 version of Kafka. *I am thinking of designing a system where the > consumer will be created every few seconds, consume the data from Kafka, > process it and then quits after committing the offsets to Kafka. At any > point of time expect 250 - 300 consumers to be active (running as > ThreadPools in different machines). > > 1. How and When a rebalance of partition happens? > > 2. How costly is the rebalancing of partitions among the consumers. I am > expecting a new consumer finishing up or joining every few seconds to the > same consumer group. So I just want to know the overhead and latency of a > rebalancing operation. > > 3. Say Consumer C1 has Partitions P1, P2, P3 assigned to it and it is > processing a message M1 from Partition P1. Now Consumer C2 joins the > group. How is the partitions divided between C1 and C2. Is there a > possibility where C1's (which might take some time to commit its message to > Kafka) commit for M1 will be rejected and M1 will be treated as a fresh > message and will be delivered to someone else (I know Kafka is at least > once delivery model but wanted to confirm if the re partition by any chance > cause a re delivery of the same message)? > > > Thanks, > Dinesh > -- -- Guozhang