Thanks for the answers. I have some follow-up questions. Let me get a bit more specific.
In a scenario of 1 topic with 400 - 500 partitions:

1. Is it OK to have short-lived consumers, or is it recommended to have only long-running consumers?

2. You mentioned that rebalance latency depends on the number of consumers and the number of topics. In the case of 1 topic and hundreds of consumers, can we say the latency is in the tens of seconds, as you mentioned before?

3. You mentioned

On Wed, Nov 5, 2014 at 10:03 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hello Dinesh,
>
> 1. A rebalance is triggered when the consumers is notified or the group
> member change / topic-partition change through ZK.
>
> 2. The cost of a rebalance is positively related to the #. consumers in the
> group and the #. of topics this group is consuming. The latency of the
> rebalance can be as high as tens of seconds when you have large number of
> consumers fetching from a large number of topics.
>
> 3. Rebalance algorithm is deterministic (range-based), and before it kicks
> in consumers will first commit their current offset and stop fetchers,
> hence when M1 is already fetched by some consumer C1 before rebalance it
> will not be re-send to another C2 after the rebalance.
>
> You can also read some faqs here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-CanIpredicttheresultsoftheconsumerrebalance?
>
> And in 0.9, we will release our new consumer client, which will reduce
> rebalance latency compared to the current consumer.
>
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
>
> Guozhang
>
> On Wed, Nov 5, 2014 at 4:50 AM, dinesh kumar <dinesh...@gmail.com> wrote:
>
> > Hello,
> >
> > I am trying to come up with a design for consuming from Kafka. *I am using
> > 0.8.1.1 version of Kafka.* I am thinking of designing a system where the
> > consumer will be created every few seconds, consume the data from Kafka,
> > process it and then quits after committing the offsets to Kafka. At any
> > point of time expect 250 - 300 consumers to be active (running as
> > ThreadPools in different machines).
> >
> > 1. How and When a rebalance of partition happens?
> >
> > 2. How costly is the rebalancing of partitions among the consumers. I am
> > expecting a new consumer finishing up or joining every few seconds to the
> > same consumer group. So I just want to know the overhead and latency of a
> > rebalancing operation.
> >
> > 3. Say Consumer C1 has Partitions P1, P2, P3 assigned to it and it is
> > processing a message M1 from Partition P1. Now Consumer C2 joins the
> > group. How is the partitions divided between C1 and C2. Is there a
> > possibility where C1's (which might take some time to commit its message to
> > Kafka) commit for M1 will be rejected and M1 will be treated as a fresh
> > message and will be delivered to someone else (I know Kafka is at least
> > once delivery model but wanted to confirm if the re partition by any chance
> > cause a re delivery of the same message)?
> >
> > Thanks,
> > Dinesh
>
> --
> -- Guozhang
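
For reference, the deterministic range-based assignment mentioned in point 3 of the quoted reply can be sketched roughly as below. This is only an illustration of the idea under stated assumptions, not the actual consumer code: the function name `range_assign` and the string consumer IDs are hypothetical, partitions are numbered from 0, and a single topic is assumed.

```python
def range_assign(num_partitions, consumer_ids):
    """Sketch of range-based assignment for one topic (illustrative only).

    Consumers are sorted so that every member of the group computes the
    same result independently. Each consumer gets a contiguous range of
    partitions; the first (num_partitions % num_consumers) consumers get
    one extra partition.
    """
    consumers = sorted(consumer_ids)
    if not consumers:
        return {}
    per_consumer, extra = divmod(num_partitions, len(consumers))
    assignment = {}
    for i, consumer in enumerate(consumers):
        start = per_consumer * i + min(i, extra)
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
    return assignment


# The C1/C2 case from the original question, with 3 partitions (0, 1, 2).
before = range_assign(3, ["C1"])
after = range_assign(3, ["C1", "C2"])
print(before)
print(after)

# A larger case in the range discussed in this thread (hypothetical sizes).
large = range_assign(450, ["consumer-%03d" % n for n in range(300)])
print(sum(len(parts) for parts in large.values()))
```

Because the result depends only on the sorted member list and the partition count, every join or exit in the group changes the input and so implies a fresh assignment for everyone, which is why frequent membership changes are the cost to watch in the short-lived consumer design.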