Thanks for the answers. I have some follow-up questions.

Let me get a bit more specific.

In a scenario of 1 topic with 400 - 500 partitions:

1. Is it OK to have short-lived consumers, or is it recommended to have only
long-running consumers?

2. You mentioned that rebalance latency depends on the number of consumers
and the number of topics. In the case of 1 topic and hundreds of consumers,
can we say the latency is in the tens of seconds, as you mentioned before?

3. You mentioned

On Wed, Nov 5, 2014 at 10:03 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hello Dinesh,
>
> 1. A rebalance is triggered when the consumers are notified of a group
> member change or a topic-partition change through ZK.
>
> 2. The cost of a rebalance is positively related to the number of consumers
> in the group and the number of topics this group is consuming. The latency
> of the rebalance can be as high as tens of seconds when you have a large
> number of consumers fetching from a large number of topics.
>
> 3. The rebalance algorithm is deterministic (range-based). Before it kicks
> in, consumers first commit their current offsets and stop their fetchers,
> so if M1 has already been fetched by some consumer C1 before the rebalance,
> it will not be re-sent to another consumer C2 after the rebalance.
>
> You can also read some faqs here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-CanIpredicttheresultsoftheconsumerrebalance?
>
> And in 0.9, we will release our new consumer client, which will reduce
> rebalance latency compared to the current consumer.
>
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
>
>
> Guozhang
>
> On Wed, Nov 5, 2014 at 4:50 AM, dinesh kumar <dinesh...@gmail.com> wrote:
>
> > Hello,
> >
> > I am trying to come up with a design for consuming from Kafka. *I am
> > using version 0.8.1.1 of Kafka.* I am thinking of designing a system
> > where a consumer is created every few seconds, consumes data from Kafka,
> > processes it, and then quits after committing its offsets to Kafka. At
> > any point in time, expect 250 - 300 consumers to be active (running in
> > thread pools on different machines).
> >
> > 1. How and when does a rebalance of partitions happen?
> >
> > 2. How costly is rebalancing partitions among the consumers? I am
> > expecting a consumer to finish up or a new one to join the same consumer
> > group every few seconds, so I want to know the overhead and latency of a
> > rebalancing operation.
> >
> > 3. Say consumer C1 has partitions P1, P2, P3 assigned to it and is
> > processing a message M1 from partition P1. Now consumer C2 joins the
> > group. How are the partitions divided between C1 and C2? Is there a
> > possibility that C1's commit for M1 (C1 might take some time to commit
> > its message to Kafka) will be rejected, so that M1 is treated as a fresh
> > message and delivered to someone else? (I know Kafka is an at-least-once
> > delivery model, but I wanted to confirm whether the repartitioning can by
> > any chance cause a redelivery of the same message.)
> >
> >
> > Thanks,
> > Dinesh
> >
>
>
>
> --
> -- Guozhang
>
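
For reference, the deterministic, range-based split mentioned in point 3 of
the quoted reply can be sketched roughly as below. This is only an
illustration under assumptions: the function name range_assign is
hypothetical, consumers and partitions are assumed to be sorted before the
split, and it is not the actual Kafka consumer code.

```python
def range_assign(partitions, consumers):
    """Split partitions into contiguous ranges, one range per consumer.

    Hypothetical helper for illustration only; not Kafka's implementation.
    Sorting both inputs makes the result independent of input order, which
    is what "deterministic" means here.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    if not consumers:
        return {}
    # Every consumer gets per_consumer partitions; the first `extra`
    # consumers (in sorted order) get one more.
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    for pos, consumer in enumerate(consumers):
        start = per_consumer * pos + min(pos, extra)
        count = per_consumer + (1 if pos < extra else 0)
        assignment[consumer] = partitions[start:start + count]
    return assignment


if __name__ == "__main__":
    # The scenario from question 3: C1 owns P1..P3, then C2 joins.
    print(range_assign(["P1", "P2", "P3"], ["C1"]))
    print(range_assign(["P1", "P2", "P3"], ["C1", "C2"]))
```

Under these assumptions, once C2 joins, C1 keeps P1 and P2 and C2 takes P3.
Because every member computes the same split from the same sorted inputs,
the outcome is predictable given the group membership.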
