Jira ticket https://issues.apache.org/jira/browse/KAFKA-687



2013/1/7 Pablo Barrera González <pablo.barr...@gmail.com>

> Thank you Jun and Neha
>
> I was trying to avoid adding more partitions. I have enough partitions if
> you count all partitions in all topics. I understand the problem with
> different data loads per topic, but the current scheme does not solve this
> problem either, so we shouldn't be worse off if we consider all partitions
> from all topics at the same time.
>
> I will open the JIRA ticket to track this.
>
> Thanks again for the clarification.
>
> Cheers
>
> Pablo
>
>
>
> 2013/1/7 Neha Narkhede <neha.narkh...@gmail.com>
>
>> Pablo,
>>
>> That is a good suggestion. Ideally, the partitions across all topics should
>> be distributed evenly across consumer streams instead of being decided on a
>> per-topic basis. There is no particular advantage to the current scheme of
>> per-topic rebalancing that I can think of. Would you mind filing a JIRA to
>> track this improvement?
>>
>> Thanks,
>> Neha
>>
>>
>> On Mon, Jan 7, 2013 at 9:10 AM, Jun Rao <jun...@gmail.com> wrote:
>>
>> > Pablo,
>> >
>> > Currently, a partition is the smallest unit by which we distribute data
>> > among consumers (in the same consumer group). So, if the number of
>> > consumers is larger than the total number of partitions in a Kafka cluster
>> > (across all brokers), some consumers will never get any data. This decision
>> > is made on a per-topic basis. If a consumer consumes multiple topics, it
>> > would make sense to divide the partitions across all topics among the
>> > consumers. We haven't done that yet. Part of the reason is that we need to
>> > figure out how to balance the data across topics, since they can be of
>> > different sizes. We can look into that post 0.8.
>> >
>> > For now, the solution is to increase the number of partitions on the
>> > broker.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Mon, Jan 7, 2013 at 9:03 AM, Pablo Barrera González <pablo.barr...@gmail.com> wrote:
>> >
>> > > Hello
>> > >
>> > > We are starting to use Kafka in production, but we found an unexpected
>> > > (at least for me) behavior in how partitions are used. We have a bunch
>> > > of topics with a few partitions each. We are trying to consume all the
>> > > data with several consumers (in a single consumer group).
>> > >
>> > > The problem is in the rebalance step. The rebalance splits the
>> > > partitions of each topic separately among all consumers. So if you have
>> > > 100 topics with only 2 partitions each and 10 consumers, only two
>> > > consumers will be used. That is, for each topic, all partitions are
>> > > listed and shared among the consumers in the consumer group in order
>> > > (not randomly).
>> > >
>> > > This behavior is also described in Algorithm 1 of the original Kafka
>> > > paper [1].
>> > >
>> > > I don't understand this decision. Why is it split by topic? Does it make
>> > > sense to divide all partitions from all topics among all the consumers
>> > > in the consumer group? I don't see the reason for this, so I would like
>> > > to hear your opinion before changing the code.
>> > >
>> > > We are using Kafka 0.7.1.
>> > >
>> > > Thank you in advance
>> > >
>> > > Pablo
>> > >
>> > > [1] "Kafka: a Distributed Messaging System for Log Processing", Jay Kreps,
>> > > Neha Narkhede and Jun Rao.
>> > > http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
>> >
>>
>
>
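
The per-topic behavior Pablo and Jun describe can be illustrated with a minimal Python sketch. It assumes a range-style split in which, for each topic independently, the sorted partitions are divided into contiguous runs over the sorted consumer list, with the first consumers taking any remainder. The function name `range_assign_per_topic` and the topic/consumer names are hypothetical; this is modeled on the description in the thread, not taken from the Kafka 0.7.1 source.

```python
def range_assign_per_topic(partitions_per_topic, consumers):
    """Hypothetical sketch: split each topic's partitions independently.

    partitions_per_topic: dict mapping topic name -> number of partitions.
    consumers: list of consumer ids in one consumer group.
    Returns a dict mapping consumer id -> list of (topic, partition) pairs.
    """
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    n = len(ordered)
    for topic in sorted(partitions_per_topic):
        num_parts = partitions_per_topic[topic]
        per_consumer, extra = divmod(num_parts, n)
        for i, consumer in enumerate(ordered):
            # The first `extra` consumers each take one additional partition,
            # so with fewer partitions than consumers the same leading
            # consumers win every topic.
            start = per_consumer * i + min(i, extra)
            count = per_consumer + (1 if i < extra else 0)
            assignment[consumer].extend(
                (topic, p) for p in range(start, start + count)
            )
    return assignment


if __name__ == "__main__":
    # Pablo's example: 100 topics with 2 partitions each, 10 consumers.
    topics = {f"topic-{t:03d}": 2 for t in range(100)}
    consumers = [f"consumer-{c}" for c in range(10)]
    result = range_assign_per_topic(topics, consumers)
    busy = [c for c, parts in result.items() if parts]
    print(len(busy), busy)
```

Because every topic is split the same way over the same sorted consumer list, `consumer-0` and `consumer-1` each end up with 100 partitions while the other eight consumers get nothing.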

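Neha's suggestion, distributing the partitions of all topics evenly across the consumer streams, can be sketched by pooling every (topic, partition) pair and dealing the pairs out round-robin over the sorted consumers. This is a hypothetical illustration only: the function name `assign_across_topics` is made up, it assumes every consumer in the group subscribes to the same topics, and it is not the change made for KAFKA-687.

```python
def assign_across_topics(partitions_per_topic, consumers):
    """Hypothetical sketch: balance partitions from all topics together.

    partitions_per_topic: dict mapping topic name -> number of partitions.
    consumers: list of consumer ids; all are assumed to subscribe to every topic.
    Returns a dict mapping consumer id -> list of (topic, partition) pairs.
    """
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    # Pool every partition of every topic into one ordered list.
    all_partitions = [
        (topic, p)
        for topic in sorted(partitions_per_topic)
        for p in range(partitions_per_topic[topic])
    ]
    # Deal the pooled partitions out one at a time, cycling over consumers.
    for index, tp in enumerate(all_partitions):
        assignment[ordered[index % len(ordered)]].append(tp)
    return assignment


if __name__ == "__main__":
    topics = {f"topic-{t:03d}": 2 for t in range(100)}
    consumers = [f"consumer-{c}" for c in range(10)]
    result = assign_across_topics(topics, consumers)
    print({c: len(parts) for c, parts in result.items()})
```

With Pablo's 100 topics of 2 partitions each and 10 consumers, every consumer receives 20 partitions. As Jun points out, this balances the number of partitions, not the amount of data, so topics with very different loads could still leave some consumers much busier than others.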