Re: Consumer rebalance per topic

2013-01-08 Thread Joel Koshy
(From http://kafka.apache.org/design.html) One potential benefit of the existing rebalancing logic is to reduce the number of connections to brokers per consumer instance. However, if you have a large number of partitions and few brokers and/or consumer instances, then it wouldn't really help; so I

Re: Consumer rebalance per topic

2013-01-08 Thread Pablo Barrera González
Jira ticket: https://issues.apache.org/jira/browse/KAFKA-687
2013/1/7 Pablo Barrera González
> Thank you Jun and Neha
>
> I was trying to avoid adding more partitions. I have enough partitions if
> you count all partitions in all topics. I understand the problem with
> different data load per t

Re: Consumer rebalance per topic

2013-01-07 Thread Pablo Barrera González
Thank you, Jun and Neha. I was trying to avoid adding more partitions. I have enough partitions if you count all partitions in all topics. I understand the problem with different data load per topic, but the current scheme does not solve this problem either, so we shouldn't be worse off if we consider all

Re: Consumer rebalance per topic

2013-01-07 Thread Neha Narkhede
Pablo, That is a good suggestion. Ideally, the partitions across all topics should be distributed evenly across consumer streams instead of a per-topic based decision. There is no particular advantage to the current scheme of per-topic rebalancing that I can think of. Would you mind filing a JIRA

Re: Consumer rebalance per topic

2013-01-07 Thread Jun Rao
Pablo, Currently, a partition is the smallest unit by which we distribute data among consumers (in the same consumer group). So, if the number of consumers is larger than the total number of partitions in a Kafka cluster (across all brokers), some consumers will never get any data. Such a decision is made on
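To make the difference discussed in this thread concrete, here is a minimal sketch in Python. It is a simplified model and not Kafka's actual rebalancing code: the function names `assign_per_topic` and `assign_global`, the topic names, and the consumer ids are all hypothetical. It assumes partitions are handed out in contiguous ranges to consumers sorted by id when the decision is made per topic, and round-robin over the pooled list when it is made across all topics.

```python
from collections import defaultdict


def assign_per_topic(topics, consumers):
    """Assign partitions independently for each topic (simplified model).

    topics: dict mapping topic name -> number of partitions (hypothetical data)
    consumers: iterable of consumer ids in one consumer group
    """
    ordered = sorted(consumers)
    assignment = defaultdict(list)
    for topic in sorted(topics):
        # Split this topic's partitions into contiguous ranges;
        # the first `extra` consumers get one more partition each.
        base, extra = divmod(topics[topic], len(ordered))
        start = 0
        for i, consumer in enumerate(ordered):
            count = base + (1 if i < extra else 0)
            for partition in range(start, start + count):
                assignment[consumer].append((topic, partition))
            start += count
    return {c: assignment[c] for c in ordered}


def assign_global(topics, consumers):
    """Pool the partitions of all topics and deal them out round-robin."""
    ordered = sorted(consumers)
    pooled = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    assignment = {c: [] for c in ordered}
    for i, topic_partition in enumerate(pooled):
        assignment[ordered[i % len(ordered)]].append(topic_partition)
    return assignment


if __name__ == "__main__":
    # Three topics with two partitions each, four consumers in one group.
    topics = {"a": 2, "b": 2, "c": 2}
    consumers = ["c1", "c2", "c3", "c4"]
    for name, fn in (("per-topic", assign_per_topic), ("global", assign_global)):
        counts = {c: len(parts) for c, parts in fn(topics, consumers).items()}
        print(name, counts)
```

In this model, with six partitions in total and four consumers, the per-topic decision gives every partition to the first two consumers and leaves the other two idle, because each topic alone has fewer partitions than there are consumers. Pooling the partitions gives every consumer at least one.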

Consumer rebalance per topic

2013-01-07 Thread Pablo Barrera González
Hello, We are starting to use Kafka in production, but we found an unexpected (at least for me) behavior with the use of partitions. We have a bunch of topics with a few partitions each. We try to consume all data with several consumers (just one consumer group). The problem is in the rebalance ste