I see your point now. The old consumer does have hard-coded "round-robin-per-topic" logic, which has this issue. In the new consumer, we will make the assignment logic customizable so that people can specify whichever rebalance algorithm they like.
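
To make the difference concrete, here is a rough sketch in Python. It is a simplified model of the two assignment styles discussed in this thread, not the actual consumer code: the function names, topic names, and consumer ids are all made up for illustration, and real rebalancing involves coordination that is left out here.

```python
from collections import defaultdict


def assign_per_topic(topics, consumers):
    """Deal out each topic's partitions independently, always starting
    again from the first consumer (the per-topic behavior described above)."""
    owned = defaultdict(list)
    members = sorted(consumers)
    for topic in sorted(topics):
        for partition in range(topics[topic]):
            owner = members[partition % len(members)]
            owned[owner].append((topic, partition))
    return dict(owned)


def assign_across_topics(topics, consumers):
    """Flatten every (topic, partition) pair into one list and deal that
    list out round-robin, so single-partition topics get spread out too."""
    owned = defaultdict(list)
    members = sorted(consumers)
    all_partitions = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    for i, topic_partition in enumerate(all_partitions):
        owned[members[i % len(members)]].append(topic_partition)
    return dict(owned)


# Four hypothetical single-partition topics matched by a regex, two consumers.
topics = {"logs.a": 1, "logs.b": 1, "logs.c": 1, "logs.d": 1}
consumers = ["c1", "c2"]

per_topic = assign_per_topic(topics, consumers)
across = assign_across_topics(topics, consumers)

print(per_topic)  # c1 owns all four partitions, c2 owns nothing
print(across)     # c1 and c2 own two partitions each
```

With only single-partition topics, the per-topic version hands every partition to the first consumer, which matches the symptom Jason describes. The second version is just one possible algorithm that a customizable assignment hook could plug in.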
Also, I will soon send out a new consumer design summary email for more comments. Feel free to share any further thoughts you have about the new consumer design.

Guozhang

On Mon, Jul 7, 2014 at 8:44 AM, Jason Rosenberg <j...@squareup.com> wrote:

> Guozhang,
>
> I'm not suggesting we parallelize within a partition....
>
> The problem with the current high-level consumer is, if you use a regex to
> select multiple topics, and then have multiple consumers in the same group,
> usually the first consumer will 'own' all the topics, and no amount of
> subsequent rebalancing will allow other consumers in the group to own some
> of the topics. Re-balancing does allow other consumers to own multiple
> partitions, but if a topic has only 1 partition, only the first consumer to
> initialize will get all the work.
>
> So, I'm wondering if the new api will be better about re-balancing the work
> at the partition level, and not the topic level, as such.
>
> Jason
>
>
> On Mon, Jul 7, 2014 at 11:17 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hi Jason,
> >
> > In the new design the consumption is still at the per-partition
> > granularity. The main rationale of doing this is ordering: Within a
> > partition we want to preserve the ordering such that message B produced
> > after message A will also be consumed and processed after message A. And
> > producers can use keys to make sure messages with the same ordering group
> > will be in the same partition. To do this we have to make one partition
> > only being consumed by a single client at a time. On the other hand, when
> > one wants to add the number of consumers beyond the number of partitions,
> > he can always use the topic tool to dynamically add more partitions to the
> > topic.
> >
> > Do you have a specific scenario in mind that would require single-partition
> > topics?
> >
> > Guozhang
> >
> >
> >
> > On Mon, Jul 7, 2014 at 7:43 AM, Jason Rosenberg <j...@squareup.com> wrote:
> >
> > > I've been looking at the new consumer api outlined here:
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
> > >
> > > One issue in the current high-level consumer, is that it does not do a good
> > > job of distributing a set of topics between multiple consumers, unless each
> > > topic has multiple partitions. This has always seemed strange to me, since
> > > at the end of the day, even for single partition topics, the basic unit of
> > > consumption is still at the partition level (so you'd expect rebalancing to
> > > try to evenly distribute partitions (regardless of the topic)).
> > >
> > > It's not clearly spelled out in the new consumer api wiki, so I'll just
> > > ask, will this issue be addressed in the new api? I think I've asked this
> > > before, but I wanted to go check again, and am not seeing this explicitly
> > > addressed in the design.
> > >
> > > Thanks
> > >
> > > Jason
> > >
> >
> > --
> > -- Guozhang
>
--
-- Guozhang