Great, that's reassuring! What's the time frame for having a more or less stable version to try out?

Jason

On Mon, Jul 7, 2014 at 12:59 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> I see your point now. The old consumer does have a hard-coded
> "round-robin-per-topic" logic which have this issue. In the new consumer,
> we will make the assignment logic customizable so that people can specify
> different rebalance algorithms they like.
>
> Also I will soon send out a new consumer design summary email for more
> comments. Feel free to give us more thoughts you have about the new
> consumer design.
>
> Guozhang
>
>
> On Mon, Jul 7, 2014 at 8:44 AM, Jason Rosenberg <j...@squareup.com> wrote:
>
> > Guozhang,
> >
> > I'm not suggesting we parallelize within a partition....
> >
> > The problem with the current high-level consumer is, if you use a regex to
> > select multiple topics, and then have multiple consumers in the same group,
> > usually the first consumer will 'own' all the topics, and no amount of
> > sub-sequent rebalancing will allow other consumers in the group to own some
> > of the topics. Re-balancing does allow other consumers to own multiple
> > partitions, but if a topic has only 1 partition, only the first consumer to
> > initialize will get all the work.
> >
> > So, I'm wondering if the new api will be better about re-balancing the work
> > at the partition level, and not the topic level, as such.
> >
> > Jason
> >
> >
> > On Mon, Jul 7, 2014 at 11:17 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> > > Hi Jason,
> > >
> > > In the new design the consumption is still at the per-partition
> > > granularity. The main rationale of doing this is ordering: Within a
> > > partition we want to preserve the ordering such that message B produced
> > > after message A will also be consumed and processed after message A. And
> > > producers can use keys to make sure messages with the same ordering group
> > > will be in the same partition. To do this we have to make one partition
> > > only being consumed by a single client at a time. On the other hand, when
> > > one wants to add the number of consumers beyond the number of partitions,
> > > he can always use the topic tool to dynamically add more partitions to the
> > > topic.
> > >
> > > Do you have a specific scenario in mind that would require
> > > single-partition topics?
> > >
> > > Guozhang
> > >
> > >
> > > On Mon, Jul 7, 2014 at 7:43 AM, Jason Rosenberg <j...@squareup.com> wrote:
> > >
> > > > I've been looking at the new consumer api outlined here:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
> > > >
> > > > One issue in the current high-level consumer, is that it does not do a
> > > > good job of distributing a set of topics between multiple consumers,
> > > > unless each topic has multiple partitions. This has always seemed
> > > > strange to me, since at the end of the day, even for single partition
> > > > topics, the basic unit of consumption is still at the partition level
> > > > (so you'd expect rebalancing to try to evenly distribute partitions
> > > > (regardless of the topic)).
> > > >
> > > > It's not clearly spelled out in the new consumer api wiki, so I'll just
> > > > ask, will this issue be addressed in the new api? I think I've asked
> > > > this before, but I wanted to go check again, and am not seeing this
> > > > explicitly addressed in the design.
> > > >
> > > > Thanks
> > > >
> > > > Jason
> > >
> > >
> > > --
> > > -- Guozhang
>
>
> --
> -- Guozhang
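
For anyone reading this thread later, here is a rough sketch of the behaviour discussed above. It is a simplified model, not the actual Kafka consumer code: the function names `assign_per_topic` and `assign_across_partitions`, the topic names, and the consumer ids are all made up for illustration. It assumes that "round-robin-per-topic" means the assignment restarts at the first consumer for every topic, which is enough to reproduce the problem with single-partition topics.

```python
from collections import defaultdict


def assign_per_topic(topics, consumers):
    """Assumed model of per-topic assignment: the round-robin
    restarts at the first consumer for every topic."""
    owned = defaultdict(list)
    for topic, num_partitions in sorted(topics.items()):
        for partition in range(num_partitions):
            consumer = consumers[partition % len(consumers)]
            owned[consumer].append((topic, partition))
    return dict(owned)


def assign_across_partitions(topics, consumers):
    """Hypothetical alternative: round-robin over the flat list of
    all (topic, partition) pairs, regardless of topic."""
    owned = defaultdict(list)
    all_partitions = [
        (topic, partition)
        for topic, num_partitions in sorted(topics.items())
        for partition in range(num_partitions)
    ]
    for index, topic_partition in enumerate(all_partitions):
        consumer = consumers[index % len(consumers)]
        owned[consumer].append(topic_partition)
    return dict(owned)


# Four single-partition topics, two consumers in the same group.
topics = {"logs": 1, "metrics": 1, "events": 1, "audit": 1}
consumers = ["c1", "c2"]

per_topic = assign_per_topic(topics, consumers)
balanced = assign_across_partitions(topics, consumers)

print("per topic:        ", per_topic)
print("across partitions:", balanced)
```

In this model the per-topic variant hands every partition to `c1` and leaves `c2` idle, while the partition-level variant splits the four partitions two and two. Ordering within a partition is unaffected in both cases, since each partition still has exactly one owner.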