I see your point now. The old consumer does have hard-coded
"round-robin-per-topic" logic, which has this issue. In the new consumer,
we will make the assignment logic customizable so that people can specify
whichever rebalance algorithm they like.
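
To make the difference concrete, here is a simplified sketch in Python. It
is not the actual consumer code: the function names and the topic/consumer
names are made up for illustration, and it only models the assignment step.
The first function assigns each topic on its own, always starting from the
first consumer, which is why single-partition topics all land on one member.
The second is one possible alternative that round-robins over all partitions
regardless of topic.

```python
from collections import defaultdict


def assign_per_topic(topics, consumers):
    """Assign each topic independently, restarting at the first consumer."""
    owned = defaultdict(list)
    members = sorted(consumers)
    for topic in sorted(topics):
        # The partition index restarts at 0 for every topic, so partition 0
        # of every topic goes to members[0].
        for partition in range(topics[topic]):
            owned[members[partition % len(members)]].append((topic, partition))
    return dict(owned)


def assign_across_topics(topics, consumers):
    """Round-robin over the flattened list of (topic, partition) pairs."""
    owned = defaultdict(list)
    members = sorted(consumers)
    all_partitions = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    for i, topic_partition in enumerate(all_partitions):
        owned[members[i % len(members)]].append(topic_partition)
    return dict(owned)


# Hypothetical group: four single-partition topics, two consumers.
topics = {"a": 1, "b": 1, "c": 1, "d": 1}
consumers = ["c1", "c2"]

per_topic = assign_per_topic(topics, consumers)
across = assign_across_topics(topics, consumers)
print(per_topic)  # c1 owns every topic, c2 owns nothing
print(across)     # c1 owns a and c, c2 owns b and d
```

With a pluggable assignment strategy, something like the second function
could be supplied in place of the hard-coded per-topic behavior.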

Also, I will soon send out a new consumer design summary email for more
comments. Feel free to share any further thoughts you have about the new
consumer design.

Guozhang


On Mon, Jul 7, 2014 at 8:44 AM, Jason Rosenberg <j...@squareup.com> wrote:

> Guozhang,
>
> I'm not suggesting we parallelize within a partition....
>
> The problem with the current high-level consumer is, if you use a regex to
> select multiple topics, and then have multiple consumers in the same group,
> usually the first consumer will 'own' all the topics, and no amount of
> subsequent rebalancing will allow other consumers in the group to own some
> of the topics.  Re-balancing does allow other consumers to own multiple
> partitions, but if a topic has only 1 partition, only the first consumer to
> initialize will get all the work.
>
> So, I'm wondering if the new api will be better about re-balancing the work
> at the partition level, and not the topic level, as such.
>
> Jason
>
>
> On Mon, Jul 7, 2014 at 11:17 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hi Jason,
> >
> > In the new design the consumption is still at the per-partition
> > granularity. The main rationale of doing this is ordering: Within a
> > partition we want to preserve the ordering such that message B produced
> > after message A will also be consumed and processed after message A. And
> > producers can use keys to make sure messages with the same ordering group
> > will be in the same partition. To do this we have to make one partition
> > only being consumed by a single client at a time. On the other hand, when
> > one wants to add the number of consumers beyond the number of partitions,
> > he can always use the topic tool to dynamically add more partitions to the
> > topic.
> >
> > Do you have a specific scenario in mind that would require
> > single-partition topics?
> >
> > Guozhang
> >
> >
> >
> > On Mon, Jul 7, 2014 at 7:43 AM, Jason Rosenberg <j...@squareup.com> wrote:
> >
> > > I've been looking at the new consumer api outlined here:
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
> > >
> > > One issue in the current high-level consumer, is that it does not do a
> > > good job of distributing a set of topics between multiple consumers,
> > > unless each topic has multiple partitions.  This has always seemed
> > > strange to me, since at the end of the day, even for single partition
> > > topics, the basic unit of consumption is still at the partition level
> > > (so you'd expect rebalancing to try to evenly distribute partitions
> > > (regardless of the topic)).
> > >
> > > It's not clearly spelled out in the new consumer api wiki, so I'll just
> > > ask, will this issue be addressed in the new api?  I think I've asked
> > > this before, but I wanted to go check again, and am not seeing this
> > > explicitly addressed in the design.
> > >
> > > Thanks
> > >
> > > Jason
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang
