Guozhang,

I'm not suggesting we parallelize within a partition....

The problem with the current high-level consumer is this: if you use a regex
to select multiple topics and run multiple consumers in the same group, the
first consumer usually ends up 'owning' all the topics, and no amount of
subsequent rebalancing lets the other consumers in the group own any of
them.  Rebalancing does let other consumers own some of the partitions of a
multi-partition topic, but if each topic has only 1 partition, the first
consumer to initialize gets all the work.

So I'm wondering whether the new api will rebalance the work at the
partition level (across all subscribed topics) rather than topic by topic.
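
To make the difference concrete, here is a toy simulation (not Kafka code).
It assumes the current behavior can be modeled as a range-style split
computed separately for each topic, and compares that with a round-robin
over the pooled partitions of all topics.  The function names, topic names,
and consumer ids are all made up for illustration.

```python
from collections import defaultdict


def assign_per_topic(topics, consumers):
    """Hypothetical model: split each topic's partitions on its own."""
    members = sorted(consumers)
    owned = defaultdict(list)
    for topic, num_partitions in sorted(topics.items()):
        per_consumer, extra = divmod(num_partitions, len(members))
        start = 0
        for i, consumer in enumerate(members):
            # The first `extra` consumers each take one additional partition.
            count = per_consumer + (1 if i < extra else 0)
            owned[consumer] += [(topic, p) for p in range(start, start + count)]
            start += count
    return {c: owned[c] for c in members}


def assign_pooled(topics, consumers):
    """Hypothetical model: round-robin over every partition of every topic."""
    members = sorted(consumers)
    owned = {c: [] for c in members}
    pool = [(t, p) for t, n in sorted(topics.items()) for p in range(n)]
    for i, topic_partition in enumerate(pool):
        owned[members[i % len(members)]].append(topic_partition)
    return owned


# Four single-partition topics matched by one regex, two consumers in a group.
topics = {"logs.a": 1, "logs.b": 1, "logs.c": 1, "logs.d": 1}
consumers = ["c1", "c2"]

per_topic = assign_per_topic(topics, consumers)
pooled = assign_pooled(topics, consumers)

print(per_topic)  # c1 owns all four partitions, c2 owns none
print(pooled)     # c1 and c2 own two partitions each
```

In the per-topic model the same consumer sorts first for every topic, so it
wins every single-partition topic; pooling the partitions before dividing
them avoids that.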

Jason


On Mon, Jul 7, 2014 at 11:17 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hi Jason,
>
> In the new design, consumption is still at per-partition granularity.
> The main rationale for this is ordering: within a partition we want to
> preserve ordering, such that message B produced after message A will also
> be consumed and processed after message A. Producers can use keys to make
> sure messages in the same ordering group land in the same partition. To
> do this, each partition must be consumed by only a single client at a
> time. On the other hand, if one wants to increase the number of consumers
> beyond the number of partitions, one can always use the topic tool to
> dynamically add more partitions to the topic.
>
> Do you have a specific scenario in mind that would require single-partition
> topics?
>
> Guozhang
>
>
>
> On Mon, Jul 7, 2014 at 7:43 AM, Jason Rosenberg <j...@squareup.com> wrote:
>
> > I've been looking at the new consumer api outlined here:
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design
> >
> > One issue in the current high-level consumer is that it does not do a
> > good job of distributing a set of topics between multiple consumers,
> > unless each topic has multiple partitions.  This has always seemed
> > strange to me, since at the end of the day, even for single-partition
> > topics, the basic unit of consumption is still the partition (so you'd
> > expect rebalancing to try to distribute partitions evenly, regardless
> > of the topic).
> >
> > It's not clearly spelled out in the new consumer api wiki, so I'll just
> > ask: will this issue be addressed in the new api?  I think I've asked
> > this before, but I wanted to check again, and I am not seeing this
> > explicitly addressed in the design.
> >
> > Thanks
> >
> > Jason
> >
>
>
>
> --
> -- Guozhang
>
