Yeah. The actual bottleneck is actually number of topics that match the
topic filter. Num of streams is going be shared between all topics it's
consuming from. I thought about following ideas to work around this. (I am
basically referring to mirrormaker consumer in examples).

Option 1). Instead of running one mirrormaker process with topic filter
".+", We can start multiple mirrormaker process with topic filter matching
each topic (Eg: mirrormaker1 => whitelist topic1.* , mirrormaker2
=> whitelist topic2.* etc)

But this adds some operations overhead to start and manage multiple
processes on the host.

Option 2) Modify mirrormaker code to support list of whitelist filters and
it should create message streams for  each filter
(call createMessageStreamsByFilter for each filter).

What would be your recommendation..? If adding feature to mirrormaker is
worth kafka, we can do option 2.

Thanks,
Raja.




On Fri, Aug 30, 2013 at 10:34 AM, Jun Rao <jun...@gmail.com> wrote:

> Right, but if you set #partitions in each topic to 16, you can use a total
> of 16 streams.
>
> Thanks,
>
> Jun
>
>
> On Thu, Aug 29, 2013 at 9:08 PM, Rajasekar Elango <rela...@salesforce.com
> >wrote:
>
> > With option 1) I can't really use 8 streams in each consumer, If I do
> only
> > one consumer seem to be doing all work. So I had to actually use total 8
> > streams with 4 for each consumer.
> >
> >
> >
> > On Fri, Aug 30, 2013 at 12:01 AM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > The drawback of 2), as you said is no auto failover. I was suggesting
> > that
> > > you use 16 partitions. Then you can use option 1) with 8 streams in
> each
> > > consumer.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Thu, Aug 29, 2013 at 8:51 PM, Rajasekar Elango <
> > rela...@salesforce.com
> > > >wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > If you read my previous posts, based on current re balancing logic,
> if
> > we
> > > > consumer from topic filter, consumer actively use all streams. Can
> you
> > > > provide your recommendation of option 1 vs option 2 in my previous
> > post?
> > > >
> > > > Thanks,
> > > > Raja.
> > > >
> > > >
> > > > On Thu, Aug 29, 2013 at 11:42 PM, Jun Rao <jun...@gmail.com> wrote:
> > > >
> > > > > You can always use more partitions to get more parallelism in the
> > > > > consumers.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Thu, Aug 29, 2013 at 12:44 PM, Rajasekar Elango
> > > > > <rela...@salesforce.com>wrote:
> > > > >
> > > > > > So what is best way to load balance multiple consumers consuming
> > from
> > > > > topic
> > > > > > filter.
> > > > > >
> > > > > > Let's say we have 4 topics with 8 partitions and 2 consumers.
> > > > > >
> > > > > > Option 1) To load balance consumers, we can set num.streams=4 so
> > that
> > > > > both
> > > > > > consumers split 8 partitions. but can only use half of consumer
> > > > streams.
> > > > > >
> > > > > > Option 2) Configure mutually exclusive topic filter regex such
> > that 2
> > > > > > topics will match consumer1 and 2 topics will match consumer2.
> Now
> > we
> > > > can
> > > > > > set num.streams=8 and fully utilize consumer streams. I believe
> > this
> > > > will
> > > > > > improve performance, but if consumer dies, we will not get any
> data
> > > > from
> > > > > > the topic used by that consumer.
> > > > > >
> > > > > > What would be your recommendation?
> > > > > >
> > > > > > Thanks,
> > > > > > Raja.
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 29, 2013 at 12:42 PM, Neha Narkhede <
> > > > neha.narkh...@gmail.com
> > > > > > >wrote:
> > > > > >
> > > > > > > >> 2) When I started mirrormaker with num.streams=16, looks
> like
> > 16
> > > > > > > consumer
> > > > > > > threads were created, but only 8 are showing up as active as
> > owner
> > > in
> > > > > > > consumer offset tracker and all topics/partitions are
> distributed
> > > > > > between 8
> > > > > > > consumer threads.
> > > > > > >
> > > > > > > This is because currently the consumer rebalancing process of
> > > > assigning
> > > > > > > partitions to consumer streams is at a per topic level. Unless
> > you
> > > > have
> > > > > > at
> > > > > > > least one topic with 16 partitions, the remaining 8 threads
> will
> > > not
> > > > do
> > > > > > any
> > > > > > > work. This is not ideal and we want to look into a better
> > > rebalancing
> > > > > > > algorithm. Though it is a big change and we prefer doing it as
> > part
> > > > of
> > > > > > the
> > > > > > > consumer client rewrite.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Neha
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Aug 29, 2013 at 8:03 AM, Rajasekar Elango <
> > > > > > rela...@salesforce.com
> > > > > > > >wrote:
> > > > > > >
> > > > > > > > So my understanding is num of active streams that a consumer
> > can
> > > > > > utilize
> > > > > > > is
> > > > > > > > number of partitions in topic. This is fine if we consumer
> from
> > > > > > specific
> > > > > > > > topic. But if we consumer from TopicFilter, I thought
> consumer
> > > > should
> > > > > > > able
> > > > > > > > to utilize (number of topics that match filter * number of
> > > > partitions
> > > > > > in
> > > > > > > > topic) . But looks like number of streams that consumer can
> use
> > > is
> > > > > > > limited
> > > > > > > > by just number if partitions in topic although it's consuming
> > > from
> > > > > > > multiple
> > > > > > > > topic.
> > > > > > > >
> > > > > > > > Here what I observed with 1 mirrormaker consuming from
> > whitelist
> > > > > '.+'.
> > > > > > > >
> > > > > > > > The white list matches 5 topics and each topic has 8
> > partitions.
> > > I
> > > > > used
> > > > > > > > consumer offset checker to look at owner of each/topic
> > partition.
> > > > > > > >
> > > > > > > > 1) When I started mirrormaker with num.streams=8, all
> > > > > topics/partitions
> > > > > > > are
> > > > > > > > distributed between 8 consumer threads.
> > > > > > > >
> > > > > > > > 2) When I started mirrormaker with num.streams=16, looks like
> > 16
> > > > > > consumer
> > > > > > > > threads were created, but only 8 are showing up as active as
> > > owner
> > > > in
> > > > > > > > consumer offset tracker and all topics/partitions are
> > distributed
> > > > > > > between 8
> > > > > > > > consumer threads.
> > > > > > > >
> > > > > > > > So this could be bottleneck for consumers as although we
> > > > partitioned
> > > > > > > topic,
> > > > > > > > if we are consuming from topic filter it can't utilize much
> of
> > > > > > > parallelism
> > > > > > > > with num of streams. Am i missing something, is there a way
> to
> > > make
> > > > > > > > cosumers/mirrormakers to utilize more number of active
> streams?
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Thanks,
> > > > > > > > Raja.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Thanks,
> > > > > > Raja.
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Raja.
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Raja.
> >
>



-- 
Thanks,
Raja.

Reply via email to