Re: Uniform Distribution of Messages for Topic Across Partitions Without Effecting Performance

Jun Rao Thu, 07 Aug 2014 18:06:19 -0700

In the new producer, a client can specify the partition number for each
message. Then, any partitioning strategy can be implemented by the client.
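
As a rough sketch of what such a client-side strategy could look like, the
class below picks a random partition and sticks to it for a configurable
interval, which is one way to get an adjustable stickiness interval. The
class name, constructor parameters, and injected clock are all hypothetical
and not part of Kafka; the partition count is assumed to be known to the
client.

```java
import java.util.Random;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not a Kafka class): chooses a random partition and
 * keeps returning it until the configured sticky interval has elapsed.
 */
final class StickyRandomPartitionChooser {
    private final int numPartitions;
    private final long stickyMillis;
    private final LongSupplier clock; // injected so the interval is testable
    private final Random random;

    private int current = -1;  // -1 means no partition chosen yet
    private long chosenAt;     // clock value when 'current' was chosen

    StickyRandomPartitionChooser(int numPartitions, long stickyMillis,
                                 LongSupplier clock, Random random) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be > 0");
        }
        if (stickyMillis < 0) {
            throw new IllegalArgumentException("stickyMillis must be >= 0");
        }
        this.numPartitions = numPartitions;
        this.stickyMillis = stickyMillis;
        this.clock = clock;
        this.random = random;
    }

    /** Returns the partition number to attach to the next message. */
    synchronized int nextPartition() {
        long now = clock.getAsLong();
        // Re-pick on first use or once the sticky interval has expired.
        if (current < 0 || now - chosenAt >= stickyMillis) {
            current = random.nextInt(numPartitions);
            chosenAt = now;
        }
        return current;
    }
}
```

The value returned by nextPartition() would then be passed as the explicit
partition number when building each message. An interval of 0 degenerates
to picking a random partition for every message; a large interval behaves
like the long-lived stickiness described later in this thread.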


Thanks,

Jun


On Thu, Aug 7, 2014 at 1:37 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com>
wrote:

> The root of the problem is consumer lag on one or two partitions, even with
> a no-op consumer (read the log and discard it). Our use case is very simple:
> send all the log lines to the brokers. But under a storm of data (due to
> exceptions, application errors, etc.), one or two partitions lag behind
> while the other consumers are at 0 lag. We have tuned the GC using the
> recommended GC settings (according to the tuning section of
> http://www.slideshare.net/ToddPalino/enterprise-kafka-kafka-as-a-service ).
> In a normal situation, this is OK.
>
> Hashing based on a key, and sticking to Murmur hash(key) % number of
> partitions, did not give better throughput compared to random
> partitioning. It would be good to build intelligence into the producer's
> partition selection based on the rate of data for the topic and/or lag.
> Is there any way to customize the stickiness interval for the random
> partitioning strategy?
>
> Sorry for the late response.
>
> Thanks,
>
> Bhavesh
>
>
> On Mon, Aug 4, 2014 at 6:50 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>
> > Bhavesh, take a look at
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
> > ?
> >
> > Maybe the root cause is something else? Even if some producers produce
> > more or less than others, you should be able to make it random enough
> > with a partitioner and a key. I don't think you should need more than
> > what is in the FAQ, but in case you do, maybe look into
> > http://en.wikipedia.org/wiki/MurmurHash as another hash option.
> >
> > /*******************************************
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > ********************************************/
> >
> >
> > On Mon, Aug 4, 2014 at 9:12 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com>
> > wrote:
> >
> > > How can we achieve uniform distribution of non-keyed messages per topic
> > > across all partitions?
> > >
> > > We have tried to achieve this uniform distribution across partitions
> > > using custom partitioning from each producer instance with round robin
> > > (count(messages) % number of partitions for the topic). This strategy
> > > results in very poor performance. So we have switched back to the random
> > > stickiness that Kafka provides out of the box, per some interval (10
> > > minutes, not sure exactly) per topic.
> > >
> > > The above strategy sometimes results in consumer-side lag for some
> > > partitions because we have some applications/producers producing more
> > > messages for the same topic than other servers.
> > >
> > > Can Kafka provide uniform distribution out of the box by using
> > > coordination among all producers, relying on a measured rate such as
> > > # of messages per minute or # of bytes produced per minute, to achieve
> > > uniform distribution and coordinate the stickiness of partitions among
> > > hundreds of producers for the same topic?
> > >
> > > Thanks,
> > >
> > > Bhavesh
> > >
> >
>
