Why can't we override the DefaultPartitioner and simply override the partition() method, so that it redistributes messages across all partitions in round-robin fashion?
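You can: the producer's partitioner is pluggable, and the core of a round-robin partition() is just an atomic counter taken modulo the partition count. Below is a minimal, dependency-free sketch of that selection logic only; a real implementation would implement org.apache.kafka.clients.producer.Partitioner and read the partition count from the Cluster metadata. The class and method names here are illustrative, not from any Kafka release.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of round-robin partition selection. A real Kafka
// partitioner would implement org.apache.kafka.clients.producer.Partitioner
// and obtain numPartitions from the Cluster metadata passed to partition().
public class RoundRobinSketch {
    private final AtomicInteger counter = new AtomicInteger(0);

    // Returns the next partition id, cycling through 0..numPartitions-1.
    public int nextPartition(int numPartitions) {
        // getAndIncrement eventually wraps past Integer.MAX_VALUE to a
        // negative value, so mask off the sign bit before taking the modulus.
        int next = counter.getAndIncrement() & Integer.MAX_VALUE;
        return next % numPartitions;
    }

    public static void main(String[] args) {
        RoundRobinSketch rr = new RoundRobinSketch();
        for (int i = 0; i < 8; i++) {
            System.out.println(rr.nextPartition(4)); // cycles 0,1,2,3,0,1,2,3
        }
    }
}
```

Because the counter is atomic, the same instance can be shared by the producer's sender threads without extra locking.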
A Round-Robin partitioner (producer) and the StickyAssignor (consumer) should work nicely for any publish-subscribe system.

On Wed, 29 Aug 2018 at 09:39, SenthilKumar K <senthilec...@gmail.com> wrote:

> Thanks Gaurav. Did you notice the side effect mentioned on this page:
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
>
> --Senthil
>
> On Wed, Aug 29, 2018 at 2:02 PM Gaurav Bajaj <gauravhba...@gmail.com> wrote:
>
> > Hello Senthil,
> >
> > In our case we use NULL as the message key, and with that we were able
> > to achieve very even distribution from the producer.
> > Our Kafka client version is 0.10.1.0 and the Kafka broker version is 1.1.
> >
> > Thanks,
> > Gaurav
> >
> > On Wed, Aug 29, 2018 at 9:15 AM, SenthilKumar K <senthilec...@gmail.com> wrote:
> >
> >> Hello Experts, we want to distribute data across partitions in a Kafka
> >> cluster.
> >> Option 1: Use a null partition key, which can distribute data across
> >> partitions.
> >> Option 2: Choose a key (a random UUID?), which can help distribute data
> >> with 70-80% evenness.
> >>
> >> I have seen the side effect below, described on the Confluence page,
> >> about sending null keys to the producer. Is this still valid in newer
> >> versions of the Kafka producer library?
> >>
> >> "Why is data not evenly distributed among partitions when a partitioning
> >> key is not specified?
> >>
> >> In the Kafka producer, a partition key can be specified to indicate the
> >> destination partition of the message. By default, a hashing-based
> >> partitioner is used to determine the partition id given the key, and
> >> people can also use customized partitioners.
> >>
> >> To reduce the number of open sockets, in 0.8.0
> >> (https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning
> >> key is not specified or null, a producer will pick a random partition and
> >> stick to it for some time (default is 10 mins) before switching to
> >> another one. So, if there are fewer producers than partitions, at a given
> >> point in time some partitions may not receive any data. To alleviate this
> >> problem, one can either reduce the metadata refresh interval or specify a
> >> message key and a customized random partitioner. For more detail see this
> >> thread:
> >> http://mail-archives.apache.org/mod_mbox/kafka-dev/201310.mbox/%3CCAFbh0Q0aVh%2Bvqxfy7H-%2BMnRFBt6BnyoZk1LWBoMspwSmTqUKMg%40mail.gmail.com%3E"
> >>
> >> Please advise on choosing a partition key that does not have side
> >> effects.
> >>
> >> --Senthil
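The customized partitioner the FAQ mentions is wired in through the producer's `partitioner.class` configuration. A sketch of the relevant config, using only java.util.Properties; the class name `com.example.RoundRobinPartitioner` is hypothetical and stands in for your own Partitioner implementation, and `localhost:9092` is an assumed broker address:

```java
import java.util.Properties;

public class ProducerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Hypothetical custom class; substitute your own implementation of
        // org.apache.kafka.clients.producer.Partitioner here.
        props.put("partitioner.class", "com.example.RoundRobinPartitioner");
        // These props would then be passed to new KafkaProducer<>(props).
        System.out.println(props.getProperty("partitioner.class"));
    }
}
```

With this in place the producer ignores its default sticky/hash behavior for routing and delegates every send to the configured partitioner, keyed or not.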