Why can't we override the DefaultPartitioner and simply override the partition() method, so that it redistributes messages across all partitions in round-robin fashion?
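You can: the producer's partitioner is pluggable, and the core of a round-robin partition() is just an atomic counter taken modulo the partition count. Below is a minimal, dependency-free sketch of that selection logic only; a real implementation would implement org.apache.kafka.clients.producer.Partitioner and read the partition count from the Cluster metadata. The class and method names here are illustrative, not from any Kafka release.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of round-robin partition selection. A real Kafka
// partitioner would implement org.apache.kafka.clients.producer.Partitioner
// and obtain numPartitions from the Cluster metadata passed to partition().
public class RoundRobinSketch {
    private final AtomicInteger counter = new AtomicInteger(0);

    // Returns the next partition id, cycling through 0..numPartitions-1.
    public int nextPartition(int numPartitions) {
        // getAndIncrement eventually wraps past Integer.MAX_VALUE to a
        // negative value, so mask off the sign bit before taking the modulus.
        int next = counter.getAndIncrement() & Integer.MAX_VALUE;
        return next % numPartitions;
    }

    public static void main(String[] args) {
        RoundRobinSketch rr = new RoundRobinSketch();
        for (int i = 0; i < 8; i++) {
            System.out.println(rr.nextPartition(4)); // cycles 0,1,2,3,0,1,2,3
        }
    }
}
```

Because the counter is atomic, the same instance can be shared by the producer's sender threads without extra locking.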
A Round-Robin partitioner (producer) and the StickyAssignor (consumer) should work nicely for any publish-subscribe system.

On Wed, 29 Aug 2018 at 09:39, SenthilKumar K <senthilec...@gmail.com> wrote:

> Thanks Gaurav. Did you notice the side effect mentioned on this page:
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
>
> --Senthil
>
> On Wed, Aug 29, 2018 at 2:02 PM Gaurav Bajaj <gauravhba...@gmail.com> wrote:
>
> > Hello Senthil,
> >
> > In our case we use NULL as the message key, and with that we were able
> > to achieve very even distribution from the producer.
> > Our Kafka client version is 0.10.1.0 and the Kafka broker version is 1.1.
> >
> > Thanks,
> > Gaurav
> >
> > On Wed, Aug 29, 2018 at 9:15 AM, SenthilKumar K <senthilec...@gmail.com> wrote:
> >
> >> Hello Experts, we want to distribute data across partitions in a Kafka
> >> cluster.
> >> Option 1: Use a null partition key, which can distribute data across
> >> partitions.
> >> Option 2: Choose a key (a random UUID?), which can help distribute data
> >> with 70-80% evenness.
> >>
> >> I have seen the side effect below, described on the Confluence page,
> >> about sending null keys to the producer. Is this still valid in newer
> >> versions of the Kafka producer library?
> >>
> >> "Why is data not evenly distributed among partitions when a partitioning
> >> key is not specified?
> >>
> >> In the Kafka producer, a partition key can be specified to indicate the
> >> destination partition of the message. By default, a hashing-based
> >> partitioner is used to determine the partition id given the key, and
> >> people can also use customized partitioners.
> >>
> >> To reduce the number of open sockets, in 0.8.0
> >> (https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning
> >> key is not specified or null, a producer will pick a random partition and
> >> stick to it for some time (default is 10 mins) before switching to
> >> another one. So, if there are fewer producers than partitions, at a given
> >> point in time some partitions may not receive any data. To alleviate this
> >> problem, one can either reduce the metadata refresh interval or specify a
> >> message key and a customized random partitioner. For more detail see this
> >> thread:
> >> http://mail-archives.apache.org/mod_mbox/kafka-dev/201310.mbox/%3CCAFbh0Q0aVh%2Bvqxfy7H-%2BMnRFBt6BnyoZk1LWBoMspwSmTqUKMg%40mail.gmail.com%3E"
> >>
> >> Please advise on choosing a partition key that does not have side
> >> effects.
> >>
> >> --Senthil
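The customized partitioner the FAQ mentions is wired in through the producer's `partitioner.class` configuration. A sketch of the relevant config, using only java.util.Properties; the class name `com.example.RoundRobinPartitioner` is hypothetical and stands in for your own Partitioner implementation, and `localhost:9092` is an assumed broker address:

```java
import java.util.Properties;

public class ProducerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Hypothetical custom class; substitute your own implementation of
        // org.apache.kafka.clients.producer.Partitioner here.
        props.put("partitioner.class", "com.example.RoundRobinPartitioner");
        // These props would then be passed to new KafkaProducer<>(props).
        System.out.println(props.getProperty("partitioner.class"));
    }
}
```

With this in place the producer ignores its default sticky/hash behavior for routing and delegates every send to the configured partitioner, keyed or not.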