Hello Senthil,

In our case we use a null message key to get even distribution from the producer, and with that we have seen a very even spread across partitions. Our Kafka client version is 0.10.1.0 and our Kafka broker version is 1.1.
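To illustrate why a null key can come out more even than a random key, here is a minimal sketch in Python. It is a simulation, not the Kafka client. It assumes the producer's default partitioner round-robins records that have no key, which is my understanding of the Java client in this version range, and it uses zlib.crc32 only as a stand-in for the client's real key hash. The partition count, message count, and function names are made up for illustration.

```python
import uuid
import zlib
from collections import Counter

NUM_PARTITIONS = 12      # hypothetical partition count
NUM_MESSAGES = 120_000   # hypothetical number of records sent


def round_robin_counts(num_messages, num_partitions):
    """Null-key case (assumed behaviour): records cycle through partitions."""
    counts = Counter()
    for i in range(num_messages):
        counts[i % num_partitions] += 1
    return counts


def hashed_key_counts(num_messages, num_partitions):
    """Random-UUID-key case: partition = hash(key) mod partition count.

    zlib.crc32 is a stand-in hash, not the one the Kafka client uses.
    """
    counts = Counter()
    for _ in range(num_messages):
        key = uuid.uuid4().bytes
        counts[zlib.crc32(key) % num_partitions] += 1
    return counts


def spread(counts, num_partitions):
    """Difference between the busiest and the quietest partition."""
    per_partition = [counts.get(p, 0) for p in range(num_partitions)]
    return max(per_partition) - min(per_partition)


if __name__ == "__main__":
    rr = round_robin_counts(NUM_MESSAGES, NUM_PARTITIONS)
    hashed = hashed_key_counts(NUM_MESSAGES, NUM_PARTITIONS)
    print("round-robin spread:", spread(rr, NUM_PARTITIONS))
    print("hashed-UUID spread:", spread(hashed, NUM_PARTITIONS))
```

Under these assumptions the round-robin counts differ by at most one record per partition, while hashed random keys are only even on average, so their spread varies from run to run.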
Thanks,
Gaurav

On Wed, Aug 29, 2018 at 9:15 AM, SenthilKumar K <senthilec...@gmail.com> wrote:

> Hello Experts, We want to distribute data across partitions in Kafka
> Cluster.
> Option 1 : Use Null Partition Key which can distribute data across
> partitions.
> Option 2 : Choose Key ( Random UUID ? ) which can help to distribute data
> 70-80%.
>
> I have seen below side effect on Confluence Page about sending null Keys to
> Producer. Is this still valid on newer version of Kafka Producer Lib ?
>
> Why is data not evenly distributed among partitions when a partitioning key
> is not specified?
>
> In Kafka producer, a partition key can be specified to indicate the
> destination partition of the message. By default, a hashing-based
> partitioner is used to determine the partition id given the key, and people
> can use customized partitioners also.
>
> To reduce # of open sockets, in 0.8.0
> (https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning
> key is not specified or null, a producer will pick a random partition and
> stick to it for some time (default is 10 mins) before switching to another
> one. So, if there are fewer producers than partitions, at a given point of
> time, some partitions may not receive any data. To alleviate this problem,
> one can either reduce the metadata refresh interval or specify a message
> key and a customized random partitioner. For more detail see this thread
> http://mail-archives.apache.org/mod_mbox/kafka-dev/201310.mbox/%3CCAFbh0Q0aVh%2Bvqxfy7H-%2BMnRFBt6BnyoZk1LWBoMspwSmTqUKMg%40mail.gmail.com%3E
>
> Pls advise on Choosing Partition Key which should not have side effects.
>
> --Senthil