Hemant,
This behavior might be the result of the version of AK (Apache Kafka)
that you are using. Before AK 2.4, the default behavior of the
DefaultPartitioner was to load-balance data production across the
partitions, as you described. But it was found that this behavior hurt
the batching strategy that each producer uses, so AK 2.4 introduced a
new behavior into the DefaultPartitioner called sticky partitioning.
You can read more about this change in the KIP that proposed it: *KIP-480
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner>*.
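For illustration, here is a rough sketch of the sticky behavior
(assuming a local broker at localhost:9092 and a topic named my-topic,
both just placeholders): records sent with a null key tend to land on
the same partition until the current batch is flushed, instead of
rotating on every record:
```
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class StickyPartitionerDemo {
    public static void main(String[] args) throws Exception {
        Properties configs = new Properties();
        configs.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        configs.put("key.serializer", StringSerializer.class.getName());
        configs.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(configs)) {
            for (int i = 0; i < 10; i++) {
                // Null key: on AK >= 2.4 the DefaultPartitioner "sticks" to one
                // partition until the batch is sent, so consecutive records
                // often report the same partition here.
                RecordMetadata md = producer.send(
                        new ProducerRecord<>("my-topic", (String) null, "message-" + i)).get();
                System.out.println("record " + i + " -> partition " + md.partition());
            }
        }
    }
}
```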
The only downside that I see in your workaround is that you are
handling the partition assignment programmatically. That makes your
code fragile: if the number of partitions for the topic changes, your
code would not know about it. Instead, just use the
RoundRobinPartitioner
<https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/producer/RoundRobinPartitioner.html>
explicitly in your producer:
```
configs.put("partitioner.class",
    "org.apache.kafka.clients.producer.RoundRobinPartitioner");
```
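For completeness, a minimal end-to-end sketch of a producer configured
this way (again assuming a local broker at localhost:9092 and a
placeholder topic my-topic) could look like:
```
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RoundRobinPartitioner;
import org.apache.kafka.common.serialization.StringSerializer;

public class RoundRobinProducerExample {
    public static void main(String[] args) {
        Properties configs = new Properties();
        configs.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        configs.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        configs.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Spread records across all partitions, one after another,
        // regardless of the key.
        configs.put(ProducerConfig.PARTITIONER_CLASS_CONFIG,
                RoundRobinPartitioner.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(configs)) {
            for (int i = 0; i < 12; i++) {
                // No partition or key needs to be set; the partitioner decides.
                producer.send(new ProducerRecord<>("my-topic", "message-" + i));
            }
        }
    }
}
```
With this in place you do not need to track partition numbers yourself;
the partitioner cycles through whatever partitions the topic currently
has.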
Thanks,
-- Ricardo
On 6/18/20 12:38 AM, Hemant Bairwa wrote:
Hello All
I have a single producer service which is queuing messages into a topic
with, let's say, 12 partitions. I want to evenly distribute the messages
across all the partitions in a round robin fashion.
Even after using default partitioning and keeping the key NULL, the
messages are not getting distributed evenly. Rather, some partitions are
getting none of the messages while some are getting multiple.
One reason I found for this behaviour, somewhere, is that if there are
fewer producers than partitions, the messages are distributed to fewer
partitions in order to limit the number of open sockets.
However, I have achieved even distribution through code by first getting
the total number of partitions and then passing the partition number, in
incremental order, along with the message into the producer record. Once
the partition number reaches the last partition, it is reset back to
zero.
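Roughly, the approach looks like this (a sketch with illustrative names;
the real service differs in details):
```
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ManualRoundRobinProducer {
    private final KafkaProducer<String, String> producer;
    private final String topic;
    private final int partitionCount;
    private final AtomicInteger nextPartition = new AtomicInteger(0);

    public ManualRoundRobinProducer(Properties configs, String topic) {
        this.producer = new KafkaProducer<>(configs);
        this.topic = topic;
        // Partition count is read once at startup; if the topic is
        // repartitioned later, this value becomes stale.
        this.partitionCount = producer.partitionsFor(topic).size();
    }

    public void send(String value) {
        // Cycle the target partition explicitly: 0, 1, ..., partitionCount - 1, 0, ...
        int partition = nextPartition.getAndUpdate(p -> (p + 1) % partitionCount);
        producer.send(new ProducerRecord<>(topic, partition, null, value));
    }
}
```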
Query:
1. Can there be any downside to the above approach?
2. If yes, how can even distribution of messages be achieved in an
optimized way?