Re: Uneven distribution of messages in topic's partitions

Hemant Bairwa Fri, 19 Jun 2020 10:08:15 -0700

Thanks Ricardo.

I need some information on one more use case.
In my application I need to use Kafka to maintain the different workflow
states of message items as they are processed by different processes.
For example, in my application every message transits from Process A to
Process Z, and I need to keep every state an item has passed through. So
for item xyz there should be a total of 26 entries in the Kafka topic:
xyz, A
xyz, B... and so on.


Users should be able to retrieve all the messages for any specific key as
many times as they want; that is, a DB-like feature is required.

1. Is Kafka alone able to cater to this requirement?
2. Or do I need to use ksqlDB to meet this requirement? I did some
research around it, but I don't want to run a separate ksqlDB server.
3. Any other suggestions?
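A toy, in-memory model of the use case above, assuming each state change is written as one record keyed by the item ID (for example key "xyz", value "A"). It is not the Kafka client API: `WorkflowLog` and its methods are hypothetical, and `String.hashCode()` stands in for the murmur2 hash that Kafka's DefaultPartitioner applies to the serialized key. It shows that the same key always maps to the same partition, so all states of one item stay together and in order, and that reading them back amounts to scanning that partition and filtering by key.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of a keyed, partitioned, append-only log.
// Hypothetical class; NOT the Kafka client API.
final class WorkflowLog {
    // One record: the item ID (key) and the workflow state it reached (value).
    private static final class Entry {
        final String key;
        final String state;

        Entry(String key, String state) {
            this.key = key;
            this.state = state;
        }
    }

    // Each partition is an append-only list; records are never removed here.
    private final List<List<Entry>> partitions = new ArrayList<>();

    WorkflowLog(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Same key -> same partition. String.hashCode() is a stand-in for the
    // murmur2 hash Kafka's DefaultPartitioner applies to the serialized key.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Record that item `key` reached workflow state `state`.
    void append(String key, String state) {
        partitions.get(partitionFor(key)).add(new Entry(key, state));
    }

    // Read back every state of one item, in the order it was appended:
    // scan the key's partition from the start and keep matching records.
    List<String> statesFor(String key) {
        List<String> states = new ArrayList<>();
        for (Entry e : partitions.get(partitionFor(key))) {
            if (e.key.equals(key)) {
                states.add(e.state);
            }
        }
        return states;
    }
}
```

In a real topic this also assumes the retention settings (for example `retention.ms`) keep old records around; a compacted topic would eventually keep only the latest state per key. Because every lookup rescans a partition, repeated per-key queries get slower as the topic grows, which is the gap a state store (Kafka Streams or ksqlDB) is meant to fill.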

Regards,



On Thu, 18 Jun 2020, 6:51 pm Ricardo Ferreira, <rifer...@riferrei.com>
wrote:

> Hemant,
>
> This behavior might be the result of the version of AK (Apache Kafka) that
> you are using. Before AK 2.4, the DefaultPartitioner spread records with a
> null key across the partitions in a round-robin fashion, as you described.
> But it was found that this behavior hurts the batching that each producer
> does, because records end up scattered across many small batches. Therefore,
> AK 2.4 introduced a new behavior into the DefaultPartitioner called sticky
> partitioning. You can read more about this change in the KIP that introduced
> it: *KIP-480
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner>*.
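A simplified simulation of why sticky partitioning looks uneven when only a few records are sent. Assumption: the real partitioner roughly switches partitions when a new batch is started, which depends on `batch.size` and `linger.ms`; here a batch is simply a fixed number of records (`recordsPerBatch`, a hypothetical parameter), and the next partition is picked at random among the other partitions. `StickySimulation` is a hypothetical name, not Kafka code.

```java
import java.util.Random;

// Simplified simulation; not Kafka's actual partitioner code.
final class StickySimulation {

    // Round robin: record i goes to partition i % partitions.
    static int[] roundRobin(int records, int partitions) {
        int[] counts = new int[partitions];
        for (int i = 0; i < records; i++) {
            counts[i % partitions]++;
        }
        return counts;
    }

    // Sticky: keep sending to one partition until the current "batch" holds
    // recordsPerBatch records, then switch to a different, randomly chosen
    // partition. Assumes partitions >= 2.
    static int[] sticky(int records, int partitions, int recordsPerBatch, long seed) {
        Random rnd = new Random(seed);
        int[] counts = new int[partitions];
        int current = rnd.nextInt(partitions);
        int inBatch = 0;
        for (int i = 0; i < records; i++) {
            if (inBatch == recordsPerBatch) {
                // Pick uniformly among the other (partitions - 1) partitions.
                int next = rnd.nextInt(partitions - 1);
                current = (next >= current) ? next + 1 : next;
                inBatch = 0;
            }
            counts[current]++;
            inBatch++;
        }
        return counts;
    }
}
```

With 24 records, 12 partitions and 10 records per batch, round robin gives every partition 2 records, while the sticky variant puts all 24 records on at most 3 partitions and leaves the rest empty. Over a long run the sticky counts tend to even out, which is the trade-off KIP-480 makes in exchange for fuller batches.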
>
> The only downside that I see in your workaround is that you are assigning
> records to partitions programmatically. That makes your code fragile: if the
> number of partitions for the topic changes, your code would not know about
> it. Instead, just use the RoundRobinPartitioner
> <https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/producer/RoundRobinPartitioner.html>
> explicitly in your producer:
>
> ```
> // Replace the sticky default with strict round-robin assignment.
> configs.put("partitioner.class",
>     "org.apache.kafka.clients.producer.RoundRobinPartitioner");
> ```
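A stdlib-only sketch of round-robin selection that addresses the fragility concern above: the partition count is read again for every record, so partitions added to the topic later are used without a restart. It is not the source of RoundRobinPartitioner (which, for example, also prefers partitions that currently have a leader); `RoundRobinPicker` is a hypothetical name, and the `IntSupplier` stands in for the cluster metadata a real partitioner receives.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Stdlib-only sketch of round-robin partition selection (hypothetical class).
final class RoundRobinPicker {
    private final AtomicInteger counter = new AtomicInteger(0);
    // Stand-in for cluster metadata: returns the topic's current partition count.
    private final IntSupplier partitionCount;

    RoundRobinPicker(IntSupplier partitionCount) {
        this.partitionCount = partitionCount;
    }

    int next() {
        // Re-read the count for every record, so partitions added later
        // are picked up without restarting the producer.
        int n = partitionCount.getAsInt();
        // Mask the sign bit so the value stays non-negative after the counter overflows.
        int value = counter.getAndIncrement() & 0x7fffffff;
        return value % n;
    }
}
```

The `AtomicInteger` makes `next()` safe to call from several threads sharing one producer.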
>
> Thanks,
>
> -- Ricardo
> On 6/18/20 12:38 AM, Hemant Bairwa wrote:
>
> Hello All
>
> I have a single producer service that is queuing messages into a topic with,
> say, 12 partitions. I want to distribute the messages evenly across all
> the partitions in a round-robin fashion.
> Even after using default partitioning and keeping the key null, the messages
> are not getting distributed evenly. Instead, some partitions are getting none
> of the messages while others are getting several.
> One explanation I found somewhere for this behaviour is that if there are
> fewer producers than partitions, the producer sends the messages to fewer
> partitions to limit the number of open sockets.
> However, I have achieved even distribution in code: I first get the total
> number of partitions, then pass the partition number, in incremental order,
> along with each message in the producer record. Once the partition number
> reaches the last partition, I reset the next partition number to zero.
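The workaround described above, written out as a sketch. The class name is hypothetical; in the Java client the count would typically come from `producer.partitionsFor(topic).size()`, and the returned number would go into the `ProducerRecord(topic, partition, key, value)` constructor.

```java
// Sketch of the manual round-robin workaround (hypothetical class name).
final class ManualPartitionCycler {
    // Captured once at startup, e.g. from producer.partitionsFor(topic).size().
    private final int partitionCount;
    private int nextPartition = 0;

    ManualPartitionCycler(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    // synchronized so several threads sharing one producer still get a strict cycle.
    synchronized int next() {
        int p = nextPartition;
        nextPartition++;
        if (nextPartition == partitionCount) {
            nextPartition = 0; // reached the last partition: start again from zero
        }
        return p;
    }
}
```

Because `partitionCount` is captured once, a topic that grows from 12 to 16 partitions keeps receiving records only on partitions 0 to 11 until the producer is restarted, which is the fragility Ricardo points out.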
>
> Query:
> 1. Can there be any downside to the above approach?
> 2. If yes, how can I achieve an even distribution of messages in an optimized way?
>
>
>
