Thanks Ricardo. I need some information on one more use case. In my application I need to use Kafka to maintain the different workflow states of message items as they pass through different processes. For example, in my application every message transits from Process A to Process Z, and I need to keep every state an item has been processed through. So for item xyz there should be a total of 26 entries in the Kafka topic: xyz, A; xyz, B; ... and so on.
Users should be able to retrieve all the messages for any specific key, as many times as needed. That is, a database-like feature is required.

1. Is Kafka alone able to cater to this requirement?
2. Or do I need to use ksqlDB to meet this requirement? I did some research around it, but I don't want to run a separate ksqlDB server.
3. Any other suggestions?

Regards,

On Thu, 18 Jun 2020, 6:51 pm Ricardo Ferreira, <rifer...@riferrei.com> wrote:

> Hemant,
>
> This behavior might be the result of the version of AK (Apache Kafka) that
> you are using. Before AK 2.4 the default behavior for the
> DefaultPartitioner was to load balance data production across the
> partitions as you described. But it was found that this behavior would
> cause performance problems to the batching strategy that each producer
> does. Therefore, AK 2.4 introduced a new behavior into the
> DefaultPartitioner called sticky partitioning. You can follow up in this
> change reading up the KIP that was created for this change: *KIP-480
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner>*.
>
> The only downside that I see in your workaround is if you are handling
> connections to the partitions programmatically. That would make your code
> fragile because if the # of partitions for the topic changes then your code
> would not know this. Instead, just use the RoundRobinPartitioner
> <https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/producer/RoundRobinPartitioner.html>
> explicitly in your producer:
>
> ```
> configs.put("partitioner.class",
>     "org.apache.kafka.clients.producer.RoundRobinPartitioner");
> ```
>
> Thanks,
>
> -- Ricardo
>
> On 6/18/20 12:38 AM, Hemant Bairwa wrote:
>
> Hello All
>
> I have a single producer service which is queuing message into a topic with
> let say 12 partitions. I want to evenly distribute the messages across all
> the partitions in a round robin fashion.
> Even after using default partitioning and keeping key 'NULL', the messages
> are not getting distributed evenly. Rather some partitions are getting none
> of the messages while some are getting multiple.
> One reason I found for this behaviour, somewhere, is that if there are
> lesser number of producers than the number of partitions, it distributes
> the messages to fewer partitions to limit many open sockets.
> However I have achieved even distribution through code by first getting
> total partition numbers and then passing partition number in the
> incremental order along with the message into the producer record. Once the
> partition number reaches end of the partition number then again resetting
> the next partition number to zero.
>
> Query:
> 1. Is there can be any downside of above approach used?
> 2. If yes, how to achieve even distribution of messages in an optimized way?
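
For reference, the manual rotation described in the quoted message could look roughly like the sketch below. `RoundRobinCounter` and `nextPartition` are made-up names for illustration and are not part of Kafka. The sketch assumes the caller passes in the current partition count on every call (for example from `producer.partitionsFor(topic).size()`), so that a change in the number of partitions is picked up, and that the returned index is then used as the `partition` argument of `ProducerRecord`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not a Kafka class): hands out partition indexes
// 0, 1, ..., partitionCount - 1 and then wraps back to zero.
final class RoundRobinCounter {
    private final AtomicInteger next = new AtomicInteger(0);

    // partitionCount is supplied on every call so that the rotation
    // adapts if the topic's partition count changes between calls.
    int nextPartition(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // getAndUpdate returns the value before the update and stores the
        // successor, wrapped to zero once the last partition is reached.
        int current = next.getAndUpdate(i -> (i + 1) % partitionCount);
        // Guard for the case where the partition count shrank since the
        // previous call and the stored index is now out of range.
        return current % partitionCount;
    }
}
```

With three partitions, successive calls to `nextPartition(3)` yield 0, 1, 2, 0, 1, and so on. Setting `partitioner.class` to `RoundRobinPartitioner`, as suggested in the quoted reply, makes a helper like this unnecessary.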