Hi Ricardo,

Just a follow-up question to add: I believe the DefaultPartitioner uses murmur2 hashing by default. Should the RoundRobinPartitioner class be used instead of the default partitioner to get as even a distribution as possible? Is the sticky partitioner (mentioned above) different from the RoundRobinPartitioner, and does it provide better distribution? Also, I see from the KIP that the sticky partitioner addresses the improvements needed to reduce latency.
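To make the distribution question concrete, here is a minimal, self-contained sketch, not Kafka's actual code: the real DefaultPartitioner hashes the key bytes with murmur2, and `String.hashCode()` below is only a stand-in to illustrate the mechanics of hash-based versus round-robin assignment:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class Partitioning {
    private static final AtomicInteger counter = new AtomicInteger(0);

    // Hash-based assignment: the same key always maps to the same partition,
    // so the distribution depends entirely on how the keys hash.
    static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Round-robin assignment: a shared counter advances on every record,
    // so records spread evenly regardless of their keys.
    static int roundRobinPartition(int numPartitions) {
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 12;
        int[] counts = new int[numPartitions];
        for (int i = 0; i < 120; i++) {
            counts[roundRobinPartition(numPartitions)]++;
        }
        for (int c : counts) {
            System.out.print(c + " "); // exactly 10 per partition
        }
        System.out.println();
    }
}
```

The round-robin counter gives each partition exactly the same share over time, while the hash approach can skew whenever the key distribution does.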
Thanks,

On Fri, Jun 19, 2020 at 11:36 PM Ricardo Ferreira <rifer...@riferrei.com> wrote:

> Hi Hemant,
>
> Being able to look up specific records by key is not possible in Kafka
> alone. As a distributed streaming platform based on the concept of a
> commit log, Kafka organizes data sequentially, where each record has an
> offset that uniquely identifies not what the record is but where it is
> positioned within the log.
>
> In order to implement record lookup by key you would need to use Kafka
> Streams or ksqlDB. I would recommend ksqlDB, since you can easily
> create a stream out of your existing topic and then have that stream
> transformed into a table. Note only that ksqlDB currently requires that
> each table serving pull queries (i.e., queries that serve requests
> given a key) be created using an aggregation construct, so you might
> need to work that out in order to achieve the behavior that you want.
>
> Thanks,
>
> -- Ricardo
>
> On 6/19/20 1:07 PM, Hemant Bairwa wrote:
> > Thanks Ricardo.
> >
> > I need some information on one more use case.
> > In my application I need to use Kafka to maintain the different
> > workflow states of message items while they pass through different
> > processes. For example, in my application all messages transit from
> > Process A to Process Z, and I need to maintain all the states an
> > item has gone through. So for item xyz there should be 26 entries
> > in the Kafka topic in total:
> > xyz, A
> > xyz, B ... and so on.
> >
> > Users should be able to retrieve all the messages for any specific
> > key, as many times as needed. That is, a DB-like feature is required.
> >
> > 1. Is Kafka alone able to cater to this requirement?
> > 2. Or do I need to use ksqlDB to meet this requirement? I did some
> > research on it, but I don't want to run a separate ksqlDB server.
> > 3. Any other suggestions?
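The stream-to-aggregated-table approach Ricardo describes might look roughly like the following ksqlDB sketch. All topic, stream, table, and column names here are made up for illustration, and the syntax is from ksqlDB around the 0.10 era; check your version's docs before using it:

```sql
-- Hypothetical stream over the existing topic of (item, state) records.
CREATE STREAM item_states (item_key VARCHAR KEY, state VARCHAR)
  WITH (KAFKA_TOPIC='workflow-states', VALUE_FORMAT='JSON');

-- ksqlDB requires an aggregation to build a pull-queryable table;
-- COLLECT_LIST gathers every state seen for a key.
CREATE TABLE item_state_history AS
  SELECT item_key, COLLECT_LIST(state) AS states
  FROM item_states
  GROUP BY item_key
  EMIT CHANGES;

-- Pull query: look up all recorded states for item 'xyz'.
SELECT states FROM item_state_history WHERE item_key = 'xyz';
```

The `COLLECT_LIST` aggregation is what satisfies the "table must be created using an aggregation construct" requirement mentioned above while still keeping every per-key state, not just the latest one.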
> >
> > Regards,
> >
> > On Thu, 18 Jun 2020, 6:51 pm Ricardo Ferreira, <rifer...@riferrei.com
> > <mailto:rifer...@riferrei.com>> wrote:
> >
> > Hemant,
> >
> > This behavior might be the result of the version of AK (Apache
> > Kafka) that you are using. Before AK 2.4, the default behavior of
> > the DefaultPartitioner was to load-balance data production across
> > the partitions, as you described. But it was found that this
> > behavior caused performance problems for the batching strategy
> > that each producer uses. Therefore, AK 2.4 introduced a new
> > behavior into the DefaultPartitioner called sticky partitioning.
> > You can follow up on this change by reading the KIP that was
> > created for it: *KIP-480
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner>*.
> >
> > The only downside that I see in your workaround is if you are
> > handling connections to the partitions programmatically. That
> > would make your code fragile, because if the # of partitions for
> > the topic changes, your code would not know about it. Instead,
> > just use the RoundRobinPartitioner
> > <https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/producer/RoundRobinPartitioner.html>
> > explicitly in your producer:
> >
> > ```
> > configs.put("partitioner.class",
> >     "org.apache.kafka.clients.producer.RoundRobinPartitioner");
> > ```
> >
> > Thanks,
> >
> > -- Ricardo
> >
> > On 6/18/20 12:38 AM, Hemant Bairwa wrote:
> >> Hello All
> >>
> >> I have a single producer service which is queuing messages into a
> >> topic with, let's say, 12 partitions. I want to distribute the
> >> messages evenly across all the partitions in a round-robin fashion.
> >> Even after using default partitioning and keeping the key NULL,
> >> the messages are not getting distributed evenly. Rather, some
> >> partitions are getting none of the messages while some are getting
> >> multiple.
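The sticky behavior introduced by KIP-480 can be sketched in simplified form. This is not the real implementation inside the Kafka client; it only illustrates the idea that every record goes to one "stuck" partition until a batch completes, which keeps batches full, and a new partition is then chosen so records still spread out over time:

```java
import java.util.concurrent.ThreadLocalRandom;

public class StickySketch {
    private final int numPartitions;
    private int stuck = -1;

    StickySketch(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Every record in the current batch goes to the same partition,
    // maximizing how full each producer batch gets.
    int partition() {
        if (stuck < 0) {
            stuck = ThreadLocalRandom.current().nextInt(numPartitions);
        }
        return stuck;
    }

    // Invoked when a batch completes: pick a new partition, avoiding the
    // current one so load moves around between batches.
    void onNewBatch() {
        int next;
        do {
            next = ThreadLocalRandom.current().nextInt(numPartitions);
        } while (numPartitions > 1 && next == stuck);
        stuck = next;
    }
}
```

Compared with pure round robin, this trades strict per-record evenness for larger (hence fewer, lower-latency) batches, which is the latency improvement the KIP describes.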
> >> One reason I found for this behaviour, somewhere, is that if there
> >> are fewer producers than partitions, Kafka distributes the messages
> >> to fewer partitions to limit the number of open sockets.
> >> However, I have achieved even distribution in code by first getting
> >> the total number of partitions and then passing the partition
> >> number, in incremental order, along with the message into the
> >> producer record. Once the partition number reaches the last
> >> partition, the next partition number is reset to zero.
> >>
> >> Queries:
> >> 1. Can there be any downside to the above approach?
> >> 2. If yes, how to achieve even distribution of messages in an
> >> optimized way?
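The manual workaround described in the original question can be sketched as follows (producer wiring omitted; the class name and constructor here are illustrative, not from the thread). It shows the counter-with-wrap logic and, in the comments, why it is fragile: `numPartitions` is captured once and goes stale if the topic is repartitioned.

```java
public class ManualRoundRobin {
    // Captured once at startup, e.g. from producer.partitionsFor(topic).size().
    // If partitions are later added to the topic, this value is never
    // refreshed, so the new partitions never receive any records.
    private final int numPartitions;
    private int next = 0;

    ManualRoundRobin(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Partition for the next record: increments through 0..numPartitions-1
    // and wraps back to 0, exactly as described above.
    synchronized int nextPartition() {
        int p = next;
        next = (next + 1) % numPartitions;
        return p;
    }
}
```

This staleness is the downside Ricardo points out; configuring `partitioner.class` to RoundRobinPartitioner avoids it, because the client consults the topic's current partition list on each send.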