Thanks, Akash Jain, for your detailed explanation. My answers to your
queries are below.

Yes, as you said, key1 and key2 are produced more often than the other
keys, and that is what causes the skew.

> In which case you will have to get creative with your topic
> design


Could you please guide me on how to dynamically segregate such
high-frequency keys into separate topics, or point me to an existing
design for this?
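
A minimal sketch of what such dynamic segregation could look like on the
producer side, assuming a sliding window over recent keys and a share
threshold above which a key is promoted to its own topic. The class name
HotKeyRouter, the parameters window and hot_share, and the
"<base>.hot.<key>" topic naming are all hypothetical and not part of any
Kafka API; the returned topic name would simply be passed to whatever
producer send call is already in use.

```python
from collections import Counter, deque


class HotKeyRouter:
    """Pick a topic per message based on how often its key appeared recently.

    Hypothetical helper: it only chooses a topic name, it does not talk to Kafka.
    """

    def __init__(self, base_topic, window=1000, hot_share=0.10):
        self.base_topic = base_topic  # shared topic for ordinary keys
        self.window = window          # number of recent messages to remember
        self.hot_share = hot_share    # share of the window that marks a key as hot
        self.recent = deque()         # keys of the last `window` messages
        self.counts = Counter()       # occurrences of each key in `recent`
        self.hot_keys = set()         # promoted keys (never demoted in this sketch)

    def topic_for(self, key):
        # Record the key and evict the oldest one once the window overflows.
        self.recent.append(key)
        self.counts[key] += 1
        if len(self.recent) > self.window:
            old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

        # Promote only once the window is full, so the first few messages
        # do not make every key look hot.
        if (len(self.recent) == self.window
                and self.counts[key] / self.window >= self.hot_share):
            self.hot_keys.add(key)

        if key in self.hot_keys:
            return f"{self.base_topic}.hot.{key}"
        return self.base_topic
```

Two caveats with this sketch: the dedicated topics must exist (or be
created) before they are written to, and at the moment a key is promoted
its older messages are still in the shared topic, so per-key ordering
across that switch has to be handled by the consumers.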

We need to process 500 messages per second on the consumer side, but the
processing layer handles only ~5 messages per second per consumer, so we
need about 100 consumers running in parallel. We chose 96 partitions
because it is close to 100.
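
The sizing above can be checked with a few lines of arithmetic. This is
only a sketch: it assumes the ~5 messages per second figure is per
consumer instance and that load is spread perfectly evenly, which the
skew discussed in this thread would make worse.

```python
import math

target_rate = 500       # messages per second required on the consumer side
per_consumer_rate = 5   # messages per second, ASSUMED to be per consumer instance
partitions = 96         # partition count chosen in this thread

# Consumers needed in parallel to keep up with the target rate.
consumers_needed = math.ceil(target_rate / per_consumer_rate)

# Within one consumer group a partition is read by at most one consumer,
# so the partition count caps the useful parallelism.
max_throughput = partitions * per_consumer_rate

print(consumers_needed)  # 100
print(max_throughput)    # 480
```

Under these assumptions 96 partitions top out at 480 messages per
second, slightly below the 500 target.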


On Wed, Aug 21, 2024 at 5:37 PM Akash Jain <akashjain0...@gmail.com> wrote:

> Hi Karthick,
>
> The choice has to be yours depending on what you want to achieve. I
> understand you want to achieve even distribution of messages across your
> partitions. This depends on the following factors:
>
>    - The frequency of keys
>    - Hashing logic itself
>
> What you can control is the hashing logic - one of the ways could be
> hardcoding the keys and corresponding partition number in your logic (this
> is assuming that you have a small pool of distinct keys). This will
> definitively ensure that your algorithm is not 'biased' when returning the
> partition number. For example:
>
> key1 : partition 0
> key2 : partition 1
> key3 : partition 2
> key4 : partition 3
> key5 : partition 4
> key6 : partition 0
> .
> .
> .
>
> However, if a few specific keys account for most of your data, skewness
> cannot be entirely avoided. For example: if key1 and key2 are produced
> most of the time, then you will observe partitions 0 and 1 to be loaded
> more than the other partitions.
>
> You need to identify the reason for skewness. Is it the hashing algorithm
> or frequency of keys itself that is causing skewness? If it is the
> frequency of keys, then there is not much that can be done with just one
> topic alone. In which case you will have to get creative with your topic
> design - for example you can have separate topics for certain high
> frequency keys!
>
> Moreover, first you should assess why you have 96 partitions. In my
> experience that is way too high.
>
> Thanks
>
> On Tue, Aug 20, 2024 at 4:36 PM Karthick <ibmkarthickma...@gmail.com>
> wrote:
>
> > Hi Akash Jain,
> > Thanks for the reply. I am looking for help in choosing a hashing logic.
> > Please suggest any.
> >
> > On Sat, Aug 17, 2024 at 10:21 AM Akash Jain <akashjain0...@gmail.com>
> > wrote:
> >
> > > Hi Karthick. You could implement your own custom partitioner.
> > >
> > > On Saturday, August 17, 2024, Karthick <ibmkarthickma...@gmail.com>
> > > wrote:
> > >
> > > > Hi Team,
> > > >
> > > > > I'm using Kafka partitioning to maintain field-based ordering across
> > > > > partitions, but I'm experiencing data skewness among the partitions.
> > > > > I have 96 partitions, and I'm sending data with 500 distinct keys that
> > > > > are used for partitioning. While monitoring the Kafka cluster, I
> > > > > noticed that a few partitions are underutilized while others are
> > > > > overutilized.
> > > >
> > > > > This seems to be a hashing problem. Can anyone suggest a better
> > > > > hashing technique or partitioning strategy to balance the load more
> > > > > effectively?
> > > >
> > > > Thanks in advance for your help.
> > > >
> > >
> >
>
