You can just use Math.abs(key.hashCode()) % numPartitions

Regards,
Lei
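A minimal sketch of such a self-defined partitioner, assuming String keys
(hypothetical; adapt the type to your key). Math.floorMod is used in place
of Math.abs(...) % numPartitions, because Math.abs(Integer.MIN_VALUE) is
itself negative:

    import org.apache.flink.api.common.functions.Partitioner;

    // Spreads keys over downstream channels by plain modulo on hashCode().
    public class ModuloPartitioner implements Partitioner<String> {
        @Override
        public int partition(String key, int numPartitions) {
            // floorMod keeps the result in [0, numPartitions) even for
            // Integer.MIN_VALUE, where Math.abs would stay negative.
            return Math.floorMod(key.hashCode(), numPartitions);
        }
    }

Wired in via partitionCustom, e.g. (getDeviceId is a hypothetical key
accessor):

    stream.partitionCustom(new ModuloPartitioner(), event -> event.getDeviceId());

Note that partitionCustom returns a plain DataStream rather than a
KeyedStream, so the keyed state and per-key ordering guarantees of keyBy
do not apply.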
On Mon, Aug 19, 2024 at 5:41 PM Karthick <ibmkarthickma...@gmail.com> wrote:

> Thanks Lei, will check it out. Please suggest an algorithm that solves
> the problem, if any.
>
> On Mon, Aug 19, 2024 at 2:17 PM Lei Wang <leiwang...@gmail.com> wrote:
>
>> Hi Karthick,
>> Take a look at the distribution of your keys to see if there are some
>> keys that contribute most of the data.
>> If the distribution is relatively uniform, try to use partitionCustom
>> with a self-defined partition function instead of keyBy. The default
>> partition function in Flink implements a consistent hash algorithm and
>> will cause uneven distribution even when the input is completely random
>> (https://lists.apache.org/thread/9t1s0jbvzlkvlymom4sh39yw1qltszlz).
>>
>> Regards,
>> Lei
>>
>> On Sat, Aug 17, 2024 at 7:51 AM Karthick <ibmkarthickma...@gmail.com>
>> wrote:
>>
>>> Hi Team,
>>>
>>> I'm using keyBy to maintain field-based ordering across tasks, but I'm
>>> encountering data skewness among the task slots. I have 96 task slots,
>>> and I'm sending data with 500 distinct keys used in keyBy. While
>>> reviewing the Flink UI, I noticed that a few task slots are
>>> underutilized while others are overutilized.
>>>
>>> This seems to be a hashing problem. Can anyone suggest a better hashing
>>> technique or approach to resolve this issue?
>>>
>>> Thanks in advance for your help.
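To see the skew that Lei's linked thread describes, one can replay Flink's
default key-group assignment offline. A standalone sketch, assuming
flink-runtime is on the classpath and using hypothetical "device-N" String
keys to stand in for the 500 real ones:

    import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

    public class KeySkewCheck {
        public static void main(String[] args) {
            int parallelism = 96;   // task slots from the question above
            // Default max parallelism Flink derives for this parallelism.
            int maxParallelism =
                KeyGroupRangeAssignment.computeDefaultMaxParallelism(parallelism);
            int[] keysPerSubtask = new int[parallelism];
            for (int i = 0; i < 500; i++) {        // 500 distinct keys
                String key = "device-" + i;        // hypothetical key naming
                // Same murmur-based key-group routing that keyBy uses.
                int subtask = KeyGroupRangeAssignment.assignKeyToParallelOperator(
                        key, maxParallelism, parallelism);
                keysPerSubtask[subtask]++;
            }
            for (int s = 0; s < parallelism; s++) {
                System.out.printf("subtask %2d -> %d keys%n", s, keysPerSubtask[s]);
            }
        }
    }

Counting keys per subtask this way typically shows some subtasks holding
several keys while others get none, matching the under/over-utilization
observed in the Flink UI.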