Thanks Lei, will check it out. Please suggest an algorithm that solves this problem, if one exists.
On Mon, Aug 19, 2024 at 2:17 PM Lei Wang <leiwang...@gmail.com> wrote:

> Hi Karthick,
>
> Take a look at the distribution of your keys to see if there are some keys
> that contribute most of the data.
> If the distribution is relatively uniform, try using partitionCustom
> with a self-defined partition function instead of keyBy. The default
> partition function in Flink implements a consistent hash algorithm and will
> cause uneven distribution even when the input is completely random
> (https://lists.apache.org/thread/9t1s0jbvzlkvlymom4sh39yw1qltszlz).
>
> Regards,
> Lei
>
> On Sat, Aug 17, 2024 at 7:51 AM Karthick <ibmkarthickma...@gmail.com>
> wrote:
>
>> Hi Team,
>>
>> I'm using keyBy to maintain field-based ordering across tasks, but I'm
>> encountering data skew among the task slots. I have 96 task slots, and
>> I'm sending data with 500 distinct keys used in keyBy. While reviewing the
>> Flink UI, I noticed that a few task slots are underutilized while others
>> are overutilized.
>>
>> This seems to be a hashing problem. Can anyone suggest a better hashing
>> technique or approach to resolve this issue?
>>
>> Thanks in advance for your help.
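To make Lei's suggestion concrete, here is a minimal, self-contained Java sketch of the skew effect being discussed. It does not use Flink itself: the key names and the plain hashCode-modulo assignment are illustrative stand-ins (Flink's actual keyBy routes records through key groups using a murmur hash over maxParallelism, so the real distribution differs in detail). The sketch compares hash-based assignment of 500 distinct keys across 96 slots against the round-robin assignment that a self-defined partitioner passed to partitionCustom could implement, and it assumes every key carries equal traffic.

```java
public class SkewDemo {
    // Hash-based assignment: a rough stand-in for keyBy-style hash
    // partitioning. Each distinct key lands in a fixed slot.
    static int[] assignByHash(int distinctKeys, int slots) {
        int[] counts = new int[slots];
        for (int i = 0; i < distinctKeys; i++) {
            String key = "device-" + i; // hypothetical key names
            counts[Math.floorMod(key.hashCode(), slots)]++;
        }
        return counts;
    }

    // Round-robin assignment, as a self-defined partitioner could do.
    // Balanced by construction, but per-key ordering is no longer preserved.
    static int[] assignRoundRobin(int records, int slots) {
        int[] counts = new int[slots];
        for (int i = 0; i < records; i++) {
            counts[i % slots]++;
        }
        return counts;
    }

    static int min(int[] a) { int m = a[0]; for (int v : a) m = Math.min(m, v); return m; }
    static int max(int[] a) { int m = a[0]; for (int v : a) m = Math.max(m, v); return m; }

    public static void main(String[] args) {
        int slots = 96, keys = 500;
        int[] hashed = assignByHash(keys, slots);
        int[] rr = assignRoundRobin(keys, slots);
        System.out.println("hash:        min=" + min(hashed) + " max=" + max(hashed) + " keys per slot");
        System.out.println("round-robin: min=" + min(rr) + " max=" + max(rr) + " records per slot");
    }
}
```

Two caveats worth noting. First, with only 500 keys over 96 slots, even a perfect hash leaves roughly 5-6 keys per slot, so any skew in per-key volume shows up directly as slot skew; increasing the number of distinct keys (or reducing parallelism) narrows the gap. Second, round-robin via partitionCustom (whose Partitioner interface is a single int partition(K key, int numPartitions) method) balances load but gives up the per-key ordering that keyBy provides, which is the trade-off Karthick would need to weigh.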