Hi Karthick,
Take a look at the distribution of your keys to see whether a few keys
contribute most of the data.
If the distribution is relatively uniform, try partitionCustom with a
user-defined partition function instead of keyBy. The default key-to-task
mapping in Flink is based on a hash of the key (a consistent-hash style
scheme over key groups), which can produce an uneven distribution across
slots even when the input keys themselves are uniformly distributed (
https://lists.apache.org/thread/9t1s0jbvzlkvlymom4sh39yw1qltszlz ).
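Here is a minimal, self-contained sketch of the idea in plain Java (no Flink
dependency; the class name and the round-robin scheme are illustrative, not
Flink API). Each distinct key is assigned a slot first-come round-robin, so
500 keys spread almost perfectly over 96 slots. In a real job you would put
equivalent logic in an org.apache.flink.api.common.functions.Partitioner and
pass it to stream.partitionCustom(partitioner, keySelector). Note the
trade-off: partitionCustom returns a plain DataStream, so the keyed-state and
per-key ordering guarantees of keyBy no longer apply.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: an explicit key-to-partition assignment that
// spreads distinct keys evenly over a fixed number of downstream channels,
// in the spirit of what a custom Flink Partitioner would do.
public class EvenKeyPartitioner {
    private final int numPartitions;
    private final Map<String, Integer> assigned = new HashMap<>();
    private int next = 0;

    public EvenKeyPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // First-come round-robin: each previously unseen key takes the next
    // slot, so no slot ends up with more than
    // ceil(distinctKeys / numPartitions) keys.
    // Caveat: a per-instance counter is NOT deterministic across parallel
    // upstream subtasks; a production partitioner should instead use a
    // deterministic key-to-slot mapping (e.g. a precomputed table).
    public int partition(String key) {
        return assigned.computeIfAbsent(key, k -> next++ % numPartitions);
    }

    public static void main(String[] args) {
        EvenKeyPartitioner p = new EvenKeyPartitioner(96);
        int[] counts = new int[96];
        for (int i = 0; i < 500; i++) {
            counts[p.partition("key-" + i)]++;
        }
        int min = Integer.MAX_VALUE, max = 0;
        for (int c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        // 500 keys over 96 slots: every slot holds 5 or 6 keys.
        System.out.println("keys per slot: min=" + min + " max=" + max);
    }
}
```

Compare this with the Flink UI numbers you are seeing now: with hash-based
keyBy, some of the 96 slots can receive several times more keys than others.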

Regards,
Lei

On Sat, Aug 17, 2024 at 7:51 AM Karthick <ibmkarthickma...@gmail.com> wrote:

> Hi Team,
>
> I'm using keyBy to maintain field-based ordering across tasks, but I'm
> encountering data skewness among the task slots. I have 96 task slots, and
> I'm sending data with 500 distinct keys used in keyBy. While reviewing the
> Flink UI, I noticed that a few task slots are underutilized while others
> are overutilized.
>
> This seems to be a hashing problem. Can anyone suggest a better hashing
> technique or approach to resolve this issue?
>
> Thanks in advance for your help.
>
