Hi Karthick,

Take a look at the distribution of your keys to see whether a few keys contribute most of the data. If the distribution is relatively uniform, try using partitionCustom with a self-defined partitioner instead of keyBy. The default partitioning in Flink implements a consistent-hash-style algorithm and can produce an uneven distribution even when the input keys are completely random ( https://lists.apache.org/thread/9t1s0jbvzlkvlymom4sh39yw1qltszlz ).
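To see why this happens, here is a minimal, hypothetical Python sketch (not Flink's actual key-group code; the key names and the md5-based hash are stand-ins) contrasting hash-based assignment of 500 keys onto 96 channels with a self-defined partitioner that spreads a known key set round-robin:

```python
import hashlib
from collections import Counter

NUM_CHANNELS = 96                               # matches the 96 task slots
keys = [f"device-{i}" for i in range(500)]      # hypothetical key names

# 1) Hash-based assignment, illustrative of what keyBy-style hashing does.
#    md5 here is only a stand-in for Flink's murmur-hash key-group mapping.
def hash_partition(key: str) -> int:
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_CHANNELS

hash_counts = Counter(hash_partition(k) for k in keys)

# 2) Self-defined partitioner: with a fixed, known key set, assign keys
#    round-robin so every channel gets either 5 or 6 of the 500 keys.
key_to_channel = {k: i % NUM_CHANNELS for i, k in enumerate(sorted(keys))}

def custom_partition(key: str) -> int:
    return key_to_channel[key]

custom_counts = Counter(custom_partition(k) for k in keys)

# By pigeonhole the hash-based max is at least 6, and in practice it is
# noticeably higher, while the custom spread stays within 1 key per channel.
print("hash:   keys per channel, min/max:",
      min(hash_counts.values()), max(hash_counts.values()))
print("custom: keys per channel, min/max:",
      min(custom_counts.values()), max(custom_counts.values()))
```

In real Flink code the second mapping would live inside a Partitioner passed to DataStream.partitionCustom together with a key selector; the dictionary lookup above is just the partitioner logic in isolation. Note that partitionCustom changes how records are routed, so check whether your job still needs keyed state before swapping it in for keyBy.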
Regards,
Lei

On Sat, Aug 17, 2024 at 7:51 AM Karthick <ibmkarthickma...@gmail.com> wrote:
> Hi Team,
>
> I'm using keyBy to maintain field-based ordering across tasks, but I'm
> encountering data skewness among the task slots. I have 96 task slots, and
> I'm sending data with 500 distinct keys used in keyBy. While reviewing the
> Flink UI, I noticed that a few task slots are underutilized while others
> are overutilized.
>
> This seems to be a hashing problem. Can anyone suggest a better hashing
> technique or approach to resolve this issue?
>
> Thanks in advance for your help.