Thanks Lei, will check it out. Please suggest an algorithm that solves this problem, if one exists.
On Mon, Aug 19, 2024 at 2:17 PM Lei Wang <leiwang...@gmail.com> wrote:

> Hi Karthick,
>
> Take a look at the distribution of your keys to see if there are some keys
> that contribute most of the data.
> If the distribution is relatively uniform, try using partitionCustom
> with a self-defined partition function instead of keyBy. The default
> partition function in Flink implements a consistent hash algorithm and will
> cause uneven distribution even when the input is completely random
> (https://lists.apache.org/thread/9t1s0jbvzlkvlymom4sh39yw1qltszlz).
>
> Regards,
> Lei
>
> On Sat, Aug 17, 2024 at 7:51 AM Karthick <ibmkarthickma...@gmail.com>
> wrote:
>
>> Hi Team,
>>
>> I'm using keyBy to maintain field-based ordering across tasks, but I'm
>> encountering data skew among the task slots. I have 96 task slots, and
>> I'm sending data with 500 distinct keys used in keyBy. While reviewing the
>> Flink UI, I noticed that a few task slots are underutilized while others
>> are overutilized.
>>
>> This seems to be a hashing problem. Can anyone suggest a better hashing
>> technique or approach to resolve this issue?
>>
>> Thanks in advance for your help.
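To make Lei's suggestion concrete, here is a minimal, self-contained Java sketch of the skew effect being discussed. It does not use Flink itself: the key names and the plain hashCode-modulo assignment are illustrative stand-ins (Flink's actual keyBy routes records through key groups using a murmur hash over maxParallelism, so the real distribution differs in detail). The sketch compares hash-based assignment of 500 distinct keys across 96 slots against the round-robin assignment that a self-defined partitioner passed to partitionCustom could implement, and it assumes every key carries equal traffic.

```java
public class SkewDemo {
    // Hash-based assignment: a rough stand-in for keyBy-style hash
    // partitioning. Each distinct key lands in a fixed slot.
    static int[] assignByHash(int distinctKeys, int slots) {
        int[] counts = new int[slots];
        for (int i = 0; i < distinctKeys; i++) {
            String key = "device-" + i; // hypothetical key names
            counts[Math.floorMod(key.hashCode(), slots)]++;
        }
        return counts;
    }

    // Round-robin assignment, as a self-defined partitioner could do.
    // Balanced by construction, but per-key ordering is no longer preserved.
    static int[] assignRoundRobin(int records, int slots) {
        int[] counts = new int[slots];
        for (int i = 0; i < records; i++) {
            counts[i % slots]++;
        }
        return counts;
    }

    static int min(int[] a) { int m = a[0]; for (int v : a) m = Math.min(m, v); return m; }
    static int max(int[] a) { int m = a[0]; for (int v : a) m = Math.max(m, v); return m; }

    public static void main(String[] args) {
        int slots = 96, keys = 500;
        int[] hashed = assignByHash(keys, slots);
        int[] rr = assignRoundRobin(keys, slots);
        System.out.println("hash:        min=" + min(hashed) + " max=" + max(hashed) + " keys per slot");
        System.out.println("round-robin: min=" + min(rr) + " max=" + max(rr) + " records per slot");
    }
}
```

Two caveats worth noting. First, with only 500 keys over 96 slots, even a perfect hash leaves roughly 5-6 keys per slot, so any skew in per-key volume shows up directly as slot skew; increasing the number of distinct keys (or reducing parallelism) narrows the gap. Second, round-robin via partitionCustom (whose Partitioner interface is a single int partition(K key, int numPartitions) method) balances load but gives up the per-key ordering that keyBy provides, which is the trade-off Karthick would need to weigh.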