You can just use Math.abs(key.hashCode()) % numPartitions

Regards,
Lei
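A minimal sketch of such a self-defined partitioner, assuming String keys
(hypothetical; adapt the type to your key). Math.floorMod is used in place
of Math.abs(...) % numPartitions, because Math.abs(Integer.MIN_VALUE) is
itself negative:

    import org.apache.flink.api.common.functions.Partitioner;

    // Spreads keys over downstream channels by plain modulo on hashCode().
    public class ModuloPartitioner implements Partitioner<String> {
        @Override
        public int partition(String key, int numPartitions) {
            // floorMod keeps the result in [0, numPartitions) even for
            // Integer.MIN_VALUE, where Math.abs would stay negative.
            return Math.floorMod(key.hashCode(), numPartitions);
        }
    }

Wired in via partitionCustom, e.g. (getDeviceId is a hypothetical key
accessor):

    stream.partitionCustom(new ModuloPartitioner(), event -> event.getDeviceId());

Note that partitionCustom returns a plain DataStream rather than a
KeyedStream, so the keyed state and per-key ordering guarantees of keyBy
do not apply.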
On Mon, Aug 19, 2024 at 5:41 PM Karthick <ibmkarthickma...@gmail.com> wrote:

> Thanks Lei, will check it out. Please suggest an algorithm that solves
> the problem, if any.
>
> On Mon, Aug 19, 2024 at 2:17 PM Lei Wang <leiwang...@gmail.com> wrote:
>
>> Hi Karthick,
>> Take a look at the distribution of your keys to see if there are some
>> keys that contribute most of the data.
>> If the distribution is relatively uniform, try to use partitionCustom
>> with a self-defined partition function instead of keyBy. The default
>> partition function in Flink implements a consistent hash algorithm and
>> will cause uneven distribution even when the input is completely random
>> (https://lists.apache.org/thread/9t1s0jbvzlkvlymom4sh39yw1qltszlz).
>>
>> Regards,
>> Lei
>>
>> On Sat, Aug 17, 2024 at 7:51 AM Karthick <ibmkarthickma...@gmail.com>
>> wrote:
>>
>>> Hi Team,
>>>
>>> I'm using keyBy to maintain field-based ordering across tasks, but I'm
>>> encountering data skewness among the task slots. I have 96 task slots,
>>> and I'm sending data with 500 distinct keys used in keyBy. While
>>> reviewing the Flink UI, I noticed that a few task slots are
>>> underutilized while others are overutilized.
>>>
>>> This seems to be a hashing problem. Can anyone suggest a better hashing
>>> technique or approach to resolve this issue?
>>>
>>> Thanks in advance for your help.
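To see the skew that Lei's linked thread describes, one can replay Flink's
default key-group assignment offline. A standalone sketch, assuming
flink-runtime is on the classpath and using hypothetical "device-N" String
keys to stand in for the 500 real ones:

    import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

    public class KeySkewCheck {
        public static void main(String[] args) {
            int parallelism = 96;   // task slots from the question above
            // Default max parallelism Flink derives for this parallelism.
            int maxParallelism =
                KeyGroupRangeAssignment.computeDefaultMaxParallelism(parallelism);
            int[] keysPerSubtask = new int[parallelism];
            for (int i = 0; i < 500; i++) {        // 500 distinct keys
                String key = "device-" + i;        // hypothetical key naming
                // Same murmur-based key-group routing that keyBy uses.
                int subtask = KeyGroupRangeAssignment.assignKeyToParallelOperator(
                        key, maxParallelism, parallelism);
                keysPerSubtask[subtask]++;
            }
            for (int s = 0; s < parallelism; s++) {
                System.out.printf("subtask %2d -> %d keys%n", s, keysPerSubtask[s]);
            }
        }
    }

Counting keys per subtask this way typically shows some subtasks holding
several keys while others get none, matching the under/over-utilization
observed in the Flink UI.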