Re: Handling data skewness

2024-08-19 Thread Lei Wang
You can just use Math.abs(key.hashCode()) % numPartitions. Regards, Lei On Mon, Aug 19, 2024 at 5:41 PM Karthick wrote: > Thanks Lei, will check it out. Please suggest an algorithm that > solves the problem, if any. > > On Mon, Aug 19, 2024 at 2:17 PM Lei Wang wrote: > >> Hi Karthick, >> Tak
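
For readers of the archive, the modulo suggestion above could be sketched roughly as follows. This is only an illustration, not code from the thread: the class and method names are hypothetical, and it uses Math.floorMod rather than Math.abs because Math.abs(Integer.MIN_VALUE) is still negative and would produce a negative index. In a Flink job the same logic would presumably live inside an org.apache.flink.api.common.functions.Partitioner passed to DataStream.partitionCustom; the sketch keeps to plain Java so it runs without Flink on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the name is not from the thread.
final class ModuloPartitionSketch {

    // Maps a key to a partition index in [0, numPartitions).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partition(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Counts how many of the given keys land on each partition.
    static int[] load(Iterable<?> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (Object key : keys) {
            counts[partition(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumption for illustration: 500 integer keys 0..499 and 96 slots,
        // matching the numbers in the original question.
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            keys.add(i);
        }
        int[] counts = load(keys, 96);
        int min = Arrays.stream(counts).min().getAsInt();
        int max = Arrays.stream(counts).max().getAsInt();
        System.out.println("min=" + min + " max=" + max); // prints "min=5 max=6"
    }
}
```

Records with the same key still go to the same partition, so per-key ordering is kept. Two caveats: partitionCustom returns a plain DataStream rather than a KeyedStream, so keyed state and timers are not available downstream, and the even spread shown here depends on the hash codes themselves being evenly spread, which real keys may not be.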

Re: Handling data skewness

2024-08-19 Thread Karthick
Thanks Lei, will check it out. Please suggest an algorithm that solves the problem, if any. On Mon, Aug 19, 2024 at 2:17 PM Lei Wang wrote: > Hi Karthick, > Take a look at the distribution of your keys to see whether a few keys > contribute most of the data. > If the distribution is rel

Re: Handling data skewness

2024-08-19 Thread Lei Wang
Hi Karthick, Take a look at the distribution of your keys to see whether a few keys contribute most of the data. If the distribution is relatively uniform, try using partitionCustom with a self-defined partition function instead of keyBy. The default partition function in Flink implements a
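
The first step suggested above, checking whether a few keys carry most of the data, could be done offline on a sample of records. The following is a minimal sketch under that assumption; the class name, method name, and sample data are hypothetical and not from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the name is not from the thread.
final class KeyDistributionSketch {

    // Returns the fraction of records (0.0 to 1.0) that belong to the
    // topN most frequent keys in the sample.
    static double topShare(List<String> sampledKeys, int topN) {
        if (sampledKeys.isEmpty()) {
            return 0.0;
        }
        // Count records per key.
        Map<String, Long> counts = new HashMap<>();
        for (String key : sampledKeys) {
            counts.merge(key, 1L, Long::sum);
        }
        // Sort the per-key counts from largest to smallest.
        List<Long> sorted = new ArrayList<>(counts.values());
        sorted.sort(Comparator.reverseOrder());
        long top = 0;
        for (int i = 0; i < Math.min(topN, sorted.size()); i++) {
            top += sorted.get(i);
        }
        return (double) top / sampledKeys.size();
    }

    public static void main(String[] args) {
        // Made-up sample: key "a" appears in 6 of 8 records.
        List<String> sample = Arrays.asList("a", "a", "a", "a", "a", "a", "b", "c");
        System.out.println(topShare(sample, 1)); // prints 0.75
    }
}
```

If a handful of keys account for a large share, no partition function that sends each key to a single subtask can balance the load, which is why the advice to switch partitioners is conditioned on the distribution being relatively uniform.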

Handling data skewness

2024-08-16 Thread Karthick
Hi Team, I'm using keyBy to maintain field-based ordering across tasks, but I'm encountering data skewness among the task slots. I have 96 task slots, and I'm sending data with 500 distinct keys used in keyBy. While reviewing the Flink UI, I noticed that a few task slots are underutilized while ot