Hi Vadim,

Could you please check whether there are records with identical keys, or keys with identical hash codes? The keyBy redistribution relies on an even distribution of the keys' hash codes. If many records share the same key or the same hash code, they all land in the same key group, so there will probably be data skew.
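To make the check concrete, here is a minimal sketch of how a key's hash code ends up on a subtask, as far as I understand the logic in Flink's `KeyGroupRangeAssignment`: the hash code is scrambled, taken modulo the max parallelism to get a key group, and the key group is then mapped to a subtask index. The function names below are hypothetical, and `murmur_style_mix` is only a murmur-style stand-in that may not match Flink's own hash bit for bit, so use it to reason about the shape of the distribution rather than to predict exact subtask indices.

```python
from collections import Counter

MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur_style_mix(hash_code):
    """Scramble a 32-bit hash code (stand-in, not guaranteed identical to Flink)."""
    h = hash_code & MASK32
    h = (h * 0xCC9E2D51) & MASK32
    h = _rotl32(h, 15)
    h = (h * 0x1B873593) & MASK32
    h = _rotl32(h, 13)
    h = (h * 5 + 0xE6546B64) & MASK32
    h ^= 4
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    # Simplification (assumption): clear the sign bit to get a non-negative value.
    return h & 0x7FFFFFFF


def key_group_for(hash_code, max_parallelism):
    """Key group index in [0, max_parallelism)."""
    return murmur_style_mix(hash_code) % max_parallelism


def subtask_for(key_group, max_parallelism, parallelism):
    """Subtask index in [0, parallelism) that owns the given key group."""
    return key_group * parallelism // max_parallelism


def records_per_subtask(hash_codes, max_parallelism, parallelism):
    """Count how many records (given by their keys' hash codes) reach each subtask."""
    counts = Counter()
    for hash_code in hash_codes:
        key_group = key_group_for(hash_code, max_parallelism)
        counts[subtask_for(key_group, max_parallelism, parallelism)] += 1
    return [counts.get(i, 0) for i in range(parallelism)]


if __name__ == "__main__":
    # Many distinct hash codes: inspect the spread across the 50 subtasks.
    spread = records_per_subtask(range(10_000), 1500, 50)
    print("distinct keys: min", min(spread), "max", max(spread))
    # One hot key: every record goes to a single subtask.
    hot = records_per_subtask([7] * 1_000, 1500, 50)
    print("single key:    min", min(hot), "max", max(hot))
```

Feeding the hash codes of your real keys into `records_per_subtask` should show whether the skew comes from a few hot keys or from a small number of distinct keys. Note that the 30 key groups per subtask only guarantee that key groups are split evenly, not that records are.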
Best,
Zakelly

On Tue, Mar 11, 2025 at 8:11 PM Vararu, Vadim via user <user@flink.apache.org> wrote:

> Hello,
>
> I’ve got two tasks:
>
> - one reading from the source (parallelism 1)
> - second, a keyed function (parallelism 50)
>
> Having the max parallelism set to 1500 and the parallelism of 50, I expect
> the second task to have incoming data equally spread when distributing the
> keys to the key groups (1500 key groups / 50 parallelism = 30 keys/group).
>
> However, looking in the UI stats I see a big data skew (variates between
> 75 and 250 records received per TM).
>
> What could cause skew after keyBy, even if the maxParallelism /
> parallelism gives an even number?
>
> Thanks,
> Vadim.