Data skew after keyBy (even with a good number of key groups)

Vararu, Vadim via user Tue, 11 Mar 2025 05:11:08 -0700

Hello,

I’ve got two tasks:


  *   the first reads from the source (parallelism 1)
  *   the second is a keyed function (parallelism 50)

With the max parallelism set to 1500 and a parallelism of 50, I expect the 
incoming data of the second task to be spread equally when the keys are 
distributed to the key groups (1500 key groups / 50 subtasks = 30 key groups 
per subtask).
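
To make that arithmetic concrete, here is a minimal sketch of the two-step mapping (key -> key group -> subtask). It assumes the range-based assignment keyGroup * parallelism / maxParallelism, which is roughly what Flink's KeyGroupRangeAssignment does. The MD5 hash is only a stand-in, not Flink's actual hash, and the key set ("key-0" ... "key-199") is hypothetical, so the counts it prints are illustrative only.

```python
import hashlib
from collections import Counter

MAX_PARALLELISM = 1500  # number of key groups, from the post
PARALLELISM = 50        # subtasks of the keyed function, from the post


def key_group_for(key: str, max_parallelism: int = MAX_PARALLELISM) -> int:
    """Map a key to a key group. MD5 is a stand-in hash (assumption), not Flink's."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % max_parallelism


def subtask_for(key_group: int,
                parallelism: int = PARALLELISM,
                max_parallelism: int = MAX_PARALLELISM) -> int:
    """Map a key group to a subtask index using contiguous key-group ranges."""
    return key_group * parallelism // max_parallelism


# Every subtask owns the same number of key groups ...
groups_per_subtask = Counter(subtask_for(g) for g in range(MAX_PARALLELISM))

# ... but how many keys land on each subtask depends on where the keys hash.
keys = [f"key-{i}" for i in range(200)]  # hypothetical key set
keys_per_subtask = Counter(subtask_for(key_group_for(k)) for k in keys)

fewest = min(keys_per_subtask[s] for s in range(PARALLELISM))
most = max(keys_per_subtask[s] for s in range(PARALLELISM))

print("key groups per subtask:", set(groups_per_subtask.values()))
print("keys per subtask (min, max):", fewest, most)
```

The first line of output confirms the even split of key groups. The second shows how many distinct keys each subtask actually receives for this particular key set, which is a separate quantity from the number of key groups a subtask owns.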

However, looking at the UI stats, I see significant data skew (it varies between 
75 and 250 records received per TM).

What could cause skew after keyBy, even when maxParallelism is evenly divisible 
by the parallelism?

Thanks,
Vadim.
