Hi David, keyBy() is implemented with hash partitioning. If you use the keyBy function, the records for a given key will be shuffled to a downstream operator subtask. See more in [1].
[1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/overview/#keyby Regards, Xiangyu David Corley <davidcor...@gmail.com> 于2023年8月3日周四 23:23写道: > I have a job using the keyBy function. The job parallelism is 40. My key > is based on a field in the records that has 2000+ possible values > My question is for the records for a given key, will they all be sent to > the one subtask or be distributed evenly amongst the all 40 downstream > operator sub tasks? > Put another way , are the partitions created by keyBy all assigned to a > single downstream subtask? >