Hi David,

Yes, all records with the same key will be shuffled to the same downstream subtask; if they were not, keyed aggregations would compute wrong results. Different keys are still spread across all 40 subtasks, so keyBy does not funnel the whole stream into a single subtask.
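If you want to convince yourself, here is a rough sketch of the routing computation using Flink's KeyGroupRangeAssignment utility (an internal class in flink-runtime, so treat it as illustrative rather than stable API). The key values are made up; the parallelism of 40 is from your question, and I assume the default max parallelism of 128 here:

import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeyRoutingSketch {
    public static void main(String[] args) {
        int parallelism = 40;      // job parallelism from the question
        int maxParallelism = 128;  // assumed default max parallelism for this job

        // Roughly: keyGroup = murmurHash(key.hashCode()) % maxParallelism,
        //          subtask  = keyGroup * parallelism / maxParallelism.
        // Every record with key "key-a" therefore lands on this one subtask index.
        int subtaskForA = KeyGroupRangeAssignment.assignKeyToParallelOperator(
                "key-a", maxParallelism, parallelism);
        // A different key value may (and with 2000+ keys usually will) map to another subtask.
        int subtaskForB = KeyGroupRangeAssignment.assignKeyToParallelOperator(
                "key-b", maxParallelism, parallelism);

        System.out.println("key-a -> subtask " + subtaskForA);
        System.out.println("key-b -> subtask " + subtaskForB);
    }
}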
Best,
Ron

On Fri, Aug 4, 2023 at 09:45, xiangyu feng <xiangyu...@gmail.com> wrote:

> Hi David,
>
> keyBy() is implemented with hash partitioning. If you use the keyBy
> function, the records for a given key will be shuffled to a downstream
> operator subtask. See more in [1].
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/overview/#keyby
>
> Regards,
> Xiangyu
>
> On Thu, Aug 3, 2023 at 23:23, David Corley <davidcor...@gmail.com> wrote:
>
>> I have a job using the keyBy function. The job parallelism is 40. My key
>> is based on a field in the records that has 2000+ possible values.
>> My question is: for the records for a given key, will they all be sent to
>> one subtask, or be distributed evenly amongst all 40 downstream operator
>> subtasks?
>> Put another way, are the partitions created by keyBy all assigned to a
>> single downstream subtask?
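P.S. In case it helps anyone reading this thread in the archives, below is a minimal, self-contained sketch of the pattern David described. The sample records and field layout are invented for illustration; only the parallelism of 40 is taken from the question.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyBySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(40); // same parallelism as in the question

        env.fromElements(
                Tuple2.of("key-a", 1),
                Tuple2.of("key-b", 1),
                Tuple2.of("key-a", 1))
            // keyBy hash-partitions the stream: both "key-a" records are routed
            // to the same downstream subtask; "key-b" may land on a different one.
            .keyBy(value -> value.f0)
            .sum(1)
            .print();

        env.execute("keyBy routing sketch");
    }
}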