Hi David,

keyBy() is implemented with hash partitioning. If you use the keyBy
function, the records for a given key will be shuffled to a downstream
operator subtask. See more in [1].

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/overview/#keyby

Regards,
Xiangyu

David Corley <davidcor...@gmail.com> 于2023年8月3日周四 23:23写道:

> I have a job using the keyBy function. The job parallelism is 40. My key
> is based on a field in the records that has 2000+ possible values
> My question is for the records for a given key, will they all be sent to
> the one subtask or be distributed evenly amongst the all 40 downstream
> operator sub tasks?
> Put another way , are the partitions created by keyBy all assigned to a
> single downstream subtask?
>

Reply via email to