Will all records grouped using keyBy be allocated to a single subtask?

David Corley Thu, 03 Aug 2023 08:23:09 -0700

I have a job using the keyBy function. The job parallelism is 40. My key is
based on a field in the records that has 2000+ possible values.
My question is: for the records with a given key, will they all be sent to
one subtask, or be distributed evenly among all 40 downstream
operator subtasks?
Put another way, is each partition created by keyBy assigned to a
single downstream subtask?
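
For reference, here is a minimal sketch of the routing behavior the question is about. It assumes the commonly described model: the key is hashed to a key group, and each key group belongs to exactly one subtask, so every record with a given key lands on the same subtask while different keys spread across the 40 subtasks. The class and method names are hypothetical, the hash is a plain hashCode stand-in rather than Flink's actual hash function, and the maxParallelism value of 128 is an assumption for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class KeyBySketch {

    // Hypothetical stand-in for key-group hashing; Flink's real hash function differs.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Maps a key group to a subtask index; each subtask owns a contiguous range of key groups.
    static int subtaskFor(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Full routing for one key: key -> key group -> subtask index.
    static int subtaskForKey(Object key, int parallelism, int maxParallelism) {
        return subtaskFor(keyGroupFor(key, maxParallelism), parallelism, maxParallelism);
    }

    public static void main(String[] args) {
        int parallelism = 40;      // job parallelism from the question
        int maxParallelism = 128;  // assumed value, for illustration only
        Set<Integer> used = new HashSet<>();
        for (int k = 0; k < 2000; k++) {
            String key = "key-" + k; // hypothetical key values
            int first = subtaskForKey(key, parallelism, maxParallelism);
            int second = subtaskForKey(key, parallelism, maxParallelism);
            // The same key must always route to the same subtask.
            if (first != second) {
                throw new IllegalStateException("non-deterministic routing");
            }
            used.add(first);
        }
        System.out.println("distinct subtasks used by 2000 keys: " + used.size());
    }
}
```

In this model a single key never fans out over several subtasks. Only the set of keys as a whole is spread out, and how evenly depends on the hash and on how many records each key carries.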

