I have a job that uses the keyBy function. The job parallelism is 40. My key is
based on a field in the records that has 2000+ possible values.
My question: for the records with a given key, will they all be sent to
one subtask, or will they be distributed evenly among all 40 downstream
operator subtasks?
Put another way, is each partition created by keyBy assigned to a
single downstream subtask?
