I have a job that uses the keyBy function. The job parallelism is 40. My key is based on a field in the records that has 2000+ possible values. My question: will all the records for a given key be sent to one subtask, or will they be distributed evenly among all 40 downstream operator subtasks? Put another way, is each partition created by keyBy assigned to a single downstream subtask?
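To make the question concrete, here is a rough model of keyed routing as it is usually described for Flink: a key is hashed into a "key group", and each key group is owned by exactly one subtask. This is a sketch under assumptions, not Flink's actual code. Python's built-in `hash()` stands in for Flink's murmur hash of the key's `hashCode()`, the max parallelism of 128 is assumed to be the default for a parallelism of 40, and the names `key_group_for` and `subtask_for` are hypothetical.

```python
from collections import Counter

PARALLELISM = 40        # from the question
MAX_PARALLELISM = 128   # assumption: default max parallelism for this job
NUM_KEYS = 2000         # from the question ("2000+ possible values")


def key_group_for(key, max_parallelism=MAX_PARALLELISM):
    """Map a key to a key group.

    Illustrative stand-in only: Flink applies a murmur hash to the key's
    hashCode(), so the real key-group numbers will differ from these.
    """
    return hash(key) % max_parallelism


def subtask_for(key, parallelism=PARALLELISM, max_parallelism=MAX_PARALLELISM):
    """Map a key to the index of the subtask that owns its key group."""
    return key_group_for(key, max_parallelism) * parallelism // max_parallelism


keys = list(range(NUM_KEYS))

# Route every key twice to check that the mapping is deterministic.
first_pass = {k: subtask_for(k) for k in keys}
second_pass = {k: subtask_for(k) for k in keys}

# Count how many distinct keys each subtask ends up owning.
keys_per_subtask = Counter(first_pass.values())

print("same key, same subtask:", first_pass == second_pass)
print("subtasks that own at least one key:", len(keys_per_subtask))
print("min / max keys per subtask:",
      min(keys_per_subtask.values()), max(keys_per_subtask.values()))
```

In this model every record for a given key lands on the same subtask, while the 2000 distinct keys are spread across the 40 subtasks. The spread depends on the hash and is not guaranteed to be perfectly even.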
- Will all records grouped using keyBy be allocated to a single... David Corley