Hi David,

Yes, all records with the same key will be shuffled to the same downstream subtask; if they were not, keyed aggregations would compute wrong results. Different keys are still spread across all 40 subtasks, so keyBy does not funnel the whole stream into a single subtask.
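If you want to convince yourself, here is a rough sketch of the routing computation using Flink's KeyGroupRangeAssignment utility (an internal class in flink-runtime, so treat it as illustrative rather than stable API). The key values are made up; the parallelism of 40 is from your question, and I assume the default max parallelism of 128 here:

import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeyRoutingSketch {
    public static void main(String[] args) {
        int parallelism = 40;      // job parallelism from the question
        int maxParallelism = 128;  // assumed default max parallelism for this job

        // Roughly: keyGroup = murmurHash(key.hashCode()) % maxParallelism,
        //          subtask  = keyGroup * parallelism / maxParallelism.
        // Every record with key "key-a" therefore lands on this one subtask index.
        int subtaskForA = KeyGroupRangeAssignment.assignKeyToParallelOperator(
                "key-a", maxParallelism, parallelism);
        // A different key value may (and with 2000+ keys usually will) map to another subtask.
        int subtaskForB = KeyGroupRangeAssignment.assignKeyToParallelOperator(
                "key-b", maxParallelism, parallelism);

        System.out.println("key-a -> subtask " + subtaskForA);
        System.out.println("key-b -> subtask " + subtaskForB);
    }
}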
Best,
Ron

On Fri, Aug 4, 2023 at 09:45, xiangyu feng <xiangyu...@gmail.com> wrote:

> Hi David,
>
> keyBy() is implemented with hash partitioning. If you use the keyBy
> function, the records for a given key will be shuffled to a downstream
> operator subtask. See more in [1].
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/overview/#keyby
>
> Regards,
> Xiangyu
>
> On Thu, Aug 3, 2023 at 23:23, David Corley <davidcor...@gmail.com> wrote:
>
>> I have a job using the keyBy function. The job parallelism is 40. My key
>> is based on a field in the records that has 2000+ possible values.
>> My question is: for the records for a given key, will they all be sent to
>> one subtask, or be distributed evenly amongst all 40 downstream operator
>> subtasks?
>> Put another way, are the partitions created by keyBy all assigned to a
>> single downstream subtask?
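P.S. In case it helps anyone reading this thread in the archives, below is a minimal, self-contained sketch of the pattern David described. The sample records and field layout are invented for illustration; only the parallelism of 40 is taken from the question.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyBySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(40); // same parallelism as in the question

        env.fromElements(
                Tuple2.of("key-a", 1),
                Tuple2.of("key-b", 1),
                Tuple2.of("key-a", 1))
            // keyBy hash-partitions the stream: both "key-a" records are routed
            // to the same downstream subtask; "key-b" may land on a different one.
            .keyBy(value -> value.f0)
            .sum(1)
            .print();

        env.execute("keyBy routing sketch");
    }
}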