Hi haocheng,

In short, it works as follows:

- Each parallel instance of an operator is responsible for one to N key
groups.
- Each parallel instance belongs to a slot, which is tied to a single
thread (a slot may actually hold multiple subtasks).
- The number of key groups for each operator equals its max parallelism;
key groups are distributed among the subtasks ("parallel instances of
the operator").
- We use hash partitioning to assign keys to key groups.
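
To make the mapping concrete, here is a minimal, self-contained sketch
of the idea in plain Java. It is an illustration under assumptions, not
Flink's actual code: the class name KeyGroupSketch, the method names and
the mix() function are made up for this example (Flink applies its own
hash on top of key.hashCode()), and the values 128 and 4 are arbitrary.

```java
public class KeyGroupSketch {

    // Stand-in bit mixer (assumption): NOT the hash function Flink uses.
    // It only spreads hashCode() values and returns a non-negative int.
    static int mix(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        h *= 0xc2b2ae35;
        h ^= (h >>> 16);
        return h & Integer.MAX_VALUE;
    }

    // Hash partitioning: a key always lands in the same key group,
    // and there are exactly maxParallelism key groups.
    static int keyGroupFor(Object key, int maxParallelism) {
        return mix(key.hashCode()) % maxParallelism;
    }

    // Key groups are handed out to subtasks in contiguous ranges,
    // so each subtask owns one or more key groups.
    static int subtaskFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // number of key groups (example value)
        int parallelism = 4;      // number of subtasks (example value)

        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            int keyGroup = keyGroupFor(key, maxParallelism);
            int subtask = subtaskFor(keyGroup, maxParallelism, parallelism);
            System.out.println(
                key + " -> key group " + keyGroup + " -> subtask " + subtask);
        }
    }
}
```

With these example values, each of the 4 subtasks owns a contiguous
range of 32 key groups. There is no thread per key: a subtask handles
the records for all keys in its key groups one after another in its own
thread, and the state is simply scoped to the key of the record
currently being processed.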

For more details please refer to the Flink documentation [1]. You might
also find Alex's YouTube series [2] or Fabian's and Vasiliki's book [3]
helpful for learning about Flink architecture.

[1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/concepts/stateful-stream-processing/
[2] https://www.ververica.com/blog/presenting-our-streaming-concepts-introduction-to-flink-video-series
[3] https://www.amazon.com/Stream-Processing-Apache-Flink-Implementation/dp/149197429X

Best,
D.


On Thu, Dec 2, 2021 at 1:20 PM haocheng <hihaoche...@icloud.com> wrote:

> Hi community!
>
> So many keys are computed separately in each parallel operator instance;
> how does Flink achieve this? Say an operator instance processes 100 keys:
> are there 100 threads managed by the operator instance thread?
>
> I know that
> 1. an operator may have many parallel instances at the same time,
> 2. each parallel instance is responsible for a “key group”, and each
> key group is composed of many keys.
> 3. Several parallel operator instances may come from the same TaskManager
> and each of them is a thread.
