Hi haocheng, in short it works as follows:
- Each parallel instance of an operator is responsible for one to N key groups. - Each parallel instance belongs to a slot, which is tied with a single thread (slot may actually introduce multiple subtasks) - # of keygroups for each operator = max parallelism; keygroups are distributed between subtasks ("parallel instances of the operator") - We use hash partitioning to assign keys to KeyGroup For more details please refer to the Flink documentation [1]. You might also find Alex's YouTube series [2] or Fabian's and Vasiliki's book [3] helpful for learning about Flink architecture. [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/concepts/stateful-stream-processing/ [2] https://www.ververica.com/blog/presenting-our-streaming-concepts-introduction-to-flink-video-series [3] https://www.amazon.com/Stream-Processing-Apache-Flink-Implementation/dp/149197429X Best, D. On Thu, Dec 2, 2021 at 1:20 PM haocheng <hihaoche...@icloud.com> wrote: > Hi community! > > So many keys are computed separately in each operator parallel, how does > Flink achieve this? Let an operator instance processes 100 keys, is there > 100 thread that are managed by the operator instance thread? > > I know that > 1. an operator many have many parallel instances at the same time, > 2. each of parallel instances is responsible to a “key group”, and each > key group is composed of many keys. > 3. Several operator parallel instances may come from the same TaskManager > and each of them is a thread.