Hi Arti,

This is not specific to KeyedProcessFunction; it is simply how Flink distributes subtasks by default. The idea is to fill up as few task managers as possible, so that the unused ones remain available for cluster downsizing or for other concurrent jobs.
You can change this behavior with the cluster.evenly-spread-out-slots configuration option [1]; a minimal flink-conf.yaml sketch follows below, after the quoted mail.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#cluster-evenly-spread-out-slots

On Mon, Sep 14, 2020 at 5:33 PM Arti Pande <pande.a...@gmail.com> wrote:

> Hi,
>
> Here is a question related to the parallelism of a keyed-process-function applied to a KeyedStream, for code that looks like this:
>
> myStream.keyBy(...)
>     .process(new MyKeyedProcessFunction())
>     .process(<someOtherProcessFunction>).setParallelism(10)
>
> On a Flink cluster with 5 TM nodes, each with 10 task slots, and job parallelism = 5, the 5 subtasks of MyKeyedProcessFunction() do not get distributed evenly across all 5 TM nodes; they typically all get assigned to a single TM node. However, the 50 subtasks of <someOtherProcessFunction> are always spread evenly across the 5 TMs.
>
> Am I missing something? How can I get the MyKeyedProcessFunction() subtasks distributed evenly across all TM nodes?
>
> Thanks
> Arti

--
Arvid Heise | Senior Java Developer
Ververica GmbH | https://www.ververica.com/
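
For reference, this is the kind of entry it amounts to in flink-conf.yaml. A minimal sketch, assuming a standalone or session cluster where the file is read when the cluster is started; the option defaults to false:

    # flink-conf.yaml
    # Spread task slots across all available TaskManagers instead of
    # filling up one TaskManager at a time (the default behavior).
    cluster.evenly-spread-out-slots: true

With this set, the slots of a job are spread across the TaskManagers rather than packed onto as few machines as possible, so the five keyed subtasks should no longer all land on one node.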
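
And, in case it helps other readers of the thread, here is a self-contained sketch of a pipeline shaped like the one quoted above. The source, key selector, class names, and function bodies are placeholders made up for illustration, not the actual job:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    public class ParallelismExample {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(5); // job parallelism, as in the scenario above

            env.fromElements("a:1", "b:2", "a:3", "c:4")   // placeholder source
                .keyBy(value -> value.split(":")[0])        // key = part before ':'
                .process(new MyKeyedProcessFunction())      // runs at job parallelism (5)
                .process(new SomeOtherProcessFunction())
                .setParallelism(10)                         // per-operator override (10)
                .print();

            env.execute("parallelism-example");
        }

        // Counts elements per key; a stand-in for MyKeyedProcessFunction from the mail.
        public static class MyKeyedProcessFunction
                extends KeyedProcessFunction<String, String, String> {

            private transient ValueState<Long> count;

            @Override
            public void open(Configuration parameters) {
                count = getRuntimeContext()
                        .getState(new ValueStateDescriptor<>("count", Long.class));
            }

            @Override
            public void processElement(String value, Context ctx, Collector<String> out)
                    throws Exception {
                Long current = count.value();
                long next = current == null ? 1L : current + 1L;
                count.update(next);
                out.collect(ctx.getCurrentKey() + " -> " + next);
            }
        }

        // Simple pass-through; a stand-in for <someOtherProcessFunction>.
        public static class SomeOtherProcessFunction extends ProcessFunction<String, String> {
            @Override
            public void processElement(String value, Context ctx, Collector<String> out) {
                out.collect(value);
            }
        }
    }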