Hi Arti,

This is nothing specific to KeyedProcessFunction, but the general way Flink
distributes subtasks. The general idea is to use as few task managers as
possible such that they are available for cluster downsizing or other
concurrent jobs.

You can change this behavior through cluster.evenly-spread-out-slots
configuration [1].

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#cluster-evenly-spread-out-slots



On Mon, Sep 14, 2020 at 5:33 PM Arti Pande <pande.a...@gmail.com> wrote:

> Hi,
>
> Here is a question related to parallelism of keyed-process-function that
> is applied to the KeyedStream. For some code that looks like this
>
> myStream.keyBy(...)
>     .process(new MyKeyedProcessFunction())
>
>     .process(<someOtherProcessFunction>).setParallelism(10)
>
>
> On a Flink cluster with 5 TM nodes each with 10 task slots, and Job
> parallelism = 5, the 5 subtasks of MyKeyedProcessFunction() do not get
> distributed across all the 5 TM nodes evenly. Those typically get assigned
> to one single TM node. However the 50 subtasks of <
> someOtherProcessFunction> are always spread evenly across the 5 TMs.
>
> Am I missing something? How can I get those MyKeyedProcessFunction()
> subtasks distributed across all TM nodes evenly?
>
> Thanks
> Arti
>


-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng

Reply via email to