Hello Alexandre,

Flink does not use one TaskSlot per task by default; rather, each task slot
holds a slice of the entire pipeline (up to one subtask of each
operator, depending on the operator parallelism) [1].
So if your job parallelism is 1, only a single task slot will be occupied.

If you want to change this behavior and distribute operators across slots,
you can take a look at slot sharing groups [2].

1 -
https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/concepts/flink-architecture/#task-slots-and-resources
2 -
https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/operators/overview/#set-slot-sharing-group
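To illustrate the rule above, here is a minimal Python sketch (plain Python, not Flink code; the operator and group names are made up for the example) of how many slots a job occupies: each slot sharing group needs as many slots as the highest operator parallelism inside it, and the job needs the sum over all groups.

```python
# Minimal model (not Flink itself) of task-slot usage: a slot sharing
# group occupies as many slots as the maximum operator parallelism in
# that group, and the job's total is the sum over all groups.

def required_slots(operators):
    """operators: list of (name, parallelism, slot_sharing_group) tuples."""
    max_per_group = {}
    for name, parallelism, group in operators:
        max_per_group[group] = max(max_per_group.get(group, 0), parallelism)
    return sum(max_per_group.values())

# By default every operator sits in the "default" group, so a job with
# parallelism 1 occupies a single slot, as you observed:
default_job = [
    ("source/flatmap/map", 1, "default"),
    ("window/map", 1, "default"),
    ("flatmap/map/sink", 1, "default"),
]
print(required_slots(default_job))  # 1

# Giving each task its own (hypothetical) group spreads them over slots,
# e.g. via slot_sharing_group("task2") on an operator in the DataStream API:
split_job = [
    ("source/flatmap/map", 1, "task1"),
    ("window/map", 1, "task2"),
    ("flatmap/map/sink", 1, "task3"),
]
print(required_slots(split_job))  # 3
```

Note that spreading the tasks over more slots changes resource isolation, not the streaming/pipelined behavior itself.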

Kind regards,
Aleksandr

On Fri, 12 Jul 2024 at 17:34, Alexandre KY <alexandre...@magellium.fr>
wrote:

> Hello,
>
>
> I am trying to run a pipeline made of 3 tasks (have a look at
> flink_job.png):
>
>    - Task 1: Source, FlatMap, Map, keyBy
>    - Task 2: Window, Map, keyBy
>    - Task 3: FlatMap, Map, Sink
>
> From what I have read, in streaming mode all the tasks run
> simultaneously. Therefore, each task takes one TaskSlot, as described here
> <https://medium.com/@LIME0040/flink-trouble-shooting-techniques-i-resource-management-73c61fcc8022>.
> However, as you can see in the picture flink_tm (I run the job on a cluster
> made of 1 jobmanager and 1 taskmanager), the taskmanager has 3 slots, but
> only 1 of them is being used even though the 3 tasks are running. The
> first task is still producing more data (it is supposed to produce 25
> outputs) to send to the 2nd one, and even when the 3rd task receives data,
> the number of task slots used remains 1.
>
>
> I don't understand why Flink doesn't use all the task slots, which leads it
> to behave similarly to batch mode: it tends to produce all the outputs of
> Task 1, then produces only the outputs of Task 2 and crashes because my
> computer runs out of memory, since it keeps accumulating the outputs of
> Task 2 in memory before sending them to Task 3, despite my setting `
> env.set_runtime_mode(RuntimeExecutionMode.STREAMING)`.
>
> I said "tends" because while Task 2 was processing the 25 products, Task 3
> received 2 of them and produced 2 outputs, but after that it stopped (the
> number of records received remained 2) and Flink only ran Task 2 (I see it
> in the logs) until the memory exploded.
>
>
> To sum it up, I have no idea why Flink doesn't use all the task slots
> available despite having more than 1 task. Also, shouldn't backpressure
> stop Task 2 once its output buffer is getting full? Or maybe I should
> reduce the number of buffers to make the backpressure mechanism kick in?
>
>
> Thanks in advance and best regards,
>
> Ky Alexandre
>
