Hello,

Thank you for your answers, I now understand Flink's behavior.


Thank you and best regards,

Ky Alexandre

________________________________
From: Aleksandr Pilipenko <z3d...@gmail.com>
Sent: Friday, 12 July 2024 19:42:06
To: Alexandre KY
Cc: user
Subject: Re: Taskslots usage

Hello Alexandre,

By default, Flink does not use one task slot per task; instead, a single task 
slot holds a slice of the entire pipeline (up to one subtask of each operator, 
depending on the operator parallelism) [1].
So if your job parallelism is 1, only a single task slot will be occupied.

If you want to change this behavior and distribute operators across slots, 
you can take a look at slot sharing groups [2].

1 - 
https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/concepts/flink-architecture/#task-slots-and-resources
2 - 
https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/operators/overview/#set-slot-sharing-group
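For illustration, a minimal PyFlink sketch of [2] might look like the following. This is an assumption-laden example, not your actual job: the source data, the `map` function, and the group name "heavy" are all placeholders; the point is just that `slot_sharing_group(...)` moves the operators that follow it out of the default group, so they must be scheduled into a different slot.

```python
# Hedged sketch: isolating one stage in its own slot sharing group.
# The data, lambda, and the group name "heavy" are illustrative only.
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment, RuntimeExecutionMode

env = StreamExecutionEnvironment.get_execution_environment()
env.set_runtime_mode(RuntimeExecutionMode.STREAMING)

ds = env.from_collection([1, 2, 3], type_info=Types.INT())

# Operators stay in the "default" group and share one slot unless told
# otherwise; assigning a different group forces a separate task slot.
result = (ds
          .map(lambda x: x * 2, output_type=Types.INT())
          .slot_sharing_group("heavy"))

result.print()
env.execute("slot-sharing-sketch")
```

Note that splitting operators into separate groups increases the number of slots your job needs, so the cluster must have enough slots for all groups combined.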

Kind regards,
Aleksandr

On Fri, 12 Jul 2024 at 17:34, Alexandre KY 
<alexandre...@magellium.fr<mailto:alexandre...@magellium.fr>> wrote:

Hello,


I am trying to run a pipeline made of 3 tasks (have a look at flink_job.png):

  *   Task 1: Source, FlatMap, Map, keyBy
  *   Task 2: Window, Map, keyBy
  *   Task 3: FlatMap, Map, Sink

From what I have read, in streaming mode all the tasks run simultaneously. 
Therefore, each task should take one TaskSlot, as described 
here<https://medium.com/@LIME0040/flink-trouble-shooting-techniques-i-resource-management-73c61fcc8022>.
However, as you can see in the picture flink_tm (I run the job on a cluster 
made of 1 jobmanager and 1 taskmanager), the taskmanager has 3 slots, but only 
1 of them is being used even though the 3 tasks are running. The first task is 
still creating more data (it is supposed to produce 25 outputs) to send to the 
2nd one, and even when the 3rd task receives data, the number of task slots 
used remains 1.


I don't understand why Flink doesn't use all the task slots, which leads it to 
behave similarly to batch mode: it tends to produce all the outputs of Task 1, 
then produces only the outputs of Task 2 and crashes because my computer runs 
out of memory, since it keeps accumulating the outputs of Task 2 in memory 
before sending them to Task 3, despite my setting 
`env.set_runtime_mode(RuntimeExecutionMode.STREAMING)`.

I said "tends" because while Task 2 was processing the 25 products, Task 3 
received 2 of them and produced 2 outputs, but after that it stopped (the 
number of records received remained 2) and Flink only ran Task 2 (I can see it 
in the logs) until memory explodes.


To sum it up, I have no idea why Flink doesn't use all the task slots 
available despite having more than 1 task. Also, shouldn't the backpressure 
mechanism stop Task 2 once its output buffer fills up? Or maybe I should 
reduce the number of buffers to make backpressure kick in?


Thanks in advance and best regards,

Ky Alexandre
