Yes, it might be the case. Hard to tell for sure without looking at the
job, metrics etc. Just be mindful of what I described, and if you want to
fine tune a job and set different parallelism values for different
operators, pay attention to where those operators are being distributed.
Usually in practice there is little reason to choose (1-2-1-1-1) over
(2-2-2-2-2). If you spread a load of some operators more than you need?
Usually not an issue. On the other hand with (2-2-2-2-2) you will spread
the load more evenly across task managers, which makes it easier to
tune/analyse/optimise.

Best,
Piotrek

pt., 5 mar 2021 o 09:10 Jan Nitschke <nitschke...@web.de> napisał(a):

> Hey Piotr,
>
> thanks for your answer, that makes perfect sense. However, when looking at
> the number of messages being processed, we can see that both subtasks on
> task 2 will produce the same amount of messages in the (1-2-1-1-1)
> scenario, even with the first task hitting backpressure. We assume that
> this has to do with the distribution of messages between task. As messages
> are being distributed equally among subtasks in our case, would this be an
> explanation for that behavior?
>
> Best,
> Jan
>
>
> *Gesendet:* Mittwoch, 03. März 2021 um 19:53 Uhr
> *Von:* "Piotr Nowojski" <pnowoj...@apache.org>
> *An:* "Jan Nitschke" <nitschke...@web.de>
> *Cc:* "user" <user@flink.apache.org>
> *Betreff:* Re: Independence of task parallelism
> Hi Jan,
>
> As far as I remember, Flink doesn't handle very well cases like
> (1-2-1-1-1) and two Task Managers. There are no guarantees how the
> operators/subtasks are going to be scheduled, but most likely it will be as
> you mentioned/observed. First task manager will be handling all of the
> operators, while the second task manager will only be running a single
> instance of the second operator (for load balancing reasons it would be
> better to spread the tasks across those two Task Managers more evenly).
>
> No, Flink doesn't hold any resources (appart of network buffers) per task.
> All of the available memory and CPU resources are shared across all of the
> running tasks. So in the (1-2-1-1-1) case, if the first task manager will
> be overloaded (for example if it has very few CPU cores), the second task
> will perform much better on the second task manager (which will be empty),
> causing a throughput skew. From this perspective, (2-2-2-2-2) would most
> likely be performing better, as the load would be more evenly spread.
>
> Piotrek
>
> niedz., 28 lut 2021 o 13:10 Jan Nitschke <nitschke...@web.de> napisał(a):
>
>> Hello,
>>
>> We are working on a project where we want to gather information about the
>> job performance across different task level parallelism settings.
>> Essentially, we want to see how the throughput of a single task varies
>> across different parallelism settings, e.g. for a job of 5 tasks: 1-1-1-1-1
>> vs. 1-2-1-1-1 vs. 2-2-2-2-2.
>>
>> *We are running flink on Kubernetes, a job with 5 tasks, slot sharing is
>> enabled, operator chasing is disabled and each task manager has one slot.*
>>
>> So, the number of task managers is always the number of the highest
>> parallelism and wen can fit the entire job into one task manager slot.
>>
>> We are then running the job against multiple parallelism configs (such as
>> those above), collect the relevant metrics and try to get some useful
>> information out of them.
>>
>> We are now wondering how independent our results are from one another.
>> More specifically, if we now look at the parallelism of the second task, is
>> its performance independent of the parallelism of the other tasks? So, will
>> a the second task perform the same in (1-2-1-1-1) as in (2-2-2-2-2)?
>>
>> Our take on it is the following: With our setup, (1-2-1-1-1) should
>> result in one task manager holding the entire job and a second task manager
>> that only runs the second task. (2-2-2-2-2) will run two task managers with
>> the entire job. So, theoretically, the second task should have much more
>> resources available in the first setup as it has the entire resources of
>> that task manager to its disposal. Does that assumption hold or will flink
>> assign a certain amount of resources to a task in a task manager no matter
>> how many other tasks are running on that same task manager slot?
>>
>> We would highly appreciate any help.
>>
>> Best,
>> Jan
>>
>

Reply via email to