Hi, dev.
After reviewing the entire email discussion thread with Rui, I noticed that
my previously ambiguous understanding led to a few incorrect conclusions,
so I need to revise them. Thanks to Rui for the help.
>For David:
>The problem you're trying to s
Hi, Rui.
Thank you for the update.
+1 for the updated version of the FLIP page.
And thanks to Zhu Zhu and Yangze Guo for the discussion.
Best Regards.
Yuepeng Pan
On 2023/10/17 03:45:08 Rui Fan wrote:
> Hi all,
>
> Offline discussed with Zhu Zhu, Yangze Guo, Yuepeng Pan.
> We reached consensus on sl
Thanks for the update, Rui. +1 for the latest version of the FLIP.
Best,
Yangze Guo
On Tue, Oct 17, 2023 at 11:45 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> Hi all,
>
> Offline discussed with Zhu Zhu, Yangze Guo, Yuepeng Pan.
> We reached consensus on slot.request.max-interval and
> taskmanage
Hi all,
Offline discussed with Zhu Zhu, Yangze Guo, Yuepeng Pan.
We reached consensus on slot.request.max-interval and
taskmanager.load-balance.mode. And I have updated the FLIP.
For a detailed introduction to taskmanager.load-balance.mode,
please refer to FLIP’s 3.1 Public Interfaces[1].
And th
Hi, Shammon.
Thanks for your feedback.
>1. This mechanism will be only supported in `SlotPool` or both `SlotPool` and
>`DeclarativeSlotPool`?
As described on the FLIP page, the current design plans to introduce the
waiting mechanism only in the `SlotPool`, because the existing
`WaitingForReso
Hi Yuepeng,
Thanks for your feedback. I agree with you; both approaches can achieve the
goal.
As long as we can easily extend the balancing strategy to consider more
than one factors without changing the interface, the solution is OK for me.
Regards,
Xiangyu
Yuepeng Pan wrote on Wed, Oct 11, 2023 at 17:38:
>
Hi, xiangyu.
Thanks for your quick reply.
>interface currently only includes a description of the number of tasks. So,
>IIUC, if there is a need to further expand
>current interface and its implementations, right?
Yes, that's indeed the case.
>I checked the interface design of LoadingWeight and
Hi Yuepeng,
Thanks for your reply.
> Nice feedback. In fact, as mentioned in the Google Doc, the LoadingWeight
> interface currently only includes a description of the number of tasks. So,
> IIUC, if there is a need to further expand
> descriptions of other resource loads, we just extend it based on the
> c
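To make the point concrete, here is a rough sketch in Python of the idea under discussion: a load descriptor that today only counts tasks, but whose scalar `loading` could later fold in other resource factors without changing callers. This is an illustration, not Flink's actual Java `LoadingWeight` interface; the names and single-scalar design are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoadingWeight:
    """Hypothetical load descriptor: today only the task count
    contributes to the weight; other resource loads (CPU, memory)
    could later be folded into `loading` without changing callers."""
    tasks: int

    @property
    def loading(self) -> float:
        # Single scalar keeps the comparison interface stable even
        # if more factors are added to the computation later.
        return float(self.tasks)

def pick_least_loaded(slots: dict) -> str:
    """Return the slot id whose LoadingWeight is smallest."""
    return min(slots, key=lambda s: slots[s].loading)

slots = {"slot-1": LoadingWeight(3), "slot-2": LoadingWeight(1)}
print(pick_least_loaded(slots))  # slot-2
```

Because callers only ever compare the scalar, a richer weight (e.g. weighted CPU plus memory) would slot in behind the same property.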
Hi, xiangyu,
Thanks for your attention as well.
>1, About the waiting mechanism: Will the waiting mechanism happen only in
>the second level 'assigning slots to TM'? IIUC, the first level 'assigning
>Tasks to Slots' needs only the asynchronous slot result from slotpool.
As described in the latest
Hi, David,
Thank you very much for your attention.
>The problem you're trying to solve only exists in complex graphs with
>different per-vertex parallelism. If the parallelism is set globally
>(assuming the pipeline has roughly even data skew), the algorithm could
>make things slightly worse by e
Hi Zhu,
Thanks for your clarification!
I misunderstood before; it's clear now.
Best,
Rui
On Tue, Oct 10, 2023 at 6:17 PM Zhu Zhu wrote:
> Hi Rui,
>
> Not sure if I understand your question correctly. The two modes are not
> the same:
> {taskmanager.load-balance.mode: Slots} = {cluster.evenly-
Hi Rui,
Not sure if I understand your question correctly. The two modes are not the
same:
{taskmanager.load-balance.mode: Slots} = {cluster.evenly-spread-out-slots:
true, slot.sharing-strategy: LOCAL_INPUT_PREFERRED}
{taskmanager.load-balance.mode: Tasks} = {cluster.evenly-spread-out-slots:
true,
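Spelled out as configuration, the equivalence Zhu Zhu describes would look roughly like this (a sketch for illustration; the `TASK_BALANCED_PREFERRED` value for the Tasks mode is taken from later messages in this thread):

```yaml
# taskmanager.load-balance.mode: Slots  would be shorthand for:
cluster.evenly-spread-out-slots: true
slot.sharing-strategy: LOCAL_INPUT_PREFERRED

# taskmanager.load-balance.mode: Tasks  would be shorthand for:
cluster.evenly-spread-out-slots: true
slot.sharing-strategy: TASK_BALANCED_PREFERRED
```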
Hi Zhu,
Thanks for your feedback!
>> 2. When it's set to Tasks, how to assign slots to TM?
> It's option2 at the moment. However, I think it's just implementation
> details and can be changed/refined later.
>
> As you mentioned in another comment, 'taskmanager.load-balance.mode' is
> a user orien
Thanks for the response, Rui and Yuepeng.
>> Rui
> 1. The default value is None, right?
Exactly.
> 2. When it's set to Tasks, how to assign slots to TM?
It's option2 at the moment. However, I think it's just implementation
details and can be changed/refined later.
As you mentioned in another com
Thanks for the updates, Rui.
It does seem challenging to ensure evenness in slot deployment unless
we introduce batch slot requests in SlotPool. However, one possibility
is to add a delay of around 50ms during the SlotPool's resource
requirement declaration to the ResourceManager, similar to the
c
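The suggested delay could look roughly like the following debounce sketch (hypothetical Python, not Flink's actual SlotPool code; the class name, `require` method, and the 50 ms value are all illustrative): requirement updates arriving in quick succession are coalesced and declared to the ResourceManager as one batch after a short quiet period.

```python
import threading
import time

class DebouncedDeclarer:
    """Coalesce rapid-fire resource-requirement updates and declare
    them once, after a short quiet delay, so near-simultaneous slot
    requests are seen by the ResourceManager as one batch."""

    def __init__(self, declare_fn, delay_s=0.05):
        self._declare_fn = declare_fn
        self._delay_s = delay_s
        self._pending = []
        self._timer = None
        self._lock = threading.Lock()

    def require(self, profile):
        with self._lock:
            self._pending.append(profile)
            # Restart the quiet-period timer on every new requirement.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay_s, self._flush)
            self._timer.start()

    def _flush(self):
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:  # skip spurious flushes from already-drained timers
            self._declare_fn(batch)

batches = []
declarer = DebouncedDeclarer(batches.append, delay_s=0.05)
for profile in ("slot-a", "slot-b", "slot-c"):
    declarer.require(profile)
time.sleep(0.2)
print(batches)  # [['slot-a', 'slot-b', 'slot-c']]
```

The trade-off is the one raised in this thread: a fixed delay adds latency to every declaration in exchange for a batched overview.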
Hi Yangze,
Thanks for your quick response!
Sorry, I re-read the 2.2.2 part[1] about the Waiting Mechanism and found
it isn't clear. The root cause of introducing the waiting mechanism is
that slot requests are sent from the JobMaster to the SlotPool
one by one instead of as one whole batch. I have rew
Thanks for the clarification, Rui.
I believe the root cause of this issue is that in the current
DefaultResourceAllocationStrategy, slot allocation begins before the
decision to request PendingTaskManagers is made. That can be fixed
within the strategy without introducing another waiting mechan
Hi Yangze,
> 2. From my understanding, if user enable the
> cluster.evenly-spread-out-slots,
> LeastUtilizationResourceMatchingStrategy will be used to determine the
> slot distribution and the slot allocation in the three TM will be
> (taskmanager.numberOfTaskSlots=3):
> TM1: 3 slot
> TM2: 2 slot
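The distribution quoted above can be reproduced with a tiny least-utilization sketch (illustrative Python, not the actual `LeastUtilizationResourceMatchingStrategy` implementation; the function name is an assumption): each slot request goes to the TaskManager with the lowest current utilization.

```python
def spread_slots(num_requests: int, tms: dict) -> dict:
    """Greedy least-utilization matching: each slot request is placed
    on the TaskManager with spare capacity and the lowest
    allocated/total ratio."""
    allocated = {tm: 0 for tm in tms}
    for _ in range(num_requests):
        tm = min(
            (t for t in tms if allocated[t] < tms[t]),
            key=lambda t: allocated[t] / tms[t],
        )
        allocated[tm] += 1
    return allocated

# Three TMs with taskmanager.numberOfTaskSlots=3, seven slot requests:
print(spread_slots(7, {"TM1": 3, "TM2": 3, "TM3": 3}))
# {'TM1': 3, 'TM2': 2, 'TM3': 2}
```

This matches the TM1: 3 / TM2: 2 split quoted in the message above.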
Hi Shammon,
IIUC, you want more flexibility in controlling the two-phase strategy,
right?
> I want this because we would like to add a new slot to TM strategy such
> as SLOTS_NUM in the future for OLAP to improve the performance of OLAP
> jobs, which will use the TASKS strategy for task to slot. cc Guoy
Thanks Rui, I checked the code and you're right.
As you described above, the entire process is actually two independent
steps: slot to TM and task to slot. Currently we use the option
`cluster.evenly-spread-out-slots` for both of them. Can we provide
different options for the two steps, such as ANY
Thanks Yuepeng and Rui for driving this discussion.
Internally, when trying out Flink 1.17.1 in production, we are also
suffering from the unbalanced task distribution problem for jobs with high
QPS and complex DAGs. So +1 for the overall proposal.
Some questions about the details:
1, About the
Hi, Zhu Zhu,
Thanks for your feedback!
> I think we can introduce a new config option
> `taskmanager.load-balance.mode`,
> which accepts "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots`
> can be superseded by the "Slots" mode and get deprecated. In the future
> it can support more mode,
Hello Yuepeng,
The FLIP reads sane; nice work! To re-phrase my understanding:
The problem you're trying to solve only exists in complex graphs with
different per-vertex parallelism. If the parallelism is set globally
(assuming the pipeline has roughly even data skew), the algorithm could
make thi
Hi, Rui,
1. With the current mechanism, when physical slots are offered from
TM, the JobMaster will start deploying tasks and synchronizing their
states. With the addition of the waiting mechanism, IIUC, the
JobMaster will deploy and synchronize the states of all tasks only
after all resources are
Hi Shammon,
Thanks for your feedback as well!
> IIUC, the overall balance is divided into two parts: slot to TM and task
> to slot.
> 1. Slot to TM is guaranteed by SlotManager in ResourceManager
> 2. Task to slot is guaranteed by the slot pool in JM
>
> These two are completely independent, what a
Hi Yangze,
Thanks for your feedback!
> 1. Is it possible for the SlotPool to get the slot allocation results
> from the SlotManager in advance instead of waiting for the actual
> physical slots to be registered, and perform pre-allocation? The
> benefit of doing this is to make the task deploymen
Thanks Yuepeng for initiating this discussion.
+1 in general too. In fact, we have implemented a similar mechanism
internally to ensure a balanced allocation of tasks to slots, and it works well.
Some comments about the mechanism:
1. This mechanism will be only supported in `SlotPool` or both `SlotPoo
Hi Zhu Zhu,
Thanks for your feedback here!
You are right, users need to set two options:
- cluster.evenly-spread-out-slots=true
- slot.sharing-strategy=TASK_BALANCED_PREFERRED
Combining them into a single option is helpful on the user side, so
`taskmanager.load-balance.mode` sounds good to me.
I want to check some
Thanks for driving this FLIP, Yuepeng Pan. +1 for the overall proposal
to support balanced scheduling.
Some questions on the Waiting mechanism and Allocation strategy for slot to TM:
1. Is it possible for the SlotPool to get the slot allocation results
from the SlotManager in advance instead of w
Thanks Yuepeng and Rui for creating this FLIP.
+1 in general
The idea is straightforward: best-effort gather all the slot requests
and offered slots to form an overview before assigning slots, trying to
balance the loads of task managers when assigning slots.
I have one comment regarding the con
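The gather-then-assign idea summarized above can be sketched as a greedy pass (illustrative Python, not the FLIP's actual algorithm; the heaviest-first ordering and all names are assumptions): once all requests and offered slots are in hand, each request goes to the least-loaded TaskManager that still has a free slot.

```python
def assign_balanced(requests: dict, offered_slots: dict) -> dict:
    """Gather-then-assign sketch. `requests` maps request id -> task
    count; `offered_slots` maps TM id -> number of free slots.
    Heavier requests are placed first onto the least-loaded TM."""
    load = {tm: 0 for tm in offered_slots}
    free = dict(offered_slots)
    assignment = {}
    for req, tasks in sorted(requests.items(), key=lambda kv: -kv[1]):
        # Least-loaded TM that still has a free slot.
        tm = min((t for t in free if free[t] > 0), key=lambda t: load[t])
        assignment[req] = tm
        load[tm] += tasks
        free[tm] -= 1
    return assignment

reqs = {"r1": 4, "r2": 1, "r3": 3, "r4": 2}
print(assign_balanced(reqs, {"tmA": 2, "tmB": 2}))
# {'r1': 'tmA', 'r3': 'tmB', 'r4': 'tmB', 'r2': 'tmA'}
```

Having the full overview up front is what makes the balancing possible; assigning one request at a time, as the current code does, cannot look ahead this way.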