Re: [DISCUSS] AIP-100: Eliminate Scheduler Starvation

Jens Scheffler Thu, 14 May 2026 14:36:10 -0700

Hi,

Sorry, after some time I revisited the document and must say... it is a long read with a lot of complexity. I needed to read parts multiple times. Maybe it is excessively hard because the chapters and approaches became a bit fragmented over time. Maybe also due to this I am not really sold and, similar to Vikram, I am missing some "hard facts" and requirements. Yes, I know we have limits, and in our environments we also sometimes see starvation... but I am not sure if the proposed solution will fix it. Sometimes I feel like starvation also happens at the DB level or because of other limits? I am not sure.

When I look at our production, the places where the scheduler today also "starves" and is locked for a long time are a lot of other tasks which are unrelated to queuing, like finding orphaned tasks, resetting them because of missing heartbeats, or applying Dag changes because of Dag version changes. In our production I mostly feel that if the scheduler focused on scheduling tasks, it would get things done. I sometimes thought about making two instances "just" for scheduling Dag runs, two only for calculating whether tasks are ready to move to scheduled state, and keeping some instances just for the queue processing. It would also be great to see where most time is spent. I also thought about (if I had time) adding more metrics to see in more detail where time is wasted.
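To make the "where is most time spent" idea a bit more concrete, here is a minimal sketch of per-phase timing for a scheduler loop. It is plain Python and not Airflow code: `PhaseTimer`, the phase names and the injectable `clock` are all hypothetical names chosen for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the timer can be tested deterministically.
        self._clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Count the time even if the phase raised an exception.
            self.totals[name] += self._clock() - start

    def shares(self):
        """Return each phase's fraction of the total measured time."""
        total = sum(self.totals.values())
        return {n: (t / total if total else 0.0) for n, t in self.totals.items()}


# Usage: wrap each part of one loop iteration (the bodies are placeholders).
timer = PhaseTimer()
with timer.phase("adopt_orphaned_tasks"):
    pass  # e.g. find and reset tasks with missing heartbeats
with timer.phase("queue_tasks"):
    pass  # e.g. move tasks from "scheduled" to "queued"
print(timer.shares())
```

Emitting such shares as metrics per loop would show whether queuing or the unrelated housekeeping dominates before any algorithm is changed.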

Anyway, getting back to the proposal: I am not sure. I think the write-up needs a bit of restructuring.

Reason: Assuming we cannot have a "quick fix" with the current logic, and before nailing down a new target logic, I think we need to re-assess the requirements first. This is not really elaborated, but some chapters make implicit assumptions. E.g. we drop task-level priorities. Can we? Such things are critical entry conditions to plan a target logic and optimize it.

Similarly with the HITL optimization for individual exceptions. This would be a new, cool additional requirement and a game-changer which potentially influences the solution space.

When updating the document I'd also propose to change all terms from "scheduling" in the document to "queueing". This again caused me a lot of confusion, because in the state model of Airflow the state "scheduled" is used to mark that all predecessors are in the right state so that a task can be queued. For me this is "scheduling". This is also a complex calculation but is explicitly not in scope of the paper. It is only talking about the step where tasks are put into "queued" state for the Executor to pick up. So "making tasks running/executing".

Then I also have a bit of a problem with the elaboration about defining priority on Dag level. While I assume that most use cases are OK with prioritizing Dags and not individual tasks (but we need to check whether this is a requirement or not!), I am not sure how this would make the scheduling logic easier. In the end, individual tasks need to be scheduled, not Dags. If there is a candidate algorithm, then some details are missing to understand whether it is an alternative. Otherwise the option does not really need to be considered.

Also, as I commented earlier, I would like to understand which Dag and task volumes are considered. We know about today's standard limit of 1024 in Dynamic Mapped tasks. With the change applied, can this then be safely increased to 10k? 100k? How many "tasks per hour" do we assume get scheduled? Will there be a metric of "tasks dropped out of scheduling" because of limits, which we can see today and compare to the future logic? So which factor is assumed to improve? How much CPU is needed for N tasks per hour? Will the logic scale linearly, so that if I increase to N scheduler instances it will run at N times the speed? Or only log(N)?

I am not sure about other environments and whether the logic helps in all use cases, but from our side I think we have 4 major different Dag layout patterns. Will the logic still be OK and improve all of them, or do we assume a drawback in some of them in exchange for the benefit in scaling on the other side?


1. Simple Dag with 1-3 tasks, but thousands of runs scheduled per hour,
   ~1000 runs concurrently (no problem today actually, see no starvation)
2. Simple Dag but Mapped Task with today 1024 -> Target 100k tasks in a
   run. ~10 runs concurrently, maybe more queued (no problem today up to
   <~2000 mapped tasks)
3. Complex Dag with 10-500 tasks in a complex graph of dependency. Not
   a large volume of mapped tasks. Many parallel runs (our Dags are not
   too complex, other feedback welcome)
4. Medium complex Dag, 10-50 tasks in a Mapped Task Group. Mapping
   desired to be 1000+ so in total 50-100k tasks in a single run. Maybe
   10 runs (this is a severe problem when mapping is >100 already!)

In cases (2) and (4) we also saw OOM failures as the scheduler attempted to load all tasks into memory. Not sure if this was a problem of queuing or maybe even earlier, in getting to scheduled. It would therefore be good to clarify which cases have the potential to improve.

One final comment: Have you also considered a simple approach for the described "starvation": when the dropout rate is high (e.g. 90%), use the dropped Dags and Tasks / Pools that led to the dropout as an additional select filter, query another round of tasks via `max_tis_per_query`, and repeat until the dropout is below a threshold? That might extend the time for a scheduler loop but, due to the incrementally applied filter, improve the effectiveness of the tasks that are scheduled. It would also maximize the net time the scheduler spends on task queuing compared to the other things it spends time on, if queuing is the limiting factor.
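A minimal sketch of that re-query idea, to illustrate what I mean. It runs on in-memory lists instead of the real database query, and everything except `max_tis_per_query` is a hypothetical name (`Candidate`, `select_queueable`, `dropout_threshold`, `max_rounds`); only pool and Dag limits are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    task_id: str
    dag_id: str
    pool: str
    priority: int


def select_queueable(candidates, pool_slots, dag_slots,
                     max_tis_per_query=4, dropout_threshold=0.5, max_rounds=10):
    """Pick tasks to queue, re-querying with exclusion filters on high dropout."""
    pool_free, dag_free = dict(pool_slots), dict(dag_slots)
    blocked_pools, blocked_dags = set(), set()
    queued, queued_ids = [], set()
    ordered = sorted(candidates, key=lambda c: -c.priority)  # stable sort
    for _ in range(max_rounds):
        # Stand-in for the SQL query: top-N by priority, excluding everything
        # already queued or known to be blocked by an exhausted limit.
        batch = [c for c in ordered
                 if c.task_id not in queued_ids
                 and c.pool not in blocked_pools
                 and c.dag_id not in blocked_dags][:max_tis_per_query]
        if not batch:
            break
        dropped = 0
        for c in batch:
            if pool_free.get(c.pool, 0) <= 0:
                blocked_pools.add(c.pool)  # becomes a filter in the next round
                dropped += 1
            elif dag_free.get(c.dag_id, 0) <= 0:
                blocked_dags.add(c.dag_id)
                dropped += 1
            else:
                pool_free[c.pool] -= 1
                dag_free[c.dag_id] -= 1
                queued.append(c)
                queued_ids.add(c.task_id)
        if dropped / len(batch) < dropout_threshold:
            break  # dropout is low enough, stop re-querying
    return queued


# Six high-priority tasks compete for one pool slot, two low-priority tasks wait.
tasks = [Candidate(f"hot{i}", "hot", "small", 10) for i in range(6)]
tasks += [Candidate(f"cold{i}", "cold", "big", 1) for i in range(2)]
slots = {"small": 1, "big": 5}
limits = {"hot": 10, "cold": 10}
single_round = select_queueable(tasks, slots, limits, max_rounds=1)
with_requery = select_queueable(tasks, slots, limits)
print([c.task_id for c in single_round], [c.task_id for c in with_requery])
```

In this toy case a single round only queues one task because the full pool eats the whole batch, while the second round with the pool filtered out reaches the waiting tasks. Whether the extra query rounds are affordable in a real scheduler loop would need to be measured.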

While my write-up should not be treated as a rejection, I think AIP-100 needs to be a bit re-sorted to be understandable and convincing. So far it is not clear to me whether the proposed solution fixes the root cause.

Not sure if there are other experts with other (general) opinions, but to avoid having this paper starve in review I could think of having a 1-2h call to discuss in person and sort things out. I could offer to join to give some guidance if no others are on this already.


Jens

On 14.05.26 18:28, Jarek Potiuk wrote:

Hi Vikram,

Natanel has reworked the approach, and it is receiving a much warmer
reception than before. This is Natanel's restart attempt, and there has
been no other discussion about it. While the "why" remains the
same—addressing starvation issues experienced by various users (including
Jens' team)—the previous proposals were rejected by Ash, Jens, and me due
to excessive complexity, such as building new algorithms via stored
procedures and such.

The current proposal is simpler and introduces an interesting way to
prioritize SLA callbacks - which is an interesting feature to have. While it
requires further scrutiny, it appears to be a direction we could
potentially accept.

Best,
Jarek

On Thu, May 14, 2026 at 4:50 PM Vikram Koka via dev<[email protected]>
wrote:

I probably missed the updates here, but when this was brought up at the dev
call a while ago, I thought the response was a firm "No".
Did I miss the follow-on updates?

The discussion centered on the "why".
This is such a "power user" feature, which is fine, but it lacks
articulation of the projected benefit in quantifiable terms.

It has very high technical complexity in the core of Airflow, which could
cause massive ripple effects and therefore requires a cautious approach.

I left a comment in the AIP document as well, about needing to understand
the "why" better.

Vikram



On Thu, May 14, 2026 at 1:28 AM Christos Bisias<[email protected]>
wrote:

Hello,

I've got a question but I can't find a way to add a comment under the AIP.

The 'Weighted Aging and SLA Urgency' solves the starvation issue but do we have an idea of what happens with the scheduler's performance?

Christos

On Thu, May 14, 2026 at 7:47 AM Elad Kalif<[email protected]> wrote:

Kind reminder to everyone who still wants to add comments.
If there are no further comments, I think we can move this to a vote.

On Tue, May 5, 2026 at 12:00 AM Jarek Potiuk<[email protected]> wrote:

Very interesting - also will take a look shortly - and great how it seems to tap into SLA urgency indeed..

On Mon, May 4, 2026 at 10:08 PM Przemysław Mirowski <[email protected]> wrote:

+1 for Weighted Aging and SLA Urgency proposition.
________________________________
From: Elad Kalif<[email protected]>
Sent: 30 April 2026 12:08
To:[email protected] <[email protected]>
Subject: Re: [DISCUSS] AIP-100: Eliminate Scheduler Starvation

I love the idea of dynamic priorities and I think this is a good direction.

On Thu, Apr 30, 2026 at 12:58 PM Ash Berlin-Taylor <[email protected]> wrote:

Thanks, this re-drafted version looks interesting. I’m trying to
internalise and understand the new proposal. I’ll leave a few
comments/questions on the confluence page as I go.

(I have to say, the move away from epic stored procedure def is a welcome change!)

-ash

On 22 Apr 2026, at 14:45, Natanel <[email protected]> wrote:

Hello community, I want to spark again a discussion that was held in the past and delayed (due to the focus on releasing Airflow 3.2). Now that 3.2 is released, I think it might be a good idea to bring up the discussion again.

Wiki:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-100+Eliminate+Scheduler+Starvation+On+Concurrency+Limits

In the current situation, tasks may starve in Airflow, and in large-scale deployments (hundreds of thousands of tasks (or more) per day) we tend to experience severe starvation, where a group of tasks may starve other tasks, not allowing them to run, as described in the wiki.

After the February dev call, where I proposed the AIP, a few comments arose, so I began to research different scheduling algorithms again and added them to the considerations as part of the AIP.

As of now, the AIP contains a few proposed ideas (some of which are pretty similar to each other, while others are quite different), as the main concern from the dev call was that the approaches given might not be the best way to solve the issue, as it is a very hard problem to solve.

After that, I made some edits to the AIP and to the propositions, in order to help decide and clarify the advantages and disadvantages of each approach.
The current "best approach" can be found here
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406618462#AIP100EliminateSchedulerStarvationOnConcurrencyLimits-Currentbestproposition>,
where the new proposed algorithms are defined here
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406618462#AIP100EliminateSchedulerStarvationOnConcurrencyLimits-Othernonagingalgorithmsother_algs>.

In order to continue with the effort, a community consensus needs to be reached about the preferred solution/solutions. Once this is done, it is possible to go on and implement + stress test the proposed solution/s.

I would appreciate a review from community members; moreover, I would also appreciate any new propositions or improvements which can be done.

Best regards
Natanel.
