> Therefore I'd propose (1) a pragmatic fix that can be made NOW as a
bugfix: a global config switch similar to pools

As long as the users who rely on it have a way to bring it back, yes, it's
good for me.
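
For concreteness, here is a minimal sketch of what such a global switch could look like in the counting logic. This is purely illustrative: the `count_deferred_as_active` key is a hypothetical name, not an existing Airflow option, modeled on the `include_deferred` flag that Pools already have.

```python
# Hypothetical sketch of a global "count deferred as active" switch.
# "count_deferred_as_active" is an illustrative config name, not an
# actual Airflow setting; the state names mirror Airflow's TI states.
from enum import Enum


class TIState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DEFERRED = "deferred"
    SUCCESS = "success"


def occupied_slots(states, count_deferred_as_active: bool) -> int:
    """Count task instances that consume a concurrency slot."""
    active = {TIState.QUEUED, TIState.RUNNING}
    if count_deferred_as_active:
        active.add(TIState.DEFERRED)
    return sum(1 for s in states if s in active)


states = [TIState.RUNNING, TIState.DEFERRED, TIState.DEFERRED, TIState.SUCCESS]
print(occupied_slots(states, count_deferred_as_active=False))  # 1
print(occupied_slots(states, count_deferred_as_active=True))   # 3
```

With a switch like this, each deployment could keep the old counting behaviour while the defaults and naming get sorted out.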

> I do not understand why the discussion here is so much more precise...
when the mentioned change about max_active_tasks was not respecting this
at all and was semantically just a breaking change... yeah, my bad that I
missed the discussion, but this is really, really a problem for us now :-(

This only highlights that those discussions should be longer and dig deeper
- even in the original lazy consensus, Daniel mentioned an "admittedly brief
discussion".
That's why we should make sure that any change in behaviour is discussed
and has a way to go back.

And if it really is such a big problem now, causing real issues with no
alternative, I can absolutely see reverting the decision implemented in
#42953 - would that solve your problem? Should we do it? Or is there an
alternative way for you to restore the limits?

J.

On Mon, Mar 9, 2026 at 9:45 PM Jens Scheffler <[email protected]> wrote:

> Hi Jarek, et al.
>
> I assume there are many cases. Some people might want to limit the
> "workers", and in those cases deferred tasks are likely not counted in,
> while in other cases the limits are rather defined to protect backends
> (like with pools). This is also why we (urgently) need this and why the
> bug came up.
>
> I think adding more parameters and options and renaming them all to
> proper names is a mid- to long-term exercise, especially as it involves
> Dag migration for all users. So this is a change that will maybe only
> see a final cleanup in Airflow 4. We are not green-field, thus we need
> backward compatibility anyway. Aligning on the naming (which is always
> the hardest part...) and making this change will take longer.
>
> Therefore I'd propose (1) a pragmatic fix that can be made NOW as a
> bugfix: a global config switch, similar to pools, that tells on a global
> level whether to count deferred tasks in or out. And as a follow-up,
> (2) a rework of the limitation parameters, which - quite frankly - are a
> bit fragmented across various areas.
>
> Would this be OK for most?
>
> Jens
>
> P.S.: I need to highlight here that after the recent migration to
> Airflow 3 in our production, we have had serious problems over the last
> days and very bad feedback from our users. It took us more than a day to
> understand and drill into the root cause - so multiple person-days were
> wasted until we found the root in the discussion
> https://lists.apache.org/thread/nn4y1z0yrydkmw9np4f0z5lm9gh8tmfl with
> the lazy consensus
> https://lists.apache.org/thread/9o84d3yn934m32gtlpokpwtbbmtxj47l and the
> PR https://github.com/apache/airflow/pull/42953 causing this trouble.
> Why?
>
> Because in Airflow 2 we used max_active_tasks, which defaulted to 16 as
> a safety net, everybody could write their Dag. If somebody wanted to
> scale larger, a PR increasing max_active_tasks > 16 triggered a review
> and we could see how many resources each Dag effectively took. With the
> (badly documented!) semantic change in Airflow 3, a lot of workload now
> runs unrestricted because it relies on mapped tasks, and the alternative
> max_active_tis_per_dag does not default to 16 like the previous setting
> and is only counted per task... so if you have a Dag with multiple
> mapped tasks, and Pools are needed for other limits, this is a problem
> now.
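
(Inline illustration of the scale of the problem described above: a per-task
limit caps each task separately, so total concurrency grows with the number of
mapped tasks. The numbers here are illustrative, not from any real deployment.)

```python
# Illustrative arithmetic only: a per-task limit such as
# max_active_tis_per_dag caps each task separately, so a Dag with
# several mapped tasks can exceed the old per-Dag cap of 16.
per_task_limit = 16   # hypothetical max_active_tis_per_dag value
mapped_tasks = 3      # number of mapped tasks in the Dag

total_possible = per_task_limit * mapped_tasks
print(total_possible)  # 48, versus 16 under the old max_active_tasks cap
```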
>
> I do not understand why the discussion here is so much more precise...
> when the mentioned change about max_active_tasks was not respecting this
> at all and was semantically just a breaking change... yeah, my bad that
> I missed the discussion, but this is really, really a problem for us
> now :-(
>
> On 09.03.26 10:18, Jarek Potiuk wrote:
> > +1 on what TP and Karthikeyan wrote. We need a solid proposal for
> > naming and explicitly defining those terms, along with a way for users
> > to keep the old counting method (settable per Dag). And I think it
> > would be ok to change the default behaviour as long as we are very
> > clear in documenting it, explaining that this is really a "bug fix"
> > (in the sense that this behaviour was really not intentional, and by
> > changing it we express our intentions explicitly) and allow the users
> > to go back easily in Dags that rely on it - so that they can maybe
> > rework them in the future and remove it.
> >
> > On Mon, Mar 9, 2026 at 8:35 AM Karthikeyan <[email protected]> wrote:
> >
> >> +1 on having a field to restore backward compatibility on the Dag
> >> level if the Dag parameter is being changed. Most of our workloads
> >> involve submitting jobs to Spark and other upstream systems, and each
> >> user has a corresponding pool. With deferred not being counted as
> >> active, users had issues where more submissions were made than the
> >> upstream could handle. So for those users the pool was updated. There
> >> are other workloads, like http-based defers, where users just poll
> >> and don't need to worry about the upstream capacity.
> >>
> >> I guess deferred was initially documented such that the pool slot is
> >> released and there can be more concurrent workloads. The definition
> >> of being counted as active depends on the workload and use case. It
> >> will be helpful to have this behaviour be optional and opt-in to
> >> avoid confusion.
> >>
> >> Thanks
> >>
> >> Regards,
> >> Karthikeyan S
> >>
> >>
> >>
> >> On Mon, Feb 23, 2026, 9:43 PM Vikram Koka via dev
> >> <[email protected]> wrote:
> >>
> >>> I definitely agree with the intent, but I am concerned about the
> >>> actual implications of making this change from a user experience
> >>> perspective.
> >>>
> >>> With respect to pools, I would like an updated perspective on how
> >>> useful and used this is today. For example, I suspect that the async
> >>> Python operator change coming in the new AIP as part of 3.2 does not
> >>> respect the pools configuration either.
> >>>
> >>> The max active task configurations are very useful while using the
> >>> Celery executor, which is the majority today. I got a bunch of
> >>> questions around this as part of the backfill enhancements in 3.0.
> >>>
> >>> I hesitate to make changes to these configuration options without a
> >>> clear understanding and articulation of the tradeoffs.
> >>>
> >>> Just my two cents,
> >>> Vikram
> >>>
> >>> On Mon, Feb 23, 2026 at 2:34 AM Wei Lee <[email protected]> wrote:
> >>>
> >>>> I like what Jarek suggested, but we should avoid using the term
> >>>> "Running". From Airflow's perspective, a Deferred task is not
> >>>> considered a Running task, even though it may be viewed differently
> >>>> in the user's context.
> >>>>
> >>>> Additionally, we are currently using the term "Executing" here:
> >>>> https://github.com/apache/airflow/blob/e0cd6e246c288d33f359ec2268b3d342832e9648/airflow-core/src/airflow/utils/state.py#L67
> >>>> Maybe we can count Deferred and Running tasks as "Executing"? The
> >>>> thing that kinda bugs me is that "Deferred" is also an
> >>>> IntermediateTIState here.
> >>>>
> >>>> On 2026/02/22 20:22:45 Natanel wrote:
> >>>>> Hello Jens, I agree with everything you said. For some reason the
> >>>>> "Deferred" state is not counted towards active tasks, where
> >>>>> intuitively it should be part of the group.
> >>>>>
> >>>>> As I see it, at least, all the configurations talk about *active*
> >>>>> tasks (such as max_active_tasks, max_active_tis_per_dag,
> >>>>> max_active_tis_per_dagrun), which I think is quite a confusing
> >>>>> term. To solve this, a clear definition of what an "active" task
> >>>>> is should be written down.
> >>>>>
> >>>>> It is possible to define that an "active" task is any task which
> >>>>> is either running, queued OR deferred, but this will require a new
> >>>>> configuration for backwards compatibility, such as
> >>>>> "count_deferred_as_active" (yet this is a more enforcing and
> >>>>> global approach, which we might not want), while not introducing
> >>>>> too much additional complexity. Adding more parameters by which we
> >>>>> schedule tasks will only make scheduling decisions harder, as more
> >>>>> parameters need to be checked, which will most likely slow down
> >>>>> each decision and might slow down the scheduler.
> >>>>>
> >>>>> I liked Jarek's approach; however, I think that instead of
> >>>>> introducing a few new params, we could rename the current
> >>>>> parameters while keeping the behavior as is, slowly deprecating
> >>>>> the "active" configurations, as Jarek said, and for some time keep
> >>>>> both the "active" and the "running" params, with "active" being an
> >>>>> alias for "running" until the "active" one is deprecated.
> >>>>>
> >>>>> If there is a need for a param for deferred tasks, it is possible
> >>>>> to add one only for deferrable tasks, in order not to impact the
> >>>>> current scheduling decisions made by the scheduler.
> >>>>>
> >>>>> I see both approaches as viable, yet I think that adding an
> >>>>> additional param might introduce more complexity and maybe should
> >>>>> be split out of the regular task flow, as a deferrable task is not
> >>>>> the same as a running task. I tend to lean towards the first
> >>>>> approach, as it seems to be the simplest; however, the second
> >>>>> approach might be more beneficial long-term.
> >>>>>
> >>>>> Best Regards,
> >>>>> Natanel
> >>>>>
> >>>>> On Sun, 22 Feb 2026 at 18:43, Jarek Potiuk <[email protected]> wrote:
> >>>>>
> >>>>>> +1. But I think that there are cases where people wanted to
> >>>>>> **actually** use `max_*` to limit how many workers the DAG or DAG
> >>>>>> run will take. So possibly we should give them such an option -
> >>>>>> for example, max_running_tis_per_dag, etc.
> >>>>>>
> >>>>>> There is also the question of backward compatibility. I can see
> >>>>>> the possibility of side effects - if that changes "suddenly"
> >>>>>> after an upgrade. For example, it might mean that some Dags will
> >>>>>> suddenly start using far fewer workers than before and become
> >>>>>> starved.
> >>>>>>
> >>>>>> So - if we want to change it, I think we should deprecate
> >>>>>> "_active" and possibly add two new sets of parameters with
> >>>>>> different names - but naming in this case is hard (more than
> >>>>>> usual).
> >>>>>>
> >>>>>> J.
> >>>>>>
> >>>>>> On Sun, Feb 22, 2026 at 5:25 PM Pavankumar Gopidesu
> >>>>>> <[email protected]> wrote:
> >>>>>>> Hi Jens,
> >>>>>>>
> >>>>>>> Thanks for starting this discussion. I agree that we should
> >>>>>>> update how these tasks are counted.
> >>>>>>>
> >>>>>>> I previously started a PR [1] to include deferred tasks in
> >>>>>>> max_active_tasks, but I was sidetracked by other priorities. As
> >>>>>>> you noted, this change needs to encompass not only
> >>>>>>> max_active_tasks but also the other parameters you described.
> >>>>>>>
> >>>>>>> [1]: https://github.com/apache/airflow/pull/41560
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Pavan
> >>>>>>>
> >>>>>>> On Sun, Feb 22, 2026 at 12:43 PM constance.astronomer.io via
> >>>>>>> dev <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Agreed. In my opinion, the only time we should not be counting
> >>>>>>>> deferred tasks is for configurations that control worker slots,
> >>>>>>>> like the number of tasks that run concurrently on a Celery
> >>>>>>>> worker, since tasks in a deferred state are not running on a
> >>>>>>>> worker (although you could argue that a triggerer is a special
> >>>>>>>> kind of worker, but I digress).
> >>>>>>>>
> >>>>>>>> For the examples you've listed, deferred tasks should be part
> >>>>>>>> of the equation since the task IS running, just not on a
> >>>>>>>> traditional worker.
> >>>>>>>>
> >>>>>>>> Thanks for bringing this up! This has been bothering me for a
> >>>>>>>> while.
> >>>>>>>>
> >>>>>>>> Constance
> >>>>>>>>
> >>>>>>>> On Feb 22, 2026, at 4:18 AM, Jens Scheffler
> >>>>>>>> <[email protected]> wrote:
> >>>>>>>>> Hi There!
> >>>>>>>>>
> >>>>>>>>> TLDR: In fix PR https://github.com/apache/airflow/pull/61769
> >>>>>>>>> we came to the point that today in Airflow Core the "Deferred"
> >>>>>>>>> state seems to be counted inconsistently. I would propose to
> >>>>>>>>> consistently count "Deferred" into the counts of "Running".
> >>>>>>>>> Details:
> >>>>>>>>>
> >>>>>>>>> * In Pools it has been possible for some time (since PR
> >>>>>>>>>    https://github.com/apache/airflow/pull/32709) to decide
> >>>>>>>>>    whether tasks in the deferred state are counted into the
> >>>>>>>>>    pool allocation or not.
> >>>>>>>>> * Before that, Deferred tasks were not counted in, which meant
> >>>>>>>>>    tasks in the deferred state could potentially overwhelm
> >>>>>>>>>    backends, defeating the purpose of pools.
> >>>>>>>>> * Recently it was also seen that other limits we usually have
> >>>>>>>>>    on Dags, defined as follows, do not consistently include
> >>>>>>>>>    deferred tasks in the limits:
> >>>>>>>>>      o max_active_tasks - `The number of task instances
> >>>>>>>>>        allowed to run concurrently`
> >>>>>>>>>      o max_active_tis_per_dag - `When set, a task will be able
> >>>>>>>>>        to limit the concurrent runs across logical_dates.`
> >>>>>>>>>      o max_active_tis_per_dagrun - `When set, a task will be
> >>>>>>>>>        able to limit the concurrent task instances per Dag
> >>>>>>>>>        run.`
> >>>>>>>>> * This means that, at the moment, defining a task as
> >>>>>>>>>    async/deferred escapes the limits.
> >>>>>>>>>
> >>>>>>>>> Code references:
> >>>>>>>>>
> >>>>>>>>> * Counting tasks in Scheduler on main:
> >>>>>>>>>    https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/scheduler_job_runner.py#L190
> >>>>>>>>> * EXECUTION_STATES used for counting:
> >>>>>>>>>    https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ti_deps/dependencies_states.py#L21
> >>>>>>>>>      o Here "Deferred" is missing!
> >>>>>>>>>
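
(Inline sketch of the asymmetry described in the code references above. The
set contents are paraphrased from the linked files, so treat the exact names
as illustrative rather than authoritative.)

```python
# Paraphrased sketch: the scheduler counts task instances whose state is
# in an "execution states" set; "deferred" is not in it today, so
# deferred task instances escape the max_active_* limits.
EXECUTION_STATES = {"queued", "running"}  # paraphrased; "deferred" missing


def counts_toward_limits(state: str) -> bool:
    """Return True if the given TI state is counted against the limits."""
    return state in EXECUTION_STATES


print(counts_toward_limits("running"))   # True
print(counts_toward_limits("deferred"))  # False -> escapes the limits
```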
> >>>>>>>>> Alternatives that I see:
> >>>>>>>>>
> >>>>>>>>> * Fix it consistently in the Scheduler so that the limits are
> >>>>>>>>>    applied with Deferred always counted in.
> >>>>>>>>> * There might be a historic reason that Deferred is not
> >>>>>>>>>    counted in - then proper documentation would be needed -
> >>>>>>>>>    but I'd assume this is unlikely.
> >>>>>>>>> * There are different opinions - then the behavior might need
> >>>>>>>>>    to be configurable. (But personally I cannot see a reason
> >>>>>>>>>    for letting deferred tasks escape the defined limits.)
> >>>>>>>>>
> >>>>>>>>> Jens
> >>>>>>>>
> >>>> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: [email protected]
> >>>>>>>> For additional commands, e-mail: [email protected]
> >>>>>>>>
> >>>>>>>>
