Re: [DISCUSS] Let TriggerDagRunOperator own its execution logic

Jarek Potiuk Mon, 29 Jun 2026 00:26:37 -0700

I think my main concern is that TriggerDagRunOperator is part of the
standard provider, which currently means:


* Airflow 2.11.0 support
* Airflow 3.0 -> 3.3 support

The ti.trigger_dag_run() and the mixin are 3.3+, and the implementation
also needs to account for the deferrable path.

Even 3.0 -> 3.2 is challenging, and I guess 2.11 will be nightmarish in
this scenario - we might end up with way more duplication than what the current
https://github.com/apache/airflow/pull/68936 introduces.
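
To make the duplication concern concrete, here is a minimal sketch of the
per-version dispatch the operator would need. It is only an illustration:
parse_version and pick_execution_path are hypothetical names, not Airflow
APIs, and the three path labels are assumptions about how the branches
might be split.

```python
# Hypothetical sketch: these helpers do not exist in Airflow.

def parse_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '3.2.1'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def pick_execution_path(airflow_version: str) -> str:
    """Pick which code path the operator would have to take."""
    v = parse_version(airflow_version)
    if v >= (3, 3):
        # Operator owns execution through the proposed ti.trigger_dag_run().
        return "accessor"
    if v >= (3, 0):
        # execute() raises DagRunTriggerException; the task runner does the work.
        return "exception"
    # Airflow 2.11: separate pre-Airflow-3 code path.
    return "legacy"


for version in ("2.11.0", "3.2.1", "3.3.0"):
    print(version, "->", pick_execution_path(version))
```

Each branch needs its own persistence and deferrable handling, which is
where the extra duplication would come from.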

And of course unifying and "owning" execution is a better direction.
Switching to persisting run_id makes so much more sense, so if we can find a
nice way without bumping standard to 3.3+ that would be great.
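
A rough sketch of what persisting run_id buys us, using the three outcomes
named further down in this thread (succeeded / reconnect / resubmit). This
is not the mixin's implementation: the store dict, the trigger and
get_state callables, the state strings, and the choice to resubmit on any
non-active state are all assumptions made for illustration.

```python
# Hypothetical sketch of persist-before-poll; not the ResumableJobMixin code.

ACTIVE_STATES = {"queued", "running"}  # assumed state names


def resume_or_trigger(store: dict, trigger, get_state) -> tuple[str, str]:
    """Decide what a (re)started task should do with a persisted run_id."""
    run_id = store.get("run_id")
    if run_id is not None:
        state = get_state(run_id)
        if state == "success":
            return "succeeded", run_id   # nothing left to do
        if state in ACTIVE_STATES:
            return "reconnect", run_id   # keep polling the same run
    # Nothing persisted, or the earlier run cannot be reused: start a new run.
    run_id = trigger()
    store["run_id"] = run_id             # persist before the first poll
    return "resubmit", run_id


store = {}
print(resume_or_trigger(store, lambda: "run-1", lambda rid: "running"))
print(resume_or_trigger(store, lambda: "run-2", lambda rid: "running"))
```

The second call finds the persisted run_id and reconnects instead of
triggering a duplicate run, which is the crash-safety property we want.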

BTW, should we resume the discussion about bumping min_version for Providers?
If we don't return to the regular schedule, we will again become
(quoting Daniel Standish) long-term backwards compatibility engineers.
And if we have no clear vision - like we had in Airflow 2 regarding the
"min_version" approach - this will get worse month by month.

Currently we have no idea how long we will need to support a back-compat
solution like this. This makes it difficult to make rational choices about
the level of duplication, deprecation, or backward compatibility worth
maintaining because we don't know the maintenance duration. Therefore,
working out reasonable trade-offs here is nearly impossible.

If we started to apply the same rule we had in Airflow 2 (12 months since
.0 version release), the schedule would be:

* Today: we would already have >= 3.1
* 25th of September: we would have >= 3.2
* 7th of April 2027: we would have >= 3.3

This would mean roughly 9 more months of supporting 3.2 before we could
get rid of any back-compat.
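
As a quick check of the "9 months" figure, the gap between the date of this
message and the 7th of April 2027 can be computed directly. Both dates come
from this thread; the 30.44-day average month is only an approximation.

```python
from datetime import date

today = date(2026, 6, 29)   # date of this message
min_33 = date(2027, 4, 7)   # when >= 3.3 would apply under the 12-month rule

days_left = (min_33 - today).days
months_left = days_left / 30.44  # average month length, approximation only

print(f"{days_left} days, about {months_left:.1f} months")
```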

J.


On Mon, Jun 29, 2026 at 8:32 AM Stefan Wang <[email protected]> wrote:

> Thanks Amogh and Jarek:
>
> +1, this makes sense and is a better approach to take. Letting the
> operator own its execution and just subclass ResumableJobMixin is cleaner
> than what https://github.com/apache/airflow/pull/68936 does today.
>
> Right now the contract is duplicated in the task runner, and the proposal
> gets rid of the special case instead of trying to share it.
>
> The accessor is small. trigger_dag_run() would mirror the
> ti.get_dagrun_state() we already have, hitting the same execution API
> endpoint the runner uses today with the same token, so no new authz. It
> basically finishes the AIP-72 migration that added DagRunTriggerException
> as a stopgap (https://github.com/apache/airflow/pull/47882).
>
> Happy to do the POC, and rework #68936 onto the accessor then link it
> here.
> Would the main thing to check be back-compat? execute() currently
> raises on every Airflow 3 run, not just the ones that wait, so in the
> POC do we want to prove that this behavior stays identical?
>
> Best,
> Stefan
>
> > On Jun 28, 2026, at 10:38 PM, Jarek Potiuk <[email protected]> wrote:
> >
> > Sounds reasonable - maybe a quick POC would be good to show how it could
> > look and allow us to assess whether there are any back-compat concerns.
> >
> > On Mon, Jun 29, 2026 at 7:27 AM Amogh Desai <[email protected]> wrote:
> >
> >> Now that Airflow 3.3 will introduce ResumableJobMixin to make synchronous
> >> submit and poll operators crash-safe, I wanted to start a discussion.
> >>
> >> I came across https://github.com/apache/airflow/pull/68936, which brings
> >> crash recovery/durable execution to TriggerDagRunOperator, but it's a
> >> case which cannot use the mixin. On Airflow 3 the operator's execute()
> >> raises *DagRunTriggerException*; the actual trigger and the wait loop run
> >> in the task runner (_handle_trigger_dag_run). So the PR reimplements the
> >> mixin's three-state contract (succeeded / reconnect / resubmit) and
> >> persist-before-poll by hand in the runner. This means that we will now
> >> have two copies of the same contract that can drift.
> >>
> >> It cannot use the mixin's contract because it only offloads its execution
> >> to the task runner, and doesn't own it. For more context, the poll
> >> primitive already exists as a user-callable accessor
> >> (ti.get_dagrun_state()). The only missing primitive is triggering a dag
> >> run.
> >>
> >> I propose that we revisit this portion. I propose we introduce an
> >> execution API accessor in task sdk for triggering dagruns, which will be
> >> the counterpart to the existing ti.get_dagrun_state(). It routes through
> >> the same execution endpoint the runner already uses, so the authz surface
> >> does not change.
> >>
> >> This proposal does not expand what task code can do, it just gives a
> >> first-class way to do something already possible. A task JWT can already
> >> trigger dag runs through the Execution API today: that is exactly what
> >> DagRunTriggerException does under the hood. The proposed
> >> *ti.trigger_dag_run()* accessor routes through the same endpoint with
> >> the same scoped token, so the boundary is identical, just reached through
> >> a clean, supported API instead of an exception side channel.
> >>
> >> Happy to hear thoughts from folks.
> >>
> >>
> >> Thanks & Regards,
> >> Amogh Desai
> >>
>
>
