Yeah, that makes sense. TriggerDagRunOperator lives in the standard provider,
which still supports 2.11 and 3.0-3.3, but the accessor and mixin are 3.3+
only. So the operator can only own its execution on 3.3+; everything older
still goes through the path we have today. Shipping this into standard now
wouldn't reduce the duplication, it would add to it: 3.3+ on the accessor,
3.0-3.2 still on the exception path, and 2.11 still on the Airflow 2 path. The
POC (https://github.com/apache/airflow/pull/69135) bears that out: the 3.3+
end-state is clean and the live crash/reconnect holds up, but it also brings
the back-compat cost up for discussion.
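To make the fork concrete, here is a rough sketch of the dispatch the operator would need if the accessor shipped in standard today. The function and its three return labels are hypothetical; each label stands for a whole trigger-and-wait implementation, and `ti.trigger_dag_run()` is the proposed accessor, not an existing one. Versions are compared on (major, minor) only, so a pre-release such as 3.3.0.dev0 counts as 3.3.

```python
from typing import Tuple


def _major_minor(airflow_version: str) -> Tuple[int, int]:
    """Parse "3.2.1" or "3.3.0.dev0" into (3, 2) / (3, 3); anything past minor is ignored."""
    major, minor = airflow_version.split(".")[:2]
    return int(major), int(minor)


def trigger_path(airflow_version: str) -> str:
    """Hypothetical dispatch: which trigger-and-wait implementation a version needs."""
    version = _major_minor(airflow_version)
    if version >= (3, 3):
        # End-state: operator owns execution via the proposed ti.trigger_dag_run()
        # plus the existing ti.get_dagrun_state()
        return "accessor"
    if version >= (3, 0):
        # Today's Airflow 3 path: execute() raises DagRunTriggerException and
        # the task runner does the trigger and the wait
        return "exception"
    # Anything older (2.11 for the standard provider) stays on the Airflow 2 path
    return "airflow2"


for v in ["2.11.0", "3.1.0", "3.2.0", "3.3.0"]:
    print(v, "->", trigger_path(v))
```

Only the `accessor` branch is new; the other two are the existing fallbacks, and they can only be deleted once the provider floor reaches 3.3.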

The two things that hurt long-term maintainability are the contract drifting in
two places (Amogh's point) and a version-gated fork we can't delete. So my read
on the direction:

- The runner keeps driving the wait (as it does now), but through
ResumableJobMixin's extracted core instead of a hand-rolled copy (roughly what
https://github.com/apache/airflow/pull/68952 and
https://github.com/apache/airflow/pull/68955 do). That way we keep one contract
instead of two, and there is no version fork.

- The accessor (operator owns execution) is the right end-state. The catch is
that it only stops being extra duplication once the standard provider can drop
2.11 / 3.0-3.2 and require 3.3+; below that, the old fallbacks have to stay. The
POC shows what that end-state looks like, and moving from the shared core to it
later is a small change.
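A minimal sketch of what "one contract" could look like, assuming the core can be pulled out as a plain function that the runner calls now and the operator could call later through the accessor. Everything here is hypothetical: `run_durably`, `classify`, `Resume`, `RunIdStore` and the in-memory `FakeDagRuns` are illustration names, not ResumableJobMixin's actual API, and treating a failed earlier run as "resubmit" is an assumption. The ordering is the part that matters: the run_id is persisted before the first poll, so a crash any time after triggering leads to a reconnect rather than a second trigger.

```python
import time
from enum import Enum
from typing import Callable, Dict, Optional

SUCCESS, FAILED = "success", "failed"  # terminal dag run states this sketch handles


class Resume(Enum):
    SUCCEEDED = "succeeded"  # earlier attempt's run already finished: nothing to do
    RECONNECT = "reconnect"  # earlier run still in flight: poll it, do not trigger again
    RESUBMIT = "resubmit"    # nothing usable persisted: trigger a fresh run


class RunIdStore:
    """Hypothetical stand-in for wherever the run_id is persisted between attempts."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def load(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def save(self, key: str, run_id: str) -> None:
        self._data[key] = run_id


def classify(saved: Optional[str], get_state: Callable[[str], str]) -> Resume:
    """The three-state decision, made once at the start of every attempt."""
    if saved is None:
        return Resume.RESUBMIT
    state = get_state(saved)
    if state == SUCCESS:
        return Resume.SUCCEEDED
    if state == FAILED:
        return Resume.RESUBMIT  # assumption: a failed earlier run is replaced, not reused
    return Resume.RECONNECT


def run_durably(key, store, trigger, get_state, poll_interval=5.0, sleep=time.sleep) -> str:
    """Shared core: the runner today (and the operator later) would both call this."""
    saved = store.load(key)
    decision = classify(saved, get_state)
    if decision is Resume.SUCCEEDED:
        return saved
    if decision is Resume.RECONNECT:
        run_id = saved
    else:
        run_id = trigger()
        store.save(key, run_id)  # persist-before-poll: a crash after this reconnects
    state = get_state(run_id)
    while state not in (SUCCESS, FAILED):
        sleep(poll_interval)
        state = get_state(run_id)
    if state == FAILED:
        raise RuntimeError(f"triggered dag run {run_id} failed")
    return run_id


class WorkerCrash(Exception):
    pass


class FakeDagRuns:
    """In-memory fake backend: every run reports 'running' twice, then 'success'."""

    def __init__(self) -> None:
        self.triggered = 0
        self._polls: Dict[str, int] = {}

    def trigger(self) -> str:
        self.triggered += 1
        run_id = f"run_{self.triggered}"
        self._polls[run_id] = 0
        return run_id

    def get_state(self, run_id: str) -> str:
        self._polls[run_id] += 1
        return SUCCESS if self._polls[run_id] > 2 else "running"


def crash(_seconds: float) -> None:
    raise WorkerCrash("worker died mid-poll")


backend, store = FakeDagRuns(), RunIdStore()
try:  # attempt 1 triggers, persists, then "crashes" on its first sleep
    run_durably("ti-1", store, backend.trigger, backend.get_state, sleep=crash)
except WorkerCrash:
    pass
# attempt 2 finds the persisted run_id, reconnects, and waits it out
run_id = run_durably("ti-1", store, backend.trigger, backend.get_state, sleep=lambda _s: None)
print(run_id, backend.triggered)  # prints: run_1 1
```

In this sketch the runner-driven interim and the accessor end-state would differ only in which `trigger` / `get_state` callables they pass in, which is why the later move looks small.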

The accessor end-state needs a 3.3+ minimum. Whether and when we'd require that
is the min_version-for-providers question raised here, and +1 to reopening it.
With the 12-month rule the floor would reach 3.3 around April 2027, so the
shared-core interim would only need to cover ~9 months. Maybe that's short
enough that going straight to the accessor would also be reasonable?
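The ~9-month figure can be checked against Jarek's schedule quoted below. This sketch only encodes his three dates (taken from his mail, not checked against Airflow's release history; the 3.1 entry uses the thread date because he says that floor would already apply) and counts days from the thread date. `provider_floor` and `days_until_floor` are illustrative helpers.

```python
from datetime import date
from typing import Optional, Tuple

# Floor bumps under the 12-month rule, copied from the dates in Jarek's mail.
FLOOR_BUMPS = [
    (date(2026, 6, 29), (3, 1)),  # "already" in effect on the thread date
    (date(2026, 9, 25), (3, 2)),
    (date(2027, 4, 7), (3, 3)),
]


def provider_floor(on: date) -> Optional[Tuple[int, int]]:
    """Minimum Airflow (major, minor) providers would require on a given day."""
    floor = None
    for bump_day, version in FLOOR_BUMPS:
        if on >= bump_day:
            floor = version
    return floor


def days_until_floor(target: Tuple[int, int], today: date) -> int:
    """Days from `today` until the floor reaches `target`."""
    for bump_day, version in FLOOR_BUMPS:
        if version == target:
            return (bump_day - today).days
    raise ValueError(f"no bump to {target} in the schedule")


thread_date = date(2026, 6, 29)
interim_days = days_until_floor((3, 3), thread_date)
print(interim_days, "days, about", round(interim_days / 30.44), "months")
# prints: 282 days, about 9 months
```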

Happy to try out whichever way gets consensus.

Stefan


> On Jun 29, 2026, at 12:26 AM, Jarek Potiuk <[email protected]> wrote:
> 
> I think my main concern is that TriggerDagRunOperator is part of the
> standard provider, which currently means:
> 
> * Airflow 2.11.0 support
> * Airflow 3.0 -> 3.3 support
> 
> The ti.trigger_dag_run() and the mixin are 3.3+, and the implementation
> also needs to account for the deferrable path.
> 
> Even 3.0 -> 3.2 is challenging, and I guess 2.11 will be nightmarish in
> this scenario - we might end up with way more duplication than what the
> current https://github.com/apache/airflow/pull/68936 introduces.
> 
> And of course unifying and "owning" execution is a better direction.
> Switching to persisting run_id makes so much more sense, so if we can find a
> nice way without bumping standard to 3.3+ that would be great.
> 
> BTW, should we resume the discussion about bumping min_version for Providers?
> If we don't return to the regular schedule, we will again become (quoting
> Daniel Standish) "long-term backwards compatibility engineers". And if we
> have no clear vision, like we had in Airflow 2 with the "min_version"
> approach, this will get worse month by month.
> 
> Currently we have no idea how long we will need to support a back-compat
> solution like this. This makes it difficult to make rational choices about
> the level of duplication, deprecation, or backward compatibility worth
> maintaining because we don't know the maintenance duration. Therefore,
> working out reasonable trade-offs here is nearly impossible.
> 
> If we started to apply the same rule we had in Airflow 2 (12 months since
> .0 version release) we would have:
> 
> * Today we would already have >= 3.1
> * 25th of September we would have >= 3.2
> * 7th of April 2027 we would have >= 3.3
> 
> This would mean that we would have 9 months of support for 3.2 until we
> could get rid of any back-compat.
> 
> J.
> 
> 
> On Mon, Jun 29, 2026 at 8:32 AM Stefan Wang <[email protected]> wrote:
> 
>> Thanks Amogh and Jarek:
>> 
>> +1, this makes sense and is a better approach to take. Letting the
>> operator own its execution and just subclass ResumableJobMixin is cleaner
>> than what https://github.com/apache/airflow/pull/68936 does today.
>> 
>> Right now the contract is duplicated in the task runner, and the proposal
>> gets rid of the special case instead of trying to share it.
>> 
>> The accessor is small. trigger_dag_run() would mirror the
>> ti.get_dagrun_state() we already have, hitting the same execution API
>> endpoint the runner uses today with the same token, so no new authz. It
>> basically finishes the AIP-72 migration that added DagRunTriggerException
>> as a stopgap (https://github.com/apache/airflow/pull/47882).
>> 
>> Happy to do the POC, rework #68936 onto the accessor, and then link it
>> here. Would the main thing to check be back-compat? execute() currently
>> raises on every Airflow 3 run, not just the ones that wait, so should the
>> POC prove that this behavior stays identical?
>> 
>> Best,
>> Stefan
>> 
>>> On Jun 28, 2026, at 10:38 PM, Jarek Potiuk <[email protected]> wrote:
>>> 
>>> Sounds reasonable - maybe a quick POC would be good to show what it could
>>> look like and to assess whether there are any back-compat concerns.
>>> 
>>> On Mon, Jun 29, 2026 at 7:27 AM Amogh Desai <[email protected]> wrote:
>>> 
>>>> Now that Airflow 3.3 will introduce ResumableJobMixin to make synchronous
>>>> submit-and-poll operators crash-safe, I wanted to start a discussion.
>>>> 
>>>> I came across https://github.com/apache/airflow/pull/68936, which brings
>>>> crash recovery/durable execution to TriggerDagRunOperator, but it's a case
>>>> that cannot use the mixin. On Airflow 3 the operator's execute() raises
>>>> *DagRunTriggerException*; the actual trigger and the wait loop run in the
>>>> task runner (_handle_trigger_dag_run). So the PR reimplements the mixin's
>>>> three-state contract (succeeded / reconnect / resubmit) and
>>>> persist-before-poll by hand in the runner. This means we will now have two
>>>> copies of the same contract that can drift.
>>>> 
>>>> It cannot use the mixin's contract because it only offloads its execution
>>>> to the task runner and doesn't own it. For more context, the poll primitive
>>>> already exists as a user-callable accessor (ti.get_dagrun_state()). The only
>>>> missing primitive is triggering a dag run.
>>>> 
>>>> I propose that we revisit this portion by introducing an Execution API
>>>> accessor in the Task SDK for triggering dag runs, as the counterpart to the
>>>> existing ti.get_dagrun_state(). It routes through the same execution
>>>> endpoint the runner already uses, so no new authz surface is introduced.
>>>> 
>>>> This proposal does not expand what task code can do; it just gives a
>>>> first-class way to do something already possible. A task JWT can already
>>>> trigger dag runs through the Execution API today: that is exactly what
>>>> DagRunTriggerException does under the hood. The proposed
>>>> *ti.trigger_dag_run()* accessor routes through the same endpoint with the
>>>> same scoped token, so the boundary is identical, just reached through a
>>>> clean, supported API instead of an exception side channel.
>>>> 
>>>> Happy to hear thoughts from folks.
>>>> 
>>>> 
>>>> Thanks & Regards,
>>>> Amogh Desai
>>>> 
>> 
>> 
