Re: [DISCUSS] DAG Version Pinning for Deployment Gating (Building on AIP-63)

Przemysław Mirowski Thu, 23 Apr 2026 15:07:21 -0700

Hi,

I think CI/CD and version pinning are two somewhat different things here. 
In use cases where critical systems are involved, a Dag switching to the 
latest version at a moment that cannot be determined exactly (CI/CD takes a 
variable amount of time to deploy the change, and the same goes for Dag 
Processor parsing time) is hard to manage, and in some systems it can make 
deploying changes harder and less safe. Of course, the ideal solution would 
be a proper non-prod environment that is fully representative of production, 
but that is not always possible for reasons such as costs, licenses, space 
and/or vendors (and in some cases exposing non-prod to prod data/traffic/etc. 
is simply not an option, e.g. for security reasons). I especially agree with 
point 5 of Piyush's latest message. With the above in mind, I think version 
pinning would be a nice addition to the Dag Versioning feature, on the 
assumption that it is meant for critical Airflow Dags where full control over 
when the Dag version changes is required (maybe there is also another way to 
achieve that).

P.S. In my opinion, what can be done in/around git, should be done there. 
Recreation of CI/CD in any form inside of Airflow itself is something which 
should not be done.
________________________________
From: Oliveira, Niko <[email protected]>
Sent: 23 April 2026 01:50
To: [email protected] <[email protected]>
Subject: Re: [DISCUSS] DAG Version Pinning for Deployment Gating (Building on 
AIP-63)

Hey Piyush,

Thanks for your reply, I do love how clearly it is written and I see exactly 
the problem you're trying to solve!

I'm still just not convinced this needs to be done in Airflow, at least not 
with a first class feature. As interesting as I think your microservice analogy 
is, Airflow is not a microservice component, it is a (very, very) fancy cron 
scheduler. And I'm not sure the complexity is worth the use case. Any new 
code added to Airflow must be maintained by this community, so we must be 
cautious that any new piece serves enough use cases/users to make it worth it.
To me this should either be managed outside of an individual Airflow 
environment e.g. you have an entirely separate staging/gamma/dev Airflow 
environment, which is exposed to some level of production traffic (to borrow 
your microservice analogy) until it can graduate to the production environment. 
And if you really need on the fly toggling of a version, as you say, Airflow 
does this quite responsively, if you deploy a new version of your dags it will 
parse and start using that new version immediately (the problem you're trying 
to solve can be a benefit here). You can even have multiple versions of your 
dags deployed at once and use configuration to control which dag directory 
Airflow reads from (or move/symlink Dags in and out of the Dags directory as 
needed from a known good or pinned source). Or use variables or some other 
parameter store to control other pieces of runtime behaviour inside the Dags 
themselves. Between CI/CD, dev ops and making use of existing Airflow 
primitives I think you can achieve what you're looking for.
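
As an illustration of the "move/symlink Dags in and out of the Dags
directory" idea, here is a minimal Python sketch. It assumes a POSIX
filesystem, that each release lives in its own directory, and that the
folder Airflow reads is a symlink (or does not exist yet). The function name
point_dags_at and the directory layout are hypothetical, not anything
Airflow provides. Note that this switches the whole directory at once, not
an individual Dag.

```python
import os
from pathlib import Path


def point_dags_at(release_dir: Path, live_link: Path) -> None:
    """Repoint live_link (assumed to be a symlink or absent) at release_dir.

    Hypothetical helper: build a temporary symlink next to live_link, then
    rename it over live_link so readers never see a missing path.
    """
    if not release_dir.is_dir():
        raise FileNotFoundError(f"release directory not found: {release_dir}")

    tmp_link = live_link.with_name(live_link.name + ".tmp")
    # Clean up a leftover temporary link from an interrupted earlier call.
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()

    os.symlink(release_dir.resolve(), tmp_link, target_is_directory=True)
    # On POSIX, renaming over an existing symlink replaces it atomically.
    os.replace(tmp_link, live_link)
```

Rolling back is then the same call with the previous release directory, for
example point_dags_at(Path("/opt/releases/v1"), Path("/opt/airflow/dags"))
(both paths are made up for the example).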

But as always, this is open and community based software, so I'm happy to 
disagree and commit if the rest of the community thinks this is a valuable 
feature :)

Cheers,
Niko
________________________________
From: Piyush Maheshwari <[email protected]>
Sent: Tuesday, April 21, 2026 10:46 PM
To: [email protected] <[email protected]>
Subject: RE: [EXT] [DISCUSS] DAG Version Pinning for Deployment Gating 
(Building on AIP-63)

Hi Ephraim, Jarek, Jens, and Niko,

Thank you for the candid feedback. I want to clarify a few things, as I
completely agree with Jens and Niko that "testing in production" is an
anti-pattern. That is absolutely not the intention here.

1. I view this as bringing standard microservice-like deployment maturity
to DAGs.
Before service deployments in our org, code is tested locally, in a dev
environment, and via strict unit/e2e integration tests before it ever makes
it to main. But even after merging and passing those CI pipelines, we still
use load tests, pre-prod soak times, shadow traffic, and gated production
rollouts with automated rollback triggers. Having deployment gates for the
production environment doesn't mean the pre-merge checks weren't strict or
that the change wasn't tested beforehand -- it just allows us to place
additional safety gates for the code to take effect, exactly like in the
service world.

2. The core issue we are trying to solve is that Airflow currently
inseparably links Code Distribution (a file arriving on the dag-processor
and being parsed) with Release Activation (the scheduler executing that
code).
To extend the microservices analogy, I can think of the DAG processor
parsing all files as "building the artifact(s)," while the scheduler and
executor acting on the DAG versions created thereafter as "deploying" or
running the changed code.
We simply want to decouple the build from the deployment. This does not
mean that the code arriving on the dag-processor will be tested for the
first time straight in production. It should've already passed a set of
checks in the CI pipeline.

3. It is also worth calling out that Airflow already supports this
decoupled behavior at the run level for task re-runs and mid-execution DAG
version bumps (by pinning the version for the rest of the execution or the
rerun). We are simply trying to expose this existing capability at the DAG
level so users can govern which version new scheduled runs are created with.

4. I also agree that Airflow itself should not be aware of our CI/CD
pipeline, nor would it manage the deployment orchestration or testing.
For our requirements, I just need Airflow to expose APIs to deploy (pin) a
DAG version, and to remove the pin (to restore/enable the default
"auto-deploy latest" behavior).
Beyond that, we intend to use an external release orchestrator that can
explicitly tell Airflow when a parsed version is actually allowed to run.
Until that API call is made, the previously pinned version remains active.
This ensures we don't introduce assumptions or awareness of the presence of
any external gating mechanisms to Airflow.
Also note that the intention is to keep the default auto-deploy behavior
unless a user (or a system on their behalf) explicitly asks Airflow to pin
a DAG to a specific version.

5. Most importantly, this feature provides an incident response "rollback"
behavior. If a bad DAG version slips through CI/CD into production, either
an on-call engineer or a rollback-trigger (airflow-external) can instantly
roll back to the previous pinned version via the API/UI to mitigate.
Without this, users have to revert the code in Git and wait for the entire
CI/CD pipeline and file-sync process to run, which is often too slow during
an outage.

6. Jarek - You are right, database schema changes can be discussed later.
My intention was only to share a very brief summary of how I deemed it to
be technically feasible for early feedback. I did briefly share the
high-level use cases ("Safe Deployment Gating" and "Instant Rollbacks") in
the original mail, but I completely agree that aligning on the UX first
would be a good next step.
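
To make the pin/unpin and rollback semantics from points 4 and 5 concrete,
here is a minimal Python sketch. It is not Airflow code: the class
DagVersionState and its methods are hypothetical names used only for
illustration, and the state is kept in memory. It models the intended
behaviour only (no pin means "auto-deploy latest"; a pin holds new runs on
the chosen version while newer versions keep being parsed) and assumes
nothing about how an actual API or schema would look.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DagVersionState:
    """Hypothetical in-memory stand-in for one DAG's version state."""

    parsed: list[int] = field(default_factory=list)  # oldest first
    pinned: Optional[int] = None  # None means "auto-deploy latest"

    def register_parsed(self, version: int) -> None:
        # Code distribution: a newly parsed version is recorded.
        self.parsed.append(version)

    def pin(self, version: int) -> None:
        # Release activation: only an already parsed version can be pinned.
        if version not in self.parsed:
            raise ValueError(f"version {version} has not been parsed")
        self.pinned = version

    def unpin(self) -> None:
        # Restore the default "auto-deploy latest" behaviour.
        self.pinned = None

    def version_for_new_run(self) -> int:
        # New runs use the pin if one is set, otherwise the latest version.
        if not self.parsed:
            raise LookupError("no parsed versions")
        return self.pinned if self.pinned is not None else self.parsed[-1]


if __name__ == "__main__":
    dag = DagVersionState()
    dag.register_parsed(1)
    dag.pin(1)              # hold new runs on version 1
    dag.register_parsed(2)  # version 2 is parsed but not yet active
    print(dag.version_for_new_run())
    dag.pin(2)              # external orchestrator promotes version 2
    dag.pin(1)              # rollback during an incident
    print(dag.version_for_new_run())
```

In this sketch a rollback is just another pin call, which is why it does not
depend on a Git revert or a file sync.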

If there are no major remaining concerns after this response, I can draft
and share an AIP to detail the UX, followed by a high-level proposal,
caveats and next steps.

Thanks for your time.
Regards,
Piyush

On Tue, Apr 21, 2026 at 5:59 PM Oliveira, Niko <[email protected]> wrote:

> I am with Jens on this one. I think we're complicating Airflow to get
> around a bad practice. If stability of your Dags is critical and they are
> highly versioned then I think as Jens suggested running them through a
> pipeline that first deploys them to a dev or gamma environment which
> verifies that quality of the Dags is what you expect. If something slips
> through, then it's just normal software practices of either reverting and
> rolling back or rolling forward with a fix pushed through the pipeline. I
> don't think Airflow should be aware of that process or opinionated about it.
>
> Cheers,
> Niko
> ------------------------------
> *From:* Jens Scheffler <[email protected]>
> *Sent:* Monday, April 20, 2026 11:17 AM
> *To:* [email protected] <[email protected]>
> *Subject:* RE: [EXT] [DISCUSS] DAG Version Pinning for Deployment Gating
> (Building on AIP-63)
>
> Hi,
>
> I am still quite sceptical. Yes, if such pinning is introduced, then
> changing the pin per Dag needs to be possible via UI and API. But I still
> see it as a chicken-and-egg problem: you want to run a pinned version, but
> then how do you test the changes (w/o moving the version pin)? Then again
> some test mode is needed, or per run you need to make a "test run" with
> another version. Smells a bit like mis-using a production system for testing.
>
> On the other hand, yes, if all Dags share the same Git repo then merging
> a branch into another will switch all Dags at the same time. Still, you
> could use standard Git tools to cherry-pick individual changes, so nothing
> forces you to always make a full rollout. At least 80% of this is possible
> with standard CI/CD tools and Git.
>
> TLDR: I see the danger that, instead of a proper CI/CD and test system,
> such a feature might make it feel like you can easily test on a production
> system. Effectively, it would need to be possible to start a Dag with any
> version, so that you can also jump back to an earlier one as a reversion.
> Even though, yes, agreed, all of this is technically possible.
>
> Jens
>
> On 20.04.26 16:40, Jarek Potiuk wrote:
> > +1 to what Ephraim wrote. I think that was a natural next step we
> > discussed, but it needs significant refinement, starting with the actual
> > use cases it should serve and the UX for user interaction. I think related
> > database changes are pretty secondary. Use cases cover runs, re-runs,
> > backfills, CI testing, rollbacks, etc. Following the "documentation first"
> > approach discussed in separate thread, describing the context and intention
> > of what we want to achieve is much more important than DB schema changes.
> > Once we know which use cases we want to serve, the DB schema changes and
> > other related items will emerge naturally.
> >
> > On Mon, Apr 20, 2026 at 3:15 PM Ephraim Anierobi <[email protected]>
> > wrote:
> >
> >> Hi Piyush, thanks for starting this discussion.
> >>
> >> I like the proposal. We can introduce an active execution version for
> >> "versioned bundles" and make the scheduler/API resolve through it. The
> >> hard part is making Airflow able to distinguish the latest parsed
> >> DagModel metadata from the active scheduling metadata. I suggest you
> >> draft this in a Google Doc and share it for further discussion.
> >>
> >> Regards
> >> - Ephraim
> >>
> >> On Mon, 20 Apr 2026 at 01:31, Piyush Maheshwari <[email protected]>
> >> wrote:
> >>
> >>> Thanks for sharing your thoughts Jens.
> >>>
> >>>> be able to test it? … a Q&A/Testing environment to be able to sign-off
> >>>> changes.
> >>> Yes, we’ve built an isolated Airflow environment to run regression
> >>> checks before promoting to production.
> >>>
> >>> As you suggested, we’re already running both generic and DAG-custom
> >>> static checks in a CI job as a required step to merge to the main branch.
> >>>
> >>>> But then the "main" branch might be best suited if
> >>>> implemented on the test system
> >>> In this case, problematic commits on “main” can choke other unrelated
> >>> changes.
> >>> So the other option would be to revert the problematic commits and
> >>> deploy forward.
> >>>
> >>> However, a key limitation that remains with this approach is that a
> >>> commit affecting multiple DAGs goes live for either all DAGs or none.
> >>>
> >>> A second important feature we get with this is instant DAG-level rollback
> >>> without waiting for a revert commit to merge and be picked up by Airflow.
> >>>
> >>> I think DAG-level version pinning can also unlock a lot of flexibility
> >>> for deployments including tiered rollouts, auto-rollback triggers, timed
> >>> deployment windows and so on.
> >>>
> >>> Looking forward to hearing your thoughts.
> >>> Regards,
> >>> Piyush
> >>>
> >>> On Sun, 19 Apr 2026 at 3:12 PM, Jens Scheffler <[email protected]>
> >>> wrote:
> >>>
> >>>> Thanks Piyush for dropping the discussion!
> >>>>
> >>>> I think in general QA processes are important and a valid use case. So
> >>>> some kind of pinning of Dag versions really is important.
> >>>>
> >>>> Thinking about it, if you pin the version ... how would you then be
> >>>> able to test it? I assume you would need (and should have or invest
> >>>> into) a Q&A/Testing environment to be able to sign-off changes. Both in
> >>>> infrastructure but also for Dag changes.
> >>>>
> >>>> If you are changing Dags, first of all static checks on Dag code are
> >>>> very much recommended, and you can also implement tests for your Dags
> >>>> and logic. As with software, a CI/CD system will be a good setup.
> >>>> Besides that, Dag changes also include logical changes that mostly can
> >>>> only be tested in a live system and not with static checks.
> >>>>
> >>>> Have you considered using Git and a set of branches for implementing
> >>>> such staging? E.g. you have a git repo and you plan to make changes.
> >>>> Then you would open a PR for the change and merge it to the "main"
> >>>> branch - and there in your CI/CD you can run all sorts of static
> >>>> checks and tests. But then the "main" branch might be best suited if
> >>>> implemented on the test system. Once you validate the changes
> >>>> end-to-end you could make another PR, for example to a "prod" branch.
> >>>> And if your production system is only pulling Dags from the "prod"
> >>>> branch then you can have this merging strategy as a staging setup.
> >>>>
> >>>> Would this resolve your PIN problem? Or which other detail in the use
> >>>> case would require a PIN on top of a staging strategy?
> >>>>
> >>>> Jens
> >>>>
> >>>> P.S.: Have enabled your confluence account after it was created in
> >>>> order to write to Confluence, sorry, typical pitfall after account
> >>>> creation permissions were not set. Now it should work. Let me know if
> >>>> not.
> >>>>
> >>>> On 19.04.26 01:40, Piyush Maheshwari wrote:
> >>>>> Hi everyone,
> >>>>> I'm a new contributor to Airflow. I'd like to propose a new feature
> >>>>> for Airflow: DAG Version Pinning.
> >>>>> Building on the foundation introduced by AIP-63: DAG Versioning
> >>>>> (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-63%3A+DAG+Versioning),
> >>>>> this proposal aims to extend Airflow's capabilities to support true
> >>>>> continuous deployment (CD) gating and safer release cycles.
> >>>>>
> >>>>> The Problem & Use Cases
> >>>>> Currently, the scheduler always creates DagRuns using the latest
> >>>>> parsed DagVersion. This means that the updated DAG code is deployed
> >>>>> (takes effect) right after the dag-processor processes it. While this
> >>>>> is great for rapid development, teams running business-critical
> >>>>> pipelines often need stricter deployment mechanisms. Specifically:
> >>>>>     * Safe Deployment Gating: The ability to pin a DAG to its last
> >>>>>       known stable version while new code is parsed in the background.
> >>>>>       This allows the new version to be held back until it passes
> >>>>>       automated regression tests or receives explicit manual approval.
> >>>>>     * Instant Rollbacks: If an issue is detected in a newly promoted
> >>>>>       DAG version, users need the capability to instantly roll back to
> >>>>>       a previous version via the UI/API, without having to revert the
> >>>>>       underlying code and wait for the repository sync and DAG
> >>>>>       processing cycle.
> >>>>>
> >>>>> High-Level Proposed Solution
> >>>>> Introduce an optional active_dag_version_id to the DagModel. This
> >>>>> field can be used to pin a DAG version for scheduling and execution,
> >>>>> while the dag-processor can continue to parse and register newer DAG
> >>>>> versions.
> >>>>>     * When this pin is set, the scheduler and API will respect the
> >>>>>       pinned version for creating runs and executing tasks, separating
> >>>>>       the parsing of new code from the execution of new code.
> >>>>>     * If the pin is NULL, the system defaults to the current behavior
> >>>>>       (always executing the latest parsed version). This way, we can
> >>>>>       maintain complete backwards compatibility.
> >>>>>
> >>>>> I have put together some detailed notes covering the data model
> >>>>> changes, database migrations, and edge cases with this approach. If
> >>>>> there is general alignment that this fits the vision for Airflow, I
> >>>>> would like to take this proposal through the formal AIP review process.
> >>>>> But I would love to get the community's feedback on the feature and
> >>>>> the high-level approach.
> >>>>> I'll also need someone to grant me access to create content on the
> >>>>> Airflow Confluence wiki.
> >>>>> Thanks for your time!
> >>>>> Regards,
> >>>>> Piyush
> >>>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: [email protected]
> >>>> For additional commands, e-mail: [email protected]
> >>>>
> >>>>
