Hi Bugra,

Thanks for raising the Kustomize discussion. I haven't gone through the doc thoroughly yet, but just FYI, here is some context I have on the Kustomize approach. It might help in arriving at a final structure that fits all the use cases we need to support while keeping long-term maintenance manageable.

For example:

- Add optional OTel service to the Airflow Helm Chart #64902 [1]
- Helm chart support for periodic API server rollout restarts on Kubernetes #61432 [2]

Additionally, there is a Slack thread discussing the Kustomize direction [3].

[1] https://github.com/apache/airflow/pull/64902#issuecomment-4206639363
[2] https://github.com/apache/airflow/pull/61636#issuecomment-3881992323
[3] https://apache-airflow.slack.com/archives/C027H098M1C/p1770794021001679

Best,
Jason

On Mon, Apr 27, 2026 at 8:16 AM Buğra Öztürk <[email protected]> wrote:

> Hi all,
>
> I have started working on the PoC for the Kustomize direction as mentioned
> in the thread for KEDA.
>
> Here is the approach I have in mind to keep this stable and make further
> iterations faster: align on the fundamentals before building further.
> Smaller increments should make reviews easier and allow for quicker course
> correction. Once the foundation is in place, the remaining work should
> move faster.
>
> * Share the directory structure in this first PoC example (not fully tested
>   yet), with CI/pre-commit checks focusing only on validating the agreed
>   structure
> * Collect feedback, review, and merge the shared PR
> * Propose and build a smoke test on top of the KEDA overlay in a separate
>   new PR
> * Collect feedback, review, and merge the smoke test PR
> * Test locally to check if smoke tests match
> * Move KEDA overlay to testing in a new PR with the introduction of a
>   deprecation warning
>
> PR: https://github.com/apache/airflow/pull/65897
>
> Thoughts and early feedback very welcome.
>
> Are we going to go over these in every overlay addition?
> Short answer, no.
> Long answer: this is early-maturity friction, and going step by step now
> should let us add new overlays later without too much hassle.
> I hope that an agreed, tested, documented approach will let the next
> additions land in one go, in a single PR :)
>
> Kind regards,
> Bugra Ozturk
>
> On Sat, Apr 25, 2026 at 5:37 PM Buğra Öztürk <[email protected]> wrote:
>
> > Sorry for the formatting of the directory structure! In the mail app, it
> > looked fine. You can find it in the Google Doc as well:
> >
> > https://docs.google.com/document/d/1bZsyrG5kjsYd2rJRiN3kR613lO6JPEBd4ItsySneOMw/edit?tab=t.cv476feyrxmf
> >
> > On Sat, Apr 25, 2026 at 5:31 PM Buğra Öztürk <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> We have been working through a Helm chart refurbishment effort over the
> >> past few months. The goal is to keep 1.2x stable for existing users while
> >> preparing a cleaner next major release. I would like to share where we
> >> have landed and open it up for feedback before going further.
> >>
> >> *Branching strategy*
> >>
> >> We created chart/v1-2x-test, mirroring how v3-1-test works for Airflow
> >> itself.
> >>
> >> - chart/v1-2x-test is the maintenance line. Bug fixes and stability
> >>   work for 1.2x releases land here.
> >> - main is for cleanup, deprecations, and preparation toward 2.0.
> >>
> >> The split was deliberate. We wanted to give existing 1.x users a smooth
> >> transition path without holding back the 2.0 work, and vice versa. 2.0 is
> >> intended as a real refurbishment rather than an incremental bump. It will
> >> carry a fair number of breaking changes, but the upside is that it gives
> >> users a clean starting point with a chart fully designed around Airflow 3
> >> and what comes after, instead of one carrying years of accumulated
> >> assumptions from the 1.x line. Existing users on 1.2x are not forced to
> >> move, because the maintenance branch keeps shipping releases for them, but
> >> anyone starting fresh or willing to migrate gets a much simpler chart to
> >> work with.
> >>
> >> We have already cut and released 1.21.0 from chart/v1-2x-test, so the
> >> model is in place rather than hypothetical. The release went through
> >> cleanly and gave us the separation we were after, which is part of the
> >> reason the proposal feels concrete enough to bring here.
> >>
> >> *Kustomize direction*
> >>
> >> A recurring theme in our discussions has been that the chart carries a
> >> fair number of components that are not Airflow-native. Kerberos,
> >> Elasticsearch logging, gitSync, and PostgreSQL are good examples. They
> >> make the chart heavier than it needs to be and pull us toward maintaining
> >> things that already have external owners.
> >>
> >> The proposal is to express these as Kustomize overlays that sit alongside
> >> the chart as a guide for users, not as released chart artifacts.
> >>
> >> *Confirmed for Kustomize*
> >>
> >> - Kerberos: Authentication variant, environment-specific, sidecar
> >>   injection
> >> - gitSync: DAG delivery mechanism, orthogonal to Airflow runtime
> >> - Elasticsearch: External logging backend, not Airflow-native
> >> - PostgreSQL: Can be expressed as plain Kubernetes resources
> >>
> >> PgBouncer and StatsD are also candidates but we want to investigate them
> >> further before committing. They will not be in the first round of
> >> overlays.
> >>
> >> *Structure*
> >>
> >> Overlays live in the repository but are not part of the chart release
> >> artifact. Each overlay has a kustomization.yaml, the resources it
> >> produces, and a STATUS file marking whether it is verified in CI or a
> >> starting point that users can extend.
> >>
> >> A rough sketch of how it would look in the repo:
> >>
> >> ```
> >> chart/
> >>   kustomize-overlays/
> >>     README.rst
> >>     CONTRIBUTING.rst
> >>     keda/
> >>       kustomization.yaml
> >>       scaledobject.yaml
> >>       STATUS
> >>     kerberos/
> >>       kustomization.yaml
> >>       scheduler-sidecar-patch.yaml
> >>       STATUS
> >> ```
> >>
> >> We will start with a PoC before agreeing on the broader rollout. HPA or
> >> KEDA covers the standalone-addition pattern and will go first or second.
> >> Kerberos covers the post-render patch pattern and becomes the template
> >> for any future sidecar injection use case. We are putting together a
> >> first PoC now and will share it in this thread once it is in a shape
> >> worth looking at, so the discussion has something concrete to sit
> >> alongside the criteria below.
> >>
> >> *Lifecycle*
> >>
> >> The lifecycle mirrors how providers work, just on a smaller scale.
> >>
> >> - A new overlay is proposed via a PR and lands with STATUS: not-tested.
> >> - The contributor follows up with a test at
> >>   chart/tests/kustomize/test_.py and flips STATUS to tested, either in
> >>   the same PR or a focused follow-up. Alternatively, a smoke test on CI
> >>   can exercise the Kustomize overlay flow; that is a technical detail
> >>   of the process.
> >> - An overlay is deprecated by setting deprecated: true in STATUS along
> >>   with a short message pointing to the replacement.
> >> - Deprecated overlays stay around for one major chart version before
> >>   they are removed, so users always have a window to migrate.
> >>
> >> CONTRIBUTING.rst in the overlays directory is the authoritative reference
> >> for all of this: the criteria, the exception process, status conventions,
> >> and the migration guide pattern all live there together.
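
To make the structure and lifecycle rules above concrete, here is a minimal sketch of what a structure check (for example, a pre-commit hook) might look like. The thread does not define the STATUS file format, so everything about it here is an assumption: plain `key: value` lines, the keys `status`, `deprecated`, and `message`, and the function names `parse_status` and `check_overlay` are all hypothetical and not taken from the PoC PR.

```python
from pathlib import Path

# Files every overlay directory is expected to contain, per the proposal.
REQUIRED_FILES = ("kustomization.yaml", "STATUS")
# Status values named in the lifecycle description.
ALLOWED_STATUS = {"tested", "not-tested"}


def parse_status(text: str) -> dict[str, str]:
    """Parse an assumed 'key: value' STATUS format into a dict.

    Blank lines and lines starting with '#' are skipped; keys are lowercased.
    """
    fields: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed STATUS line: {line!r}")
        fields[key.strip().lower()] = value.strip()
    return fields


def check_overlay(overlay_dir: Path) -> list[str]:
    """Return a list of human-readable problems for one overlay directory."""
    problems = []
    for name in REQUIRED_FILES:
        if not (overlay_dir / name).is_file():
            problems.append(f"{overlay_dir.name}: missing {name}")

    status_file = overlay_dir / "STATUS"
    if status_file.is_file():
        fields = parse_status(status_file.read_text())
        if fields.get("status") not in ALLOWED_STATUS:
            problems.append(
                f"{overlay_dir.name}: STATUS must be one of {sorted(ALLOWED_STATUS)}"
            )
        # Hypothetical rule: a deprecated overlay must point to its replacement.
        if fields.get("deprecated") == "true" and not fields.get("message"):
            problems.append(f"{overlay_dir.name}: deprecated overlay needs a message")
    return problems
```

A hook could call `check_overlay` on each subdirectory of chart/kustomize-overlays/ and fail if any problems are returned. Whether STATUS ends up as key/value text, YAML, or something else is for the PoC review to decide.
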
> >>
> >> *Criteria for chart vs Kustomize*
> >>
> >> The criteria will live at chart/kustomize-overlays/CONTRIBUTING.rst.
> >>
> >> Belongs in the chart (all must be true):
> >>
> >> - Required to run Airflow (scheduler, API server, dag-processor,
> >>   triggerer, workers)
> >> - Removing it requires changes to Airflow's own configuration
> >> - No external owner
> >>
> >> Belongs in Kustomize (any may be true):
> >>
> >> - Can be expressed as a standalone Kubernetes resource without
> >>   modifying chart-rendered resources
> >> - Environment-specific (authentication schemes, logging backends,
> >>   autoscaling controllers)
> >> - Has an external owner (KEDA, Elasticsearch, any PostgreSQL
> >>   distribution)
> >> - Requires CRDs that the chart does not install
> >>
> >> One invariant we want to keep is that the chart never removes a component
> >> without a working overlay already in place. Users should always have a
> >> migration path before anything disappears.
> >>
> >> *Thoughts welcome*
> >>
> >> The branching split is in place because we wanted the transition to 2.0
> >> to be smooth for users, with 1.2x continuing to ship in parallel. Sharing
> >> it here so the rest of the proposal sits in the right context.
> >>
> >> What I would love to hear thoughts on:
> >>
> >> - Do the chart vs Kustomize criteria hold up against the deployments
> >>   you have run? Anything that feels off, missing, or too strict.
> >> - Anything in the confirmed component list you would push back on, or
> >>   anything you think should be added.
> >>
> >> If you would rather leave longer notes on the Confluence page or the
> >> Google Doc we have been working from, those are equally welcome. Links
> >> below.
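
One way to read the chart-versus-Kustomize criteria quoted above is as two predicates: an "all must be true" rule for the chart and an "any may be true" rule for Kustomize. The sketch below only illustrates that reading. The `Component` fields, the `placement` function, and the "needs-discussion" outcome for components that match both or neither rule are assumptions, since the proposal does not say how such cases are resolved beyond mentioning an exception process. The attribute values for the two example components are illustrative too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical record of the properties the criteria ask about."""

    name: str
    required_to_run_airflow: bool
    removal_changes_airflow_config: bool
    has_external_owner: bool
    standalone_resource: bool = False
    environment_specific: bool = False
    needs_uninstalled_crds: bool = False


def belongs_in_chart(c: Component) -> bool:
    # "All must be true": required, tied to Airflow config, no external owner.
    return (
        c.required_to_run_airflow
        and c.removal_changes_airflow_config
        and not c.has_external_owner
    )


def belongs_in_kustomize(c: Component) -> bool:
    # "Any may be true": one matching property is enough.
    return any(
        (
            c.standalone_resource,
            c.environment_specific,
            c.has_external_owner,
            c.needs_uninstalled_crds,
        )
    )


def placement(c: Component) -> str:
    in_chart = belongs_in_chart(c)
    in_kustomize = belongs_in_kustomize(c)
    if in_chart and not in_kustomize:
        return "chart"
    if in_kustomize and not in_chart:
        return "kustomize"
    # Assumption: ambiguous cases go through the exception process.
    return "needs-discussion"


# Illustrative values only; the real classification is up to the reviewers.
scheduler = Component("scheduler", True, True, False)
keda = Component(
    "keda", False, False, True,
    environment_specific=True, needs_uninstalled_crds=True,
)
print(placement(scheduler), placement(keda))
```

Writing the rules down this way makes one gap visible: a component can satisfy every chart criterion and still be environment-specific, so the criteria in CONTRIBUTING.rst may need a tie-break rule.
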
> >>
> >> *References*
> >>
> >> - Confluence:
> >>   https://cwiki.apache.org/confluence/display/AIRFLOW/Helm+Refurbish
> >> - Discussion notes (Google Doc):
> >>   https://docs.google.com/document/d/1bZsyrG5kjsYd2rJRiN3kR613lO6JPEBd4ItsySneOMw/edit?usp=sharing
> >> - Umbrella issue: https://github.com/apache/airflow/issues/64037
> >>
> >> Thanks,
> >>
> >> Bugra Ozturk
> >>
> >> Kind regards,
