Hi Bugra,

I really like the PoC and its direction. While I was involved in defining
the broad approach and created an early Claude-coded PoC, that version was
far too complex, opinionated, and large.

Seeing this simple PR—which combines minimal code with architectural and
user documentation—makes this discussion much more meaningful and informed.
This perfectly demonstrates the power of a
"documentation-in-parallel-to-code" approach.

Best,
Jarek

On Mon, Apr 27, 2026 at 9:29 AM Zhe-You(Jason) Liu <[email protected]>
wrote:

> Hi Bugra,
>
> Thanks for raising the Kustomize discussion. I haven't gone through the doc
> thoroughly yet, but just FYI, here is some context I have regarding the
> Kustomize approach. This might be helpful in coming up with a final
> structure that fits all the use cases we need to support while ensuring
> good long-term maintenance.
>
> For example:
> - Add optional OTel service to the Airflow Helm Chart #64902 [1]
> - Helm chart support for periodic API server rollout restarts on Kubernetes
>   #61432 [2]
>
> Additionally, there is a Slack thread discussing the Kustomize direction
> [3].
>
> [1] https://github.com/apache/airflow/pull/64902#issuecomment-4206639363
> [2] https://github.com/apache/airflow/pull/61636#issuecomment-3881992323
> [3] https://apache-airflow.slack.com/archives/C027H098M1C/p1770794021001679
>
> Best,
> Jason
>
> On Mon, Apr 27, 2026 at 8:16 AM Buğra Öztürk <[email protected]>
> wrote:
>
> > Hi all,
> >
> > I have started working on the PoC for the Kustomize direction, as
> > mentioned in the KEDA thread.
> >
> > Here is the approach I have in mind to make this stable and faster to
> > iterate on. The idea is to align on the fundamentals before building
> > further. Smaller increments should make reviews easier and allow for
> > quicker course correction. Once the foundation is in place, the
> > remaining work should move faster.
> >
> > * Share the directory structure in this first PoC example (not fully
> > tested yet), with CI/pre-commit checks focusing only on validating the
> > agreed structure
> > * Collect feedback, review, and merge the shared PR
> >
> > * Propose and build a smoke test on top of the KEDA overlay in a separate
> > new PR
> >
> > * Collect feedback, review, and merge the smoke test PR
> >
> > * Test locally to check if smoke tests match
> >
> > * Move KEDA overlay to testing in a new PR with the introduction of a
> > deprecation warning
> >
> > PR: https://github.com/apache/airflow/pull/65897
> >
> > Thoughts and early feedback very welcome.
> >
> > Are we going to go through all of this for every overlay addition?
> > Short answer: no.
> > Long answer: these are early-maturity frictions, and going step by step
> > now will let new overlays be added without much hassle. I hope that an
> > agreed, tested, and documented approach will let future additions land
> > in one go, in a single PR :)
> >
> >
> > Kind regards,
> > Bugra Ozturk
> >
> > On Sat, Apr 25, 2026 at 5:37 PM Buğra Öztürk <[email protected]>
> > wrote:
> >
> > > Sorry for the formatting of the directory structure! It looked fine in
> > > the mail app. You can also find it in the Google Doc:
> > >
> > > https://docs.google.com/document/d/1bZsyrG5kjsYd2rJRiN3kR613lO6JPEBd4ItsySneOMw/edit?tab=t.cv476feyrxmf
> > >
> > > On Sat, Apr 25, 2026 at 5:31 PM Buğra Öztürk <[email protected]>
> > > wrote:
> > >
> > >> Hi all,
> > >>
> > >> We have been working through a Helm chart refurbishment effort over the
> > >> past few months. The goal is to keep 1.2x stable for existing users while
> > >> preparing a cleaner next major release. I would like to share where we
> > >> have landed and open it up for feedback before going further.
> > >>
> > >> *Branching strategy*
> > >>
> > >> We created chart/v1-2x-test, mirroring how v3-1-test works for Airflow
> > >> itself.
> > >>
> > >>    - chart/v1-2x-test is the maintenance line. Bug fixes and stability
> > >>      work for 1.2x releases land here.
> > >>    - main is for cleanup, deprecations, and preparation toward 2.0.
> > >>
> > >> The split was deliberate. We wanted to give existing 1.x users a smooth
> > >> transition path without holding back the 2.0 work, and vice versa. 2.0 is
> > >> intended as a real refurbishment rather than an incremental bump. It will
> > >> carry a fair number of breaking changes, but the upside is that it gives
> > >> users a clean starting point: a chart fully designed around Airflow 3 and
> > >> what comes after, instead of one carrying years of accumulated
> > >> assumptions from the 1.x line. Existing users on 1.2x are not forced to
> > >> move, since the maintenance branch keeps shipping releases for them, but
> > >> anyone starting fresh or willing to migrate gets a much simpler chart to
> > >> work with.
> > >>
> > >> We have already cut and released 1.21.0 from chart/v1-2x-test, so the
> > >> model is in place rather than hypothetical. The release went through
> > >> cleanly and gave us the separation we were after, which is part of the
> > >> reason the proposal feels concrete enough to bring here.
> > >>
> > >> *Kustomize direction*
> > >>
> > >> A recurring theme in our discussions has been that the chart carries a
> > >> fair number of components that are not Airflow-native. Kerberos,
> > >> Elasticsearch logging, gitSync, and PostgreSQL are good examples. They
> > >> make the chart heavier than it needs to be and pull us toward maintaining
> > >> things that already have external owners.
> > >>
> > >> The proposal is to express these as Kustomize overlays that sit alongside
> > >> the chart as a guide for users, not as released chart artifacts.
> > >>
> > >> *Confirmed for Kustomize*
> > >>
> > >>    - Kerberos: Authentication variant, environment-specific, sidecar
> > >>      injection
> > >>    - gitSync: DAG delivery mechanism, orthogonal to the Airflow runtime
> > >>    - Elasticsearch: External logging backend, not Airflow-native
> > >>    - PostgreSQL: Can be expressed as plain Kubernetes resources
> > >>
> > >> PgBouncer and StatsD are also candidates, but we want to investigate them
> > >> further before committing. They will not be in the first round of
> > >> overlays.
> > >>
> > >> *Structure*
> > >>
> > >> Overlays live in the repository but are not part of the chart release
> > >> artifact. Each overlay has a kustomization.yaml, the resources it
> > >> produces, and a STATUS file marking whether it is verified in CI or a
> > >> starting point that users can extend.
> > >>
> > >> A rough sketch of how it would look in the repo:
> > >>
> > >> ```
> > >> chart/
> > >>   kustomize-overlays/
> > >>     README.rst
> > >>     CONTRIBUTING.rst
> > >>     keda/
> > >>       kustomization.yaml
> > >>       scaledobject.yaml
> > >>       STATUS
> > >>     kerberos/
> > >>       kustomization.yaml
> > >>       scheduler-sidecar-patch.yaml
> > >>       STATUS
> > >> ```
> > >>
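A minimal sketch of the kind of structure check the first PoC's CI/pre-commit step could run against this layout. It assumes every overlay directory must contain kustomization.yaml and STATUS; the function name and the required-file set are illustrative assumptions, not what the PR actually implements.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: every overlay must ship these files. Other files (resources,
# patches) are overlay-specific and referenced from kustomization.yaml.
REQUIRED_FILES = ("kustomization.yaml", "STATUS")


def find_structure_problems(overlays_root: Path) -> list[str]:
    """Return human-readable problems; an empty list means the layout is valid.

    Only directories under overlays_root are treated as overlays, so
    top-level files such as README.rst and CONTRIBUTING.rst are skipped.
    """
    problems: list[str] = []
    for overlay in sorted(p for p in overlays_root.iterdir() if p.is_dir()):
        for name in REQUIRED_FILES:
            if not (overlay / name).is_file():
                problems.append(f"{overlay.name}: missing {name}")
    return problems
```

A hook could call find_structure_problems(Path("chart/kustomize-overlays")), print each entry, and fail whenever the list is non-empty.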
> > >> We will start with a PoC before agreeing on the broader rollout. HPA or
> > >> KEDA covers the standalone-addition pattern and is a candidate to go
> > >> first or second. Kerberos covers the post-render patch pattern and
> > >> becomes the template for any future sidecar injection use case. We are
> > >> putting together a first PoC now and will share it in this thread once
> > >> it is in a shape worth looking at, so the discussion has something
> > >> concrete to sit alongside the criteria below.
> > >>
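To make the two patterns concrete, here is a plain-Python illustration over already-parsed manifests. It is not how Kustomize applies overlays; it only contrasts a standalone addition (append a new object, leave chart-rendered objects untouched) with a patch (modify a chart-rendered object). The component=scheduler label selector and both function names are assumptions for illustration.

```python
from __future__ import annotations

import copy

Manifest = dict  # one parsed Kubernetes object, e.g. from a rendered chart


def add_standalone(rendered: list[Manifest], extra: Manifest) -> list[Manifest]:
    """Standalone-addition pattern: append a resource, leave rendered ones as-is."""
    return [*rendered, extra]


def patch_scheduler_sidecar(rendered: list[Manifest], sidecar: dict) -> list[Manifest]:
    """Patch pattern: inject a sidecar container into the scheduler Deployment.

    Assumes the scheduler Deployment carries the label component=scheduler.
    Input objects are never mutated; the patched Deployment is a deep copy.
    """
    out = []
    for obj in rendered:
        labels = obj.get("metadata", {}).get("labels", {})
        if obj.get("kind") == "Deployment" and labels.get("component") == "scheduler":
            obj = copy.deepcopy(obj)
            pod_spec = obj["spec"]["template"]["spec"]
            pod_spec.setdefault("containers", []).append(sidecar)
        out.append(obj)
    return out
```

For example, add_standalone(rendered, scaled_object) with a KEDA ScaledObject keeps every rendered object as it was, while patch_scheduler_sidecar(rendered, {"name": "kerberos-sidecar", ...}) returns a copy of the scheduler Deployment with one extra container. Only the patch depends on the shape of the chart's rendered output.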
> > >> *Lifecycle*
> > >>
> > >> The lifecycle mirrors how providers work, just on a smaller scale.
> > >>
> > >>    - A new overlay is proposed via a PR and lands with STATUS: not-tested.
> > >>    - The contributor follows up with a test at
> > >>      chart/tests/kustomize/test_.py and flips STATUS to tested, either in
> > >>      the same PR or in a focused follow-up. There can also be a CI smoke
> > >>      test that exercises the Kustomize overlay flow; how that is wired up
> > >>      is a technical detail of the process.
> > >>    - An overlay is deprecated by setting deprecated: true in STATUS, along
> > >>      with a short message pointing to the replacement.
> > >>    - Deprecated overlays stay around for one major chart version before
> > >>      they are removed, so users always have a window to migrate.
> > >>
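A sketch of how the STATUS conventions in the lifecycle could be parsed and checked. The proposal does not fix a file format, so this assumes simple "key: value" lines (which lets the "STATUS: not-tested" form parse directly). The deprecated_since_major key is hypothetical, added only so the removal window can be computed, and the window is read here as "removable from the next major chart version onward"; can_remove would need adjusting if the intended window is longer.

```python
from __future__ import annotations

from dataclasses import dataclass

VALID_STATUSES = {"not-tested", "tested"}


@dataclass
class OverlayStatus:
    status: str
    deprecated: bool = False
    message: str = ""
    deprecated_since_major: int | None = None  # hypothetical key


def parse_status(text: str) -> OverlayStatus:
    """Parse a STATUS file made of 'key: value' lines (assumed format)."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"malformed STATUS line: {raw!r}")
        fields[key.strip().lower()] = value.strip()

    status = fields.get("status", "")
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status {status!r}")
    deprecated = fields.get("deprecated", "false").lower() == "true"
    message = fields.get("message", "")
    if deprecated and not message:
        # Deprecation must point users to the replacement.
        raise ValueError("a deprecated overlay needs a message")
    since = fields.get("deprecated_since_major")
    return OverlayStatus(status, deprecated, message, int(since) if since else None)


def can_remove(st: OverlayStatus, current_chart_major: int) -> bool:
    """True once the chart has moved past the major version of deprecation."""
    if not st.deprecated or st.deprecated_since_major is None:
        return False
    return current_chart_major > st.deprecated_since_major
```

Under these assumptions, a deprecated overlay without a message is rejected at parse time, so a CI check would catch it before merge.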
> > >> CONTRIBUTING.rst in the overlays directory is the authoritative reference
> > >> for all of this: the criteria, the exception process, status conventions,
> > >> and the migration guide pattern all live there together.
> > >>
> > >> *Criteria for chart vs Kustomize*
> > >>
> > >> The criteria will live at chart/kustomize-overlays/CONTRIBUTING.rst.
> > >>
> > >> Belongs in the chart (all must be true):
> > >>
> > >>    - Required to run Airflow (scheduler, API server, dag-processor,
> > >>      triggerer, workers)
> > >>    - Removing it requires changes to Airflow's own configuration
> > >>    - No external owner
> > >>
> > >> Belongs in Kustomize (any may be true):
> > >>
> > >>    - Can be expressed as a standalone Kubernetes resource without
> > >>      modifying chart-rendered resources
> > >>    - Environment-specific (authentication schemes, logging backends,
> > >>      autoscaling controllers)
> > >>    - Has an external owner (KEDA, Elasticsearch, any PostgreSQL
> > >>      distribution)
> > >>    - Requires CRDs that the chart does not install
> > >>
> > >> One invariant we want to keep is that the chart never removes a component
> > >> without a working overlay already in place. Users should always have a
> > >> migration path before anything disappears.
> > >>
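The two rule sets use different logic: the chart side is a conjunction (all must be true) and the Kustomize side a disjunction (any may be true). A minimal sketch encoding that, with hypothetical field names. Because both lists can match, or both can miss, for some components, the sketch returns "needs-discussion" in those cases rather than guessing, which is presumably where the exception process would apply. The invariant check assumes a "working overlay" means one whose STATUS is tested.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Component:
    # Chart criteria (all must be true)
    required_to_run_airflow: bool
    removal_changes_airflow_config: bool
    has_external_owner: bool
    # Additional Kustomize criteria (any may be true, together with an external owner)
    standalone_resource: bool = False
    environment_specific: bool = False
    needs_uninstalled_crds: bool = False


def placement(c: Component) -> str:
    in_chart = (
        c.required_to_run_airflow
        and c.removal_changes_airflow_config
        and not c.has_external_owner
    )
    in_kustomize = (
        c.standalone_resource
        or c.environment_specific
        or c.has_external_owner
        or c.needs_uninstalled_crds
    )
    if in_chart == in_kustomize:
        # Both or neither rule set matched: the criteria alone do not decide.
        return "needs-discussion"
    return "chart" if in_chart else "kustomize"


def may_remove_from_chart(overlay_status: str | None) -> bool:
    """Invariant: drop a component only once its overlay exists and is tested."""
    return overlay_status == "tested"
```

For instance, the scheduler (required, config-coupled, no external owner) lands in the chart, while KEDA (external owner, standalone ScaledObject, its own CRDs) lands in Kustomize.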
> > >> *Thoughts welcome*
> > >>
> > >> The branching split is in place because we wanted the transition to 2.0
> > >> to be smooth for users, with 1.2x continuing to ship in parallel. Sharing
> > >> it here so the rest of the proposal sits in the right context.
> > >>
> > >> What I would love to hear thoughts on:
> > >>
> > >>    - Do the chart vs Kustomize criteria hold up against the deployments
> > >>      you have run? Anything that feels off, missing, or too strict?
> > >>    - Anything in the confirmed component list you would push back on, or
> > >>      anything you think should be added?
> > >>
> > >> If you would rather leave longer notes on the Confluence page or the
> > >> Google Doc we have been working from, those are equally welcome. Links
> > >> below.
> > >>
> > >> *References*
> > >>
> > >>    - Confluence:
> > >>      https://cwiki.apache.org/confluence/display/AIRFLOW/Helm+Refurbish
> > >>    - Discussion notes (Google Doc):
> > >>      https://docs.google.com/document/d/1bZsyrG5kjsYd2rJRiN3kR613lO6JPEBd4ItsySneOMw/edit?usp=sharing
> > >>    - Umbrella issue: https://github.com/apache/airflow/issues/64037
> > >>
> > >> Thanks,
> > >>
> > >> Bugra Ozturk
> > >>
> > >
> >
>
