Hi all,

We have been working through a Helm chart refurbishment effort over the
past few months. The goal is to keep 1.2x stable for existing users while
preparing a cleaner next major release. I would like to share where we have
landed and open it up for feedback before going further.

*Branching strategy*

We created chart/v1-2x-test, mirroring how v3-1-test works for Airflow
itself.

   - chart/v1-2x-test is the maintenance line. Bug fixes and stability work
     for 1.2x releases land here.
   - main is for cleanup, deprecations, and preparation toward 2.0.

The split was deliberate. We wanted to give existing 1.x users a smooth
transition path without holding back the 2.0 work, and vice versa. 2.0 is
intended as a real refurbishment rather than an incremental bump. It will
carry a fair number of breaking changes, but the upside is that it gives
users a clean starting point with a chart fully designed around Airflow 3
and what comes after, instead of one carrying years of accumulated
assumptions from the 1.x line. Existing users on 1.2x are not forced into
the move, because the maintenance branch keeps shipping releases for them,
but anyone starting fresh or willing to migrate gets a much simpler chart
to work with.

We have already cut and released 1.21.0 from chart/v1-2x-test, so the model
is in place rather than hypothetical. The release went through cleanly and
gave us the separation we were after, which is part of the reason the
proposal feels concrete enough to bring here.

*Kustomize direction*

A recurring theme in our discussions has been that the chart carries a fair
number of components that are not Airflow-native. Kerberos, Elasticsearch
logging, gitSync, and PostgreSQL are good examples. They make the chart
heavier than it needs to be and pull us toward maintaining things that
already have external owners.

The proposal is to express these as Kustomize overlays that sit alongside
the chart as a guide for users, not as released chart artifacts.

*Confirmed for Kustomize*

   - Kerberos: Authentication variant, environment-specific, sidecar injection
   - gitSync: DAG delivery mechanism, orthogonal to Airflow runtime
   - Elasticsearch: External logging backend, not Airflow-native
   - PostgreSQL: Can be expressed as plain Kubernetes resources

PgBouncer and StatsD are also candidates, but we want to investigate them
further before committing. They will not be in the first round of overlays.

*Structure*

Overlays live in the repository but are not part of the chart release
artifact. Each overlay has a kustomization.yaml, the resources it produces,
and a STATUS file marking whether it is verified in CI or a starting point
that users can extend.

A rough sketch of how it would look in the repo:


```
chart/
  kustomize-overlays/
    README.rst
    CONTRIBUTING.rst
    keda/
      kustomization.yaml
      scaledobject.yaml
      STATUS
    kerberos/
      kustomization.yaml
      scheduler-sidecar-patch.yaml
      STATUS
```
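
To make the per-overlay contract a bit more concrete, here is a minimal
Python sketch of a layout check that CI could run. It assumes only what the
tree above shows: every overlay directory carries a kustomization.yaml and
a STATUS file. The function name and the return shape are hypothetical and
do not exist in the repo today.

```python
from pathlib import Path

# Files every overlay directory is expected to contain (taken from the
# layout sketch; the check itself is a hypothetical helper).
REQUIRED_FILES = ("kustomization.yaml", "STATUS")


def missing_overlay_files(overlays_root: Path) -> dict[str, list[str]]:
    """Map each overlay directory name to the required files it lacks.

    Only sub-directories of ``overlays_root`` are treated as overlays, so
    README.rst and CONTRIBUTING.rst at the top level are ignored.
    Overlays with nothing missing are left out of the result.
    """
    missing: dict[str, list[str]] = {}
    for overlay in sorted(p for p in overlays_root.iterdir() if p.is_dir()):
        absent = [name for name in REQUIRED_FILES if not (overlay / name).is_file()]
        if absent:
            missing[overlay.name] = absent
    return missing
```

An empty result would mean every overlay meets the minimum layout. Whether
such a check lives in the chart tests or in a pre-commit hook is open.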



We will start with a PoC before agreeing on the broader rollout. HPA or
KEDA covers the standalone addition pattern and is a candidate to go first
or second. Kerberos covers the post-render patch pattern and becomes the
template for any future sidecar injection use case. We are putting together
a first PoC now and will share it in this thread once it is in a shape
worth looking at, so the discussion has something concrete to sit alongside
the criteria below.

*Lifecycle*

The lifecycle mirrors how providers work, just on a smaller scale.

   - A new overlay is proposed via a PR and lands with STATUS: not-tested.
   - The contributor follows up with a test at chart/tests/kustomize/test_.py
     and flips STATUS to tested, either in the same PR or a focused
     follow-up. Equally, a smoke test in CI could exercise the Kustomize
     overlay flow; how that is wired up is a technical detail of the
     process.
   - An overlay is deprecated by setting deprecated: true in STATUS along
     with a short message pointing to the replacement.
   - Deprecated overlays stay around for one major chart version before they
     are removed, so users always have a window to migrate.
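
The STATUS format is not settled yet. Purely as a sketch, assuming it ends
up as plain "key: value" lines with the fields mentioned above (STATUS and
deprecated) plus a hypothetical "message" field for the pointer to the
replacement, a validation helper in Python could look like this:

```python
# Status values named in the lifecycle above.
ALLOWED_STATUS = {"not-tested", "tested"}


def parse_status(text: str) -> dict[str, str]:
    """Parse 'key: value' lines into a dict with lower-cased keys.

    Assumption: blank lines and lines starting with '#' are ignored.
    Only the first ':' splits, so a message may itself contain colons.
    """
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {raw!r}")
        fields[key.strip().lower()] = value.strip()
    return fields


def validate_status(fields: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the file is acceptable."""
    problems: list[str] = []
    if fields.get("status") not in ALLOWED_STATUS:
        problems.append("STATUS must be one of: " + ", ".join(sorted(ALLOWED_STATUS)))
    # 'message' is a hypothetical field name for the replacement pointer.
    if fields.get("deprecated", "false") == "true" and not fields.get("message"):
        problems.append("deprecated overlays need a message pointing to the replacement")
    return problems
```

A check like this could run next to the overlay tests so that a deprecated
overlay never lands without telling users where to go next.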

CONTRIBUTING.rst in the overlays directory is the authoritative reference
for all of this: the criteria, the exception process, status conventions,
and the migration guide pattern live there together.

*Criteria for chart vs Kustomize*

The criteria will live at chart/kustomize-overlays/CONTRIBUTING.rst.

Belongs in the chart (all must be true):

   - Required to run Airflow (scheduler, API server, dag-processor,
     triggerer, workers)
   - Removing it requires changes to Airflow's own configuration
   - No external owner

Belongs in Kustomize (any may be true):

   - Can be expressed as a standalone Kubernetes resource without modifying
     chart-rendered resources
   - Environment-specific (authentication schemes, logging backends,
     autoscaling controllers)
   - Has an external owner (KEDA, Elasticsearch, any PostgreSQL distribution)
   - Requires CRDs that the chart does not install
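
To show how the two lists combine, here is a small Python sketch of the
decision. The all/any logic comes straight from the lists above. Everything
else is an assumption for illustration: the field names, the example flag
values, and in particular the "needs-discussion" outcome for components
that match both lists or neither, which the criteria do not define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical description of a component against the criteria."""

    name: str
    required_to_run_airflow: bool = False
    removal_needs_airflow_config_change: bool = False
    has_external_owner: bool = False
    standalone_resource: bool = False
    environment_specific: bool = False
    needs_crds_not_in_chart: bool = False


def belongs_in_chart(c: Component) -> bool:
    # Chart criteria: all three must hold.
    return (
        c.required_to_run_airflow
        and c.removal_needs_airflow_config_change
        and not c.has_external_owner
    )


def belongs_in_kustomize(c: Component) -> bool:
    # Kustomize criteria: any single one is enough.
    return any(
        (
            c.standalone_resource,
            c.environment_specific,
            c.has_external_owner,
            c.needs_crds_not_in_chart,
        )
    )


def classify(c: Component) -> str:
    chart, kustomize = belongs_in_chart(c), belongs_in_kustomize(c)
    if chart and not kustomize:
        return "chart"
    if kustomize and not chart:
        return "kustomize"
    # Assumption: overlaps and gaps go to the exception process.
    return "needs-discussion"


# Illustrative flag values only, not an agreed assessment.
scheduler = Component(
    "scheduler",
    required_to_run_airflow=True,
    removal_needs_airflow_config_change=True,
)
keda = Component(
    "keda",
    has_external_owner=True,
    standalone_resource=True,
    needs_crds_not_in_chart=True,
)
```

With these values classify(scheduler) gives "chart" and classify(keda)
gives "kustomize". The interesting cases are the ones that come back as
"needs-discussion", which is where feedback on the criteria would help.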

One invariant we want to keep is that the chart never removes a component
without a working overlay already in place. Users should always have a
migration path before anything disappears.

*Thoughts welcome*

The branching split is in place because we wanted the transition to 2.0 to
be smooth for users, with 1.2x continuing to ship in parallel. Sharing it
here so the rest of the proposal sits in the right context.

What I would love to hear thoughts on:

   - Do the chart vs Kustomize criteria hold up against the deployments you
     have run? Does anything feel off, missing, or too strict?
   - Is there anything in the confirmed component list you would push back
     on, or anything you think should be added?

If you would rather leave longer notes on the Confluence page or the Google
Doc we have been working from, those are equally welcome. Links below.

*References*

   - Confluence:
     https://cwiki.apache.org/confluence/display/AIRFLOW/Helm+Refurbish
   - Discussion notes (Google Doc):
     https://docs.google.com/document/d/1bZsyrG5kjsYd2rJRiN3kR613lO6JPEBd4ItsySneOMw/edit?usp=sharing
   - Umbrella issue: https://github.com/apache/airflow/issues/64037

Thanks,

Bugra Ozturk

