OK. I spent some time thinking about it. I had to go back to the
drawing board - which required clearing my head of many of the previous
assumptions - and I think I figured out how we can provide "90+%" of what,
at least as I heard it from the users who asked for it, without adding the
complexity that ripples through the whole Airflow DB design.

*Cleaned assumptions:*

I think what the users **really** want is:

1) first of all (and, now that I think about it, this is the more
important one): the ability to control which DAG authors have access to
which Connections and Variables (hence our proposal to add team_id to those
in the DB). Especially Connections, because their credentials are the most
sensitive part of an Airflow installation.
2) have a group of DAGs use a different set of dependencies installed when
the DAGs are executed (most important) but also parsed (due to the way DAG
parsing and task definition live in the same Python files in Airflow).
3) have a way to limit which users can see, monitor and operate on the
DAGs via the UI (but this is far less important).
4) have a way to isolate workloads executed in the Celery executor in
particular - so that tasks from one "team" are not executed on the same
machine/interpreter as the tasks of another team.

*Airflow 3.1+ environment and my new thinking*

With a few AIPs implemented (not yet fully, and we might build on top of
those) I think we can approach the problems above a bit differently than
having separate deployments of workers, triggerers and dag processors
(which, as Ash pointed out, is almost as complex as having a whole separate
Airflow deployment - or actually even more complex, because the complexity
of the whole Airflow increases). So I posed the question: can we leverage
the currently implemented AIPs and changes in Airflow 3 to achieve those
four goals without changes rippling through the Airflow DB? I think we can.

Instead of introducing a team id concept, we could leverage AIP-67 - DAG
Bundles and parsing - to achieve pretty much the same. While DAG Bundles
were currently designed just to provide different backends to retrieve DAGs
from, the initial, more complete concept of them included "packaging",
which could also be connected with an "execution and parsing environment".
I think we could treat "bundle_id" as "team_id". And if we do that and
implement packaging and execution environments (say, the ability to choose
a predefined venv to parse and execute DAGs coming from a specific
bundle_id), expectation 2) above can be handled well.
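To make this concrete, here is a minimal sketch of what a per-bundle
execution environment mapping could look like. To be clear: all the names
and config structures below are hypothetical - nothing like this exists in
Airflow today, it only illustrates the shape of the idea:

```python
# Hypothetical sketch only: mapping DAG bundles to predefined venvs used to
# parse and execute their DAGs. Neither this mapping nor the helper exists
# in Airflow - all names are made up for illustration.
BUNDLE_ENVIRONMENTS = {
    "analytics-bundle": "/opt/venvs/analytics",  # bundle_id -> venv path
    "ml-bundle": "/opt/venvs/ml",
}


def interpreter_for_bundle(bundle_id: str, default: str = "/usr/bin/python3") -> str:
    """Pick the interpreter a dag processor/worker would use for this bundle."""
    venv = BUNDLE_ENVIRONMENTS.get(bundle_id)
    # Fall back to the "global" interpreter for bundles with no dedicated venv.
    return f"{venv}/bin/python" if venv else default
```

The dag processor and workers would then spawn parsing/execution in the
interpreter returned for the bundle, instead of their own.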

The same bundle id can be used to namespace connections and variables.
There are several ways we can achieve that, but I believe that with the
Task SDK we can now be cryptographically sure (via JWT tokens) that
requests to the API server are coming from a specific DAG, and find the
bundle it comes from (even if not currently, we can add specific claims to
the tokens that we can verify). This means the API server can know which
bundle id a DAG comes from when it asks for connections and variables. It
also means we could extend the Bundle definition - either explicitly, by
listing the connections and variables, or implicitly, for example by prefix
or by just adding "bundle_id" to connections and variables - to limit which
connections can be retrieved while a task is being executed. That would
achieve 1).
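As a sketch of the explicit/implicit scoping described above - again,
everything here is hypothetical, the Bundle definition has no such fields
today and the function name is invented:

```python
# Hypothetical sketch: the API server learns the bundle_id from a verified
# JWT claim and only serves connections the bundle may see - either
# explicitly (listed in the bundle definition) or implicitly (by prefix).
def connection_visible(
    bundle_id: str, conn_id: str, explicit_grants: dict[str, set[str]]
) -> bool:
    # Explicit grant: the bundle definition lists allowed connection ids.
    if conn_id in explicit_grants.get(bundle_id, set()):
        return True
    # Implicit grant: the connection id is prefixed with the bundle id.
    return conn_id.startswith(f"{bundle_id}.")
```

The key point is that the check runs on the API server side, so a task
cannot lie about its bundle - the bundle id comes from the verified token,
not from the request payload.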

The same bundle_id could be passed as one of the auth manager "fields"
instead of team_id, making it the basis for writing "bundle_id"-aware auth
managers (addressing 3)).

Similarly, we could link bundle_id with the celery queue id and "hard
link" those - i.e. not allow choosing a queue outside the set of queues
defined for the bundle the DAG comes from. That would allow users to
configure separate queues (and workers) for each DAG bundle.

I think with this approach all 4 points above can be achieved without a
ripple effect on the Airflow DB and without complicating Airflow deployment.

J.

On Thu, Jun 12, 2025 at 6:34 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> > As far as I can work out, and again, please correct me if I'm wrong, the
> only real difference to users from the multi-team solution over running
> multiple Airflows is the ability to “communicate” via Datasets/Assets.
> (Variables aren’t shared, Connections aren’t shared, workers aren’t shared.
> Webserver and Scheduler could be/are shared but reducing resource
> consumption of a deployment is explicitly not a goal)
>
> One other thing is the execution environment - i.e. set of dependencies
> used to parse and execute Airflow DAGs. Which of course will become less of
> a problem when we have task-sdk full separation but still is a concern of
> users.
>
> J.
>
>
> On Thu, Jun 12, 2025 at 5:51 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
>> > Other than the feature being a consistent request on our Airflow
>> surveys, we have a number of users that have asked and continue to ask when
>> a multi-team solution would be available in Airflow
>>
>> This is precisely one of my points. I don’t believe that AIP-67 as in the
>> wiki page will address the need of many users asking for multi team. (Ask
>> three Airflow users what they want from multi team, and you’ll get 5
>> different answers)
>>
>> As far as I can work out, and again, please correct me if I'm wrong, the
>> only real difference to users from the multi-team solution over running
>> multiple Airflows is the ability to “communicate” via Datasets/Assets.
>> (Variables aren’t shared, Connections aren’t shared, workers aren’t shared.
>> Webserver and Scheduler could be/are shared but reducing resource
>> consumption of a deployment is explicitly not a goal)
>>
>> And if that is really all this AIP delivers to us, then my hypothesis is
>> that we’ll a) miss the mark on what many users actually want from
>> multi-team, and b) that we could already get the “communicate between two
>> teams DAGs” benefit today with no changes to Airflow by using the
>> AssetWatcher that Vincent already added to 3.0.0.
>>
>> -ash
>>
>>
>> > On 12 Jun 2025, at 16:24, Bishundeo, Rajeshwar
>> <rbish...@amazon.com.INVALID> wrote:
>> >
>> > Ash, you've raised some good points on the need to re-evaluate AIP-67,
>> although I'm a bit confused on how AIP-82 factors into a multi-team
>> solution. It's fair to have the discussion on how Airflow has changed and
>> perhaps either redefining what AIP-67 means...or a set of new AIP's solving
>> a subset of a larger need.
>> > We have seen talks at previous ( and even at the upcoming summit) where
>> users have demonstrated their implementation of multi-team. I can't help
>> but feel that they are being creative with some of those solutions (not
>> that there's anything wrong with that), but because one doesn't exist in
>> Airflow. Other than the feature being a consistent request on our Airflow
>> surveys, we have a number of users that have asked and continue to ask when
>> a multi-team solution would be available in Airflow.
>> > I think the next dev call (06/26) is a great time to dive into this
>> further.
>> >
>> > -- Rajesh
>> >
>> >
>> >
>> >
>> >
>> >
>> > On 2025-06-12, 9:34 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>> >
>> >
>> > Yep. It's a valid point that we should re-evaluate things now after
>> Airflow
>> > 3 is out - the reason why we delayed it was that we wanted to get more
>> > clarity on what implementation and scope of the Airflow 3 changes will
>> be
>> > and see how it fits.
>> >
>> >
>> > I wonder what others - especially those who run Airflow at scale and
>> > hear the users asking for different forms of multi-team - would say to
>> the
>> > expectations they have and how they map to Airflow 3 - maybe indeed we
>> > might come up with a simpler way of achieving those expectations.
>> >
>> >
>> > Definitely worth discussing it.
>> >
>> >
>> > J.
>> >
>> >
>> >
>> >
>> > On Thu, Jun 12, 2025 at 2:15 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>> >
>> >
>> >> Hi everyone,
>> >>
>> >> One thing I’ve been struggling with while reading the other thread
>> about
>> >> multi-team DB changes[0] is what is the end-user problem we are trying
>> to
>> >> address with it.
>> >>
>> >> The main impetus for opening this discussion is that a lot has changed
>> in
>> >> Airflow since this AIP was created in early 2024 and voted on
>> mid-2024, and
>> >> I'm wondering if those changes are big enough to invalidate the design
>> and
>> >> assumptions made at the time.
>> >>
>> >> Reading the DB changes thread I see that the changes are far reaching
>> and
>> >> necessarily have to touch most of the Airflow object models, and this
>> got
>> >> me thinking about what value do we actually get with the change, since
>> as
>> >> stated in the AIP some of the non-goals are[1] (slightly edited here
>> for
>> >> brevity with the “[…]"):
>> >>
>> >>
>> >>> • Sharing broker/backend for celery executors between teams. This MAY
>> be
>> >> covered by future AIPs
>> >>> • Implementation of FAB-based multi-team Auth Manager. […]
>> >>> • Per-team concurrency and prioritization of tasks. […].
>> >>> • Resource allocation per-executor. In the current proposal, executors
>> >> are run as sub-processes of Scheduler and we have very little control
>> over
>> >> their individual resource usage. […]
>> >>> • Turn-key multi-team Deployment of Airflow (for example via Helm
>> >> chart). This is unlikely to happen.[…]
>> >>> • team management tools (creation, removal, rename etc.). […]
>> >>> • Combining "global" execution with "team" execution. While it should
>> be
>> >> possible in the proposed architecture to have a "team" execution and
>> >> "global" execution in a single instance of Airflow, this has its own
>> >> unique set of challenges and assumption is that Airflow Deployment is
>> >> either "global" (today) or "multi-team" (After this AIP is
>> implemented) -
>> >> but it cannot be combined (yet). This is possible to be implemented in
>> the
>> >> future.
>> >>> • Running multiple schedulers - one-per team. While it should be
>> >> possible if we add support to select DAGs "per team" per scheduler,
>> this is
>> >> not implemented in this AIP and left for the future
>> >>
>> >> And also Design Non-goals from the AIP [2]:
>> >>
>> >>> • It’s not a primary goal of this proposal to significantly decrease
>> >> resource consumption for Airflow installation compared to the current
>> ways
>> >> of achieving “multi-tenant” setup. […]
>> >>> • It’s not a goal of the proposal to provide a one-stop installation
>> >> mechanism for “Multi-team” Airflow. […]
>> >>> • It’s not a goal to decrease the overall maintenance effort involved
>> in
>> >> responding to needs of different teams, […]
>> >>
>> >> The main pain point that we seem to be addressing with this AIP is
>> this[3]:
>> >>
>> >>> The main reason for having multi-team deployment of Airflow is
>> achieving
>> >> security and isolation between the teams, coupled with ability of the
>> >> isolated teams to collaborate via shared Datasets.
>> >>
>> >>
>> >> So what’s changed since we collectively (myself included) voted on and
>> >> accepted this AIP? Well, we now have AIP-82 — External event driven
>> dags.
>> >> That could be used to achieve this goal right now in 3.0 with no
>> changes to
>> >> Airflow itself, and is perhaps a more robust mechanism of doing it too.
>> >>
>> >> So my main question, given the wide reaching code changes needed for
>> AIP-67,
>> >> and (IMO) the imperfect/limited scope of team completion I wonder if
>> using
>> >> AIP-82 would not be a better solution to the problem.
>> >>
>> >> 1. It’s much simpler from a code level, as nothing needs to change
>> >> 2. It’s not _that_ much more complex from an operational point of view
>> >> (you have to run an extra scheduler and web server, but those would
>> likely
>> >> need scaling up.)
>> >> 3. We won’t disappoint people by not implementing the part of
>> multi-team
>> >> that they want (Someone being part of multiple teams, sharing
>> >> connections/vars between teams)
>> >>
>> >> And using this mechanism (of external dataset/asset polling) also
>> negates
>> >> one of the biggest cons of the AIP-67, that of the tight coupling of
>> >> Airflow versions between the teams. In larger companies this is a
>> _huge_
>> >> problem already, and this would only make it worse.
>> >>
>> >> So what’s my idea (and at this stage is it only an idea for
>> discussion) is
>> that we re-evaluate AIP-67 in light of what exists in Airflow 3.0 now
>> and
>> >> decide if it’s still worth the added complexity of DB, code and
>> operational
>> overhead, and decide if we still want it.
>> >>
>> >> Please, please, please point out if there are other benefits that I
>> have
>> >> missed, I'm not trying to be selective and get my way, I'm trying to
>> make
>> >> sure Airflow continues to meet the need of users, and can also
>> continue to
>> >> evolve (where I worry that complexity of code/datamodel materially
>> hurts
>> >> that final point)
>> >>
>> >> Thoughts?
>> >>
>> >> [0]: https://lists.apache.org/thread/78vndnybgpp705j6sm77l1t6xbrtnt5c
>> >> [1]:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=294816378#AIP67MultiteamdeploymentofAirflowcomponents-Whatisexcludedfromthescope
>> >> [2]:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=294816378#AIP67MultiteamdeploymentofAirflowcomponents-DesignNonGoals
>> >> [3]:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=294816378#AIP67MultiteamdeploymentofAirflowcomponents-Whyisitneeded
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> >> For additional commands, e-mail: dev-h...@airflow.apache.org
>> >>
>> >>
>> >
>> >
>> >
>> >
>>
>
