Hey folks,

I’ve been OOTO for the last week so I’m just catching up on this discussion. 
Here are some thoughts:

Revisiting the proposal for multi-team sounds like a reasonable idea. It is the
feature request we receive most often from our users/customers, and we
certainly want to get it right (and in a way that serves both users and
stakeholders, as Jarek mentioned). AIP-67 was written with some Airflow 3
features in mind (such as Task SDK), so I honestly don’t think it’s all that
outdated, though features like event-driven scheduling might be cause for
reevaluation.

As for using Dag Bundles as an approximation for teams, I don’t think I’m fully
sold, but I’m looking forward to hearing/discussing more. Some concerns I have
with it:

I don’t think bundles are a great entity to represent team-ness for all Airflow
components and features. They are only indirectly related to connections and
variables, for example, or to configuration. Also, passing bundle ID(s)
(presumably?) to the auth manager feels very strange to me and seems like a
very leaky abstraction for bundles. How do we handle a logical team that is
made up of more than one bundle? Or one whose bundles are added/removed over
time? Users will be constantly chasing their tails to keep their “team”
up to date in their auth manager.
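To make that concern concrete, here is a purely hypothetical sketch (Airflow’s real auth-manager interface takes no bundle IDs, and nothing below is real API) of what authorizing by bundle IDs would force an auth manager to do:

```python
# Hypothetical sketch only - none of this is real Airflow API. It shows why
# an auth manager keyed by bundle IDs has to mirror team membership by hand.

# A "team" becomes a hand-curated set of bundle IDs that must be kept in
# sync with the actual deployment.
TEAM_BUNDLES: dict[str, set[str]] = {
    "team-analytics": {"bundle_reporting", "bundle_ml"},
}

def is_authorized(user_team: str, dag_bundle_id: str) -> bool:
    """True if the user's team owns the bundle the DAG came from.

    Whenever a bundle is added to or removed from a team, TEAM_BUNDLES must
    be updated in lock-step, or authorization silently drifts out of date.
    """
    return dag_bundle_id in TEAM_BUNDLES.get(user_team, set())

# A newly added bundle is denied until someone remembers to edit the mapping:
print(is_authorized("team-analytics", "bundle_reporting"))    # True
print(is_authorized("team-analytics", "bundle_new_pipeline")) # False
```

The mapping itself is the problem: it lives in the auth manager, but the facts it encodes change in the bundle configuration.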

When users provide configuration for their logical “team”, do they specify
their Dag bundle ID(s) in the config file or in the environment variables they
use to set config? What would that look like concretely, especially if the team
is a combination of more than one bundle? And again, how does this remain
stable over time as bundles are added and removed for a single logical team?
Does that invalidate the config for a logical team composed of one or more
bundles? Don’t we want something more stable to represent a team? Or do we
limit teams to just one bundle?
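For illustration only (none of the configuration keys below exist in Airflow; they are invented for this sketch), team-as-bundles config might end up looking something like this, with the churn problem visible immediately:

```python
import os

# Hypothetical sketch: these configuration keys are invented for illustration
# and do not exist in Airflow. If a logical "team" is just a list of bundle
# IDs, every team-scoped setting has to carry that list along.
os.environ["AIRFLOW__TEAM_ANALYTICS__BUNDLE_IDS"] = "bundle_reporting,bundle_ml"
os.environ["AIRFLOW__TEAM_ANALYTICS__DEFAULT_POOL"] = "analytics_pool"

# Adding or removing a bundle now means touching every place this list is
# spelled out (config files, env vars, auth-manager mappings, ...):
bundles = os.environ["AIRFLOW__TEAM_ANALYTICS__BUNDLE_IDS"].split(",")
print(bundles)  # ['bundle_reporting', 'bundle_ml']
```
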

Overall, I think permeating bundles across Airflow as an approximation for
teams is not going to scale well, be very future-proof, or meet user/customer
expectations. For the area of Dags and Dag execution it’s not too bad, but for
the items above (as well as the triggerer, as discussed below), and as we
continue to build on multi-team (making it more robust, adding requested
features from users, etc.), I think we’re going to find that it doesn’t serve
the job very well. I think a more stable and concrete representation of teams
in Airflow will be a much better platform to build features on. It may require
some large-ish changes, but we weren’t afraid of making such changes for other
recent capabilities in Airflow (and largely those changes went smoothly), and I
think that was the right call, both then and for this case as well.

Cheers,
Niko


________________________________
From: Jens Scheffler <j_scheff...@gmx.de.INVALID>
Sent: Sunday, June 15, 2025 11:13:48 AM
To: dev@airflow.apache.org
Subject: RE: [EXT] Discuss: AIP-67 (multi team) now that AIP-82 (External event 
driven dags) exists


Hi all,

It took me a long time to digest the whole discussion thread. I think it would
be good to rewrite the details into a new AIP so that it can be compared with
the old AIP.

I think this could also include the extension (or is this planned
otherwise?) of linking multiple Airflow instances via Pub/Sub so that
dataset events can be externalized... in some cases it might be easier to
host multiple instances.

The different Envs as described below look good. Besides "Env",
this could also map to dedicated "executor" profiles, e.g. instantiating
a dedicated executor per team? I think with regard to multiple executors
we always intended to allow instantiating the same executor multiple
times. Then each Env could be mapped to its own executor.
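As a sketch of that idea (heavily hypothetical: Airflow 3 does allow listing multiple executors with an optional ":alias" suffix, but whether the same executor class can be instantiated twice, and the per-team environment keys below, are assumptions for illustration only):

```python
import os

# Hypothetical sketch: Airflow 3 allows listing multiple executors in
# [core] executor, with an optional ":alias" suffix. Whether the SAME
# executor class can be instantiated twice, and the per-team *_VENV keys
# below, are assumptions invented for this illustration.
os.environ["AIRFLOW__CORE__EXECUTOR"] = (
    "airflow.providers.celery.executors.celery_executor.CeleryExecutor:team_a,"
    "airflow.providers.celery.executors.celery_executor.CeleryExecutor:team_b"
)

# Each per-team executor alias could then be mapped to its own predefined
# environment (image or virtualenv) in which parsing and tasks would run:
os.environ["AIRFLOW__TEAM_A__EXECUTOR_VENV"] = "/venvs/team_a"
os.environ["AIRFLOW__TEAM_B__EXECUTOR_VENV"] = "/venvs/team_b"

aliases = [e.split(":")[1] for e in os.environ["AIRFLOW__CORE__EXECUTOR"].split(",")]
print(aliases)  # ['team_a', 'team_b']
```
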

Yeah, and in this regard the triggerer would also need a feature
increment so it can be hosted as multiple instances. It might be worth
considering that this is needed per executor anyway. I assume this is
lagging a bit; also, for the Edge Executor there is no async coverage... so
my background assumption that different providers and tools might be
needed in the triggerer makes me lean rather toward Option 3) as sketched below.

Jens

On 14.06.25 08:21, Jarek Potiuk wrote:
> On Fri, Jun 13, 2025 at 7:09 PM Vincent Beck <vincb...@apache.org> wrote:
>
>> Thanks, Jarek, for this proposal. Overall, I really like it—it
>> significantly simplifies multi-team support and removes the need to deploy
>> additional components per team, without compromising on the core needs of
>> users (unless I’m missing something).
>>
> Yep. I think with this "design" iteration, I put "simplicity" and
> "maintainability" as the primary goals. Separate configuration per team goes
> out the window, the ripple effect on the DB goes out the window, and what's
> left is basically the same Airflow we already have, with a few modifications.
>
>
>>> And if we do it and implement packaging and execution environments (say
>> ability of choosing predefined venv to parse and execute DAGs coming from a
>> specific bundle_id - the expectation 2) above can be handled well.
>>
>> Could you elaborate on this part? I’m not entirely clear on how it would
>> work in practice. For instance, how would it behave with two teams or
>> bundles? Real-world examples would help clarify this, unless it's more
>> implementation details that we can flesh out once there's agreement on the
>> general approach.
>>
> Currently with Bundle definition we **just** define where the DAGs are
> coming from. But we could (and that was even part of the original design)
> add extra "execution environment" configuration. For example when we have
> bundle_a and bundle_b each of them could have separate "environment"
> specified (say env_a, env_b) and we could map such environment to specific
> image (image_a, image_b) or virtualenv in the same image (/venv/a/ ,
> /venv/b) that would be predefined in the processor/worker images. (or in
> VMs if images are not used). The envs might have different sets of
> dependencies (providers and others) installed, and both DAG processor
> parsing and "Worker" (in celery or k8s Pod) would be run using that
> environment. Initially, AIP-67 also discussed defining dependencies in the
> bundle and installing them dynamically (like PythonVirtualenvOperator) - but
> I personally think having a predefined set of environments rather than
> creating them dynamically (like ExternalPythonOperator) has much better
> maintainability, stability and security properties.
>
>
>> Also, what about the triggerer? Since the triggerer runs user code, the
>> original AIP-67 proposal required at least one triggerer per team. How
>> would that be handled under this new architecture?
>>
> That is an excellent question :). There are a few options - depending on
> how much of point 4) "isolating workload" we want to implement.
> Paradoxically - to be honest - for me, the Triggerer always had the
> potential of being less of a problem when it comes to isolation. Yes, all
> triggers are (currently) running not only in the same interpreter but also
> in the same event loop (which means that isolation goes out the window),
> but it's also relatively easy to introduce isolation, and we've been
> discussing options for it in the past as well. I see quite a few.
>
> Option 1) simplest operationally - we could add a mode in Airflow that
> would resemble Timetables. All Triggers would have to be exposed via the
> plugin interface (we could easily expose all triggers this way from all our
> providers in a bulk way). This means that the deployment manager will have
> control over what is run in the Triggerer - effectively limiting it similarly
> to Scheduler code today. That would prevent some of the other cases we
> discussed recently (such as sending serialized "notification" methods to the
> triggerer to execute), but that's mostly an optimization, and they could be
> sent as worker tasks instead in this case.
>
> Option 2) Semi-isolation - for a number of our users, just separating
> processes might be "enough" (especially if we add cgroups to isolate
> the processes - we had that in the past). Not "perfect", and it does not have
> all the security properties, but for a number of our users it might be "good
> enough" because they will trust their teams enough not to worry about
> potential "malicious actions". In this case a single Triggerer could run
> several event loops - one per bundle, each of them in a separate, isolated
> process - and the only change we would have to make is to route the
> triggers to the right loop based on bundle id. Operational complexity barely
> increases, but isolation is greatly improved. Again, following
> the bundle -> environment mapping, each of those processes could be run
> using a specific "per-bundle" environment where all necessary dependencies
> would be installed. And here the limit on arbitrary code execution coming
> from DAGs can be lifted.
>
> Option 3) Full isolation - simply run one triggerer per bundle. That is a
> bit more like the original proposal, because we will then have an extra
> triggerer for each bundle/team (or group of bundles - it does not have to
> be a 1-to-1 mapping; it could be many-to-1). But it should provide full
> "security" properties with isolation and separation of workload; each
> triggerer could run completely in the environment defined in the
> bundle. It increases operational complexity - but just a bit. Rainbows and
> unicorns - we have it all.
>
> Also one more thing.
>
> We usually discuss technical aspects here on the dev list and rarely talk
> about "business". But I think this is in some cases wrong - including for
> multi-team, which has the potential of either supporting or undermining some
> of the business our stakeholders do with Airflow.
>
> I would like to - really - make a collaborative effort to come up with a
> multi-team approach with all the stakeholders here - Amazon, Google and
> Astronomer especially should all be on board with it. We know our users
> need it (the survey and the number of talks about multi-team/tenancy
> submitted this year for the Summit speak for themselves - we had ~10 sessions
> submitted about it, and 30% of survey respondents want it - though of
> course, as Ash correctly pointed out, many of those people have different
> expectations). Again, multi-team has the potential of either killing or
> supporting some of the business models our stakeholders might implement in
> their offerings. And while here we do not "care" too much about those
> models, we should care about our stakeholders' sustainability - as they are
> the ones who are fueling Airflow in many ways - so it would be stupid if we
> did not consider their expectations, their needs and - yes - the
> sustainability of their businesses. Here in the community we mostly add
> features that can be used by everyone - whether in an "as a service" or
> "on-prem" environment. And we cannot "know" what business is being planned,
> or is possible, or is good for our stakeholders. But we can collaboratively
> design a feature that is usable on-prem - but one that we know is good for
> everyone, so they can continue making business (or even better - provide
> better offerings to their users, building on top of it).
>
> Let's do it. If there are things we can improve/make better here, I want to
> hear it - from everyone - Ash, Vikram, Raj, Vincent, Rafał, Michał - if
> there is any idea of how to improve it and make it better for you as well,
> I think it's a good time to discuss it.
>
> J.
>
