> I think "bundle" is a great "first" approximation. But .. I hear you. It's
> too simple and not future-proof. So ... modifying my proposal: - we can add
> team_id to the bundle definition. And do immediately many-to-1 mapping of
> bundles to the team. And still pass the "team_id" to the auth manager
> rather than bundle. That should respond to your concerns and yes - removes
> the leaky "bundle" abstraction in auth manager. You are quite right it
> would be cumbersome.

>> When users provide configuration for their logical “team”, do they specify
>> their Dag bundle ID(s) in the config file or env variables that they use to
>> set config? What would that look like concretely, especially if it’s a
>> complement of more than one bundle? Also, again, how does this remain
>> stable over time if bundles are added and removed for a single logical
>> team? Does that invalidate the config for a logical team that is composed
>> of one or more bundles? Do we not want something more stable to represent a
>> team? Or do we limit teams to just one bundle?

> I think all that is addressed by adding "team_id" to the bundle
> definition - and it also allows all kinds of dynamic behaviours - moving
> bundles between teams, adding bundles to a team, etc. If we do the mapping
> dynamically, this should be fine. For that we will need to make sure that
> the api-server, scheduler and triggerer all have access to the "bundle
> definition" (to perform the mapping), but that should be rather
> straightforward and we can share it via the db. We only need to know which
> bundle a dag is coming from and then we can easily map it to the team it
> belongs to. That should be quite easy.

I like this iteration a bit more now for sure, thanks for being receptive to
feedback! :)
This is now quite close to what I was proposing before: we again have a team
ID (which I think is really needed here, glad to see it back) that will be
used for auth management, configuration specification, etc., but it will be
carried by the Bundle instead of the Dag model. As you say, “For that we will
need to make sure that both api-server, scheduler and triggerer have access
to the "bundle definition" (to perform the mapping)", which honestly doesn’t
feel too different from the original proposal we had last week of adding it
to the Dag table and ensuring it’s available everywhere. But either way, I’m
happy to meet in the middle and keep it on Bundle if everyone else feels
that’s a more suitable location.
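
To make that concrete, here is a rough sketch of the mapping I have in mind
(all names below - bundle_definitions, team_for_bundle, the auth manager
call - are hypothetical, purely for illustration):

    # Hypothetical sketch: each bundle definition carries the team that
    # owns it; several bundles may map to one team.
    bundle_definitions = [
        {"name": "bundle_a", "team_id": "data_eng"},
        {"name": "bundle_b", "team_id": "ml_platform"},
        {"name": "bundle_c", "team_id": "ml_platform"},  # many-to-one
    ]

    def team_for_bundle(bundle_name: str) -> str:
        # Shared via the db, so api-server, scheduler and triggerer can
        # all perform the same bundle -> team mapping.
        for bundle in bundle_definitions:
            if bundle["name"] == bundle_name:
                return bundle["team_id"]
        raise ValueError(f"unknown bundle: {bundle_name}")

    # The auth manager would then only ever see the stable team_id, e.g.:
    # auth_manager.is_authorized(user=user, team_id=team_for_bundle(...))

Bundles can then move between teams without invalidating anything that the
auth manager or the configuration refers to.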

One other thing I’d point out is that I think including executors per team is
a very easy win and quite possible without much work. I already have much of
the code written. Executors are already aware of the teams that own them
(merged), and I have a PR open for configuration per team (with a quite
simple and isolated approach; I believe you approved it, Jarek). The last
piece is updating the scheduling logic to route tasks from a particular
Bundle to the correct executor, which shouldn’t be much work (though it would
be easier if the Task models had a column for the team they belong to, rather
than having to look up the Dag and Bundle to get the team). I have a branch
where I was experimenting with this logic already; the rough shape is
sketched below.
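
Roughly (a sketch only - the registry and helper names are invented, not the
code in my branch; team_for_bundle is the mapping sketched above):

    # Placeholder executor objects; in reality these would be configured
    # executor instances (e.g. one CeleryExecutor per team).
    data_eng_executor = object()
    ml_platform_executor = object()
    default_executor = object()

    executors_by_team = {
        "data_eng": data_eng_executor,
        "ml_platform": ml_platform_executor,
    }

    def executor_for_bundle(bundle_name: str):
        # Today this takes Dag -> Bundle -> team lookups; a team column
        # on the task model would make it a single direct lookup.
        team_id = team_for_bundle(bundle_name)
        return executors_by_team.get(team_id, default_executor)

The scheduler would then queue each task instance on the executor returned
here instead of always using the default one.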

Anyhow, long story short, I don’t think we necessarily need to remove this
piece from the project's scope if it is already partly done and not too
difficult.

Cheers,
Niko


________________________________
From: Jarek Potiuk <ja...@potiuk.com>
Sent: Thursday, June 19, 2025 1:06:32 AM
To: dev@airflow.apache.org
Subject: RE: [EXT] Discuss: AIP-67 (multi team) now that AIP-82 (External event 
driven dags) exists

Also one small comment. Yes, MT is NEEDED. My talk from Monday at Berlin
Buzzwords has just been published:
https://www.youtube.com/watch?v=EyhZOnbwc-4&list=PLq-odUc2x7i8dTff006Wg2r0fsseSGrpJ&index=50
-> and if you watch it (I quite recommend it :) ), I only briefly mentioned
that we are looking at Multi-Team, and the ONLY question I got was "When
is MT going to be ready? We need it!" ... I answered - with the disclaimer
that it might not be what they expect - but it's quite clear that the
simpler the solution and the faster we get it into the hands of users,
the better it will be - because if we don't, we will never find out what
they really need.

On Thu, Jun 19, 2025 at 9:58 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Thanks Jens, Niko,
>
> *Jens*:
>
> > It took me a long time to digest the whole discussion thread. I think it
> would be good to rewrite the details into a new AIP so that it can be
> compared with the old AIP.
>
> Definitely - I am planning to update the AIP and re-cast a vote if we are
> broadly in support of this simplified version (and soon - see below).
>
> > I think this also could include the extension (or is this planned
> otherwise?) to link multiple Airflow instances via Pub/Sub such that
> dataset events can be externalized... in some cases it might be easier to
> host multiple instances.
>
> This is actually already implemented with common.messaging and AIP-82 ->
> this is also what triggered Ash's original question: "Is it still needed
> after AIP-82?"
>
> > The different Envs as described below are looking good. Besides "Env",
> this could also map to dedicated "executor" profiles, e.g. instantiating
> a dedicated executor per team? I think in regard to multiple executors we
> always intended to allow instantiating the same executor multiple times.
> Then each Env could be mapped to an executor.
>
> I would like to simplify things first and come up with a design that will
> require minimal changes. What Ash wrote indeed made me ask "is it worth
> complicating Airflow that much to implement just MT, even if we know that
> different people have different expectations and we do not know if we
> respond to them?" - so I completely changed my assumptions. Rather than
> thinking far ahead, I asked (with this new design): what is the minimal
> set of changes that will get "some" multi-team variant that we can give to
> our users quickly and get feedback on? So maybe yes - in the future - we
> might want separate executors, but I propose - let's start as simply as
> possible - ideally by 3.1, 3.2 at the latest.
>
> > Yeah, and in this regard the triggerer would also need a feature
> increment to be hosted with multiple instances. It might be worth
> considering that this is needed per executor anyway. I assume this is
> lagging a bit, and for the Edge Executor there is no async coverage... so
> my background assumption that different providers and tools might be
> needed in the triggerer would lean rather towards Option 3) as sketched
> below.
>
> In the AIP I will propose (again) a minimal set of changes to support the
> single "executor set" case. We can revise it later in a follow-up AIP.
>
> *Niko*:
>
> > Revisiting the proposal for multi team sounds like a reasonable idea. It
> is the feature request that we receive the most from our users/customers
> and we certainly want to get it right (and in a way that serves both
> users and stakeholders, as Jarek mentioned). AIP-67 was written with some
> Airflow 3 features in mind (such as Task SDK), so it’s not completely
> outdated, but some features like event-driven scheduling might be cause for
> reevaluation. But I honestly don't think it's all that outdated, personally.
>
> See above - I re-evaluated that under a "simplicity first" lens. Previously,
> without the AIPs implemented for Airflow 3, we had to do a lot more, and the
> original design from Airflow 2 is still "leaking" into the current AIP.
> Maybe we will get there eventually with "more" complexity (multiple executor
> sets, for example) - but I came to the conclusion that an incremental
> implementation with a much smaller scope should be good enough to get us off
> the ground and get feedback from the users.
>
> > As for using Dag Bundles as an approximation for teams, I don’t think
> I’m fully sold, but looking forward to hearing/discussing more. Some
> concerns I have with it:
> > I don’t think bundles are a great entity to represent team-ness for all
> Airflow components and features. They are only indirectly related to
> connections and variables, for example, or to configuration. Also, passing
> bundle ID(s) (presumably?) to the auth manager feels very strange to me and
> seems to be a very leaky abstraction for bundles. How do we handle a
> logical team that is made up of more than one bundle? Or when bundles are
> added/removed from that logical team? Users will be constantly chasing
> their tails to keep their “team” up to date in their auth manager.
>
> I think "bundle" is a great "first" approximation. But .. I hear you. It's
> too simple and not future-proof. So ... modifying my proposal: - we can add
> team_id to the bundle definition. And do immediately many-to-1 mapping of
> bundles to the team. And still pass the "team_id" to the auth manager
> rather than bundle. That should respond to your concerns and yes - removes
> the leaky "bundle" abstraction in auth manager. You are quite right it
> would be cumbersome.
>
> > When users provide configuration for their logical “team”, do they
> specify their Dag bundle ID(s) in the config file or env variables that
> they use to set config? What would that look like concretely, especially
> if it’s a complement of more than one bundle? Also, again, how does this
> remain stable over time if bundles are added and removed for a single
> logical team? Does that invalidate the config for a logical team that is
> composed of one or more bundles? Do we not want something more stable to
> represent a team? Or do we limit teams to just one bundle?
>
> I think all that is addressed by adding "team_id" to the bundle
> definition - and it also allows all kinds of dynamic behaviours - moving
> bundles between teams, adding bundles to a team, etc. If we do the mapping
> dynamically, this should be fine. For that we will need to make sure that
> the api-server, scheduler and triggerer all have access to the "bundle
> definition" (to perform the mapping), but that should be rather
> straightforward and we can share it via the db. We only need to know which
> bundle a dag is coming from and then we can easily map it to the team it
> belongs to. That should be quite easy.
>
>
> I would love to conclude that part of the discussion quickly and propose
> AIP modifications if we feel this is a good direction. So I will keep it
> open for a few days, and if you have any comments/questions I am happy to
> follow up.
>
> Ash - particularly.
>
> You started this thread, which means you care (and I really appreciate your
> perspective). And I would like to avoid a situation where there are still
> "other" concerns that you have not yet formulated. I feel I have responded
> to pretty much all of your concerns by simplifying the approach and
> prioritising simplicity and incrementality over "big picture and
> complexity". If you have any more concerns, I think it would be great to
> formulate them now rather than after we update the AIP and start voting.
>
> J.
>
>
>
> On Tue, Jun 17, 2025 at 11:28 PM Oliveira, Niko
> <oniko...@amazon.com.invalid> wrote:
>
>> Hey folks,
>>
>> I’ve been OOTO for the last week so I’m just catching up on this
>> discussion. Here are some thoughts:
>>
>> Revisiting the proposal for multi team sounds like a reasonable idea. It
>> is the feature request that we receive the most from our users/customers
>> and we certainly want to get it right (and in a way that serves both
>> users and stakeholders, as Jarek mentioned). AIP-67 was written with some
>> Airflow 3 features in mind (such as Task SDK), so it’s not completely
>> outdated, but some features like event-driven scheduling might be cause for
>> reevaluation. But I honestly don't think it's all that outdated, personally.
>>
>> As for using Dag Bundles as an approximation for teams, I don’t think I’m
>> fully sold, but looking forward to hearing/discussing more. Some concerns I
>> have with it:
>>
>> I don’t think bundles are a great entity to represent team-ness for all
>> Airflow components and features. They are only indirectly related to
>> connections and variables, for example, or to configuration. Also, passing
>> bundle ID(s) (presumably?) to the auth manager feels very strange to me and
>> seems to be a very leaky abstraction for bundles. How do we handle a
>> logical team that is made up of more than one bundle? Or when bundles are
>> added/removed from that logical team? Users will be constantly chasing
>> their tails to keep their “team” up to date in their auth manager.
>>
>> When users provide configuration for their logical “team”, do they specify
>> their Dag bundle ID(s) in the config file or env variables that they use to
>> set config? What would that look like concretely, especially if it’s a
>> complement of more than one bundle? Also, again, how does this remain
>> stable over time if bundles are added and removed for a single logical
>> team? Does that invalidate the config for a logical team that is composed
>> of one or more bundles? Do we not want something more stable to represent a
>> team? Or do we limit teams to just one bundle?
>>
>> Overall, I think permeating bundles across Airflow as an approximation
>> for team is not going to scale well, be very future-proof, or meet
>> user/customer expectations. For the area of Dags and Dag execution it’s not
>> too bad, but for the items above (as well as the Triggerer, as discussed
>> below), and as we continue to build on multi-team (make it more robust, add
>> requested features from users, etc.), I think we’re going to find that it
>> doesn’t serve the job very well. I think a more stable and concrete
>> representation of teams in Airflow will be a much better platform to build
>> features on. It may require some large-ish changes, but we weren’t
>> afraid of making such changes for other recent capabilities in Airflow (and
>> largely those changes went smoothly), and I think that was the right
>> call, both then and for this case as well.
>>
>> Cheers,
>> Niko
>>
>>
>> ________________________________
>> From: Jens Scheffler <j_scheff...@gmx.de.INVALID>
>> Sent: Sunday, June 15, 2025 11:13:48 AM
>> To: dev@airflow.apache.org
>> Subject: RE: [EXT] Discuss: AIP-67 (multi team) now that AIP-82 (External
>> event driven dags) exists
>>
>> Hi all,
>>
>> It took me a long time to digest the whole discussion thread. I think it
>> would be good to rewrite the details into a new AIP so that it can be
>> compared with the old AIP.
>>
>> I think this also could include the extension (or is this planned
>> otherwise?) to link multiple Airflow instances via Pub/Sub such that
>> dataset events can be externalized... in some cases it might be easier to
>> host multiple instances.
>>
>> The different Envs as described below are looking good. Besides "Env",
>> this could also map to dedicated "executor" profiles, e.g. instantiating
>> a dedicated executor per team? I think in regard to multiple executors we
>> always intended to allow instantiating the same executor multiple times.
>> Then each Env could be mapped to an executor.
>>
>> Yeah, and in this regard the triggerer would also need a feature
>> increment to be hosted with multiple instances. It might be worth
>> considering that this is needed per executor anyway. I assume this is
>> lagging a bit, and for the Edge Executor there is no async coverage... so
>> my background assumption that different providers and tools might be
>> needed in the triggerer would lean rather towards Option 3) as sketched
>> below.
>>
>> Jens
>>
>> On 14.06.25 08:21, Jarek Potiuk wrote:
>> > On Fri, Jun 13, 2025 at 7:09 PM Vincent Beck <vincb...@apache.org>
>> wrote:
>> >
>> >> Thanks, Jarek, for this proposal. Overall, I really like it—it
>> >> significantly simplifies multi-team support and removes the need to
>> >> deploy additional components per team, without compromising on the
>> >> core needs of users (unless I’m missing something).
>> >>
>> > Yep. I think with this "design" iteration, I put "simplicity" and
>> > "maintainability" as the primary goals. Separate configuration per team
>> > goes out the window, the ripple effect on the DB goes out the window,
>> > and what's left is basically the same Airflow we already have with a
>> > few modifications.
>> >
>> >
>> >>> And if we do it and implement packaging and execution environments
>> >> (say, the ability to choose a predefined venv to parse and execute DAGs
>> >> coming from a specific bundle_id) - the expectation 2) above can be
>> >> handled well.
>> >>
>> >> Could you elaborate on this part? I’m not entirely clear on how it
>> >> would work in practice. For instance, how would it behave with two
>> >> teams or bundles? Real-world examples would help clarify this, unless
>> >> it's more implementation details that we can flesh out once there's
>> >> agreement on the general approach.
>> >>
>> > Currently with the Bundle definition we **just** define where the DAGs
>> > are coming from. But we could (and that was even part of the original
>> > design) add extra "execution environment" configuration. For example,
>> > when we have bundle_a and bundle_b, each of them could have a separate
>> > "environment" specified (say env_a, env_b) and we could map such an
>> > environment to a specific image (image_a, image_b) or a virtualenv in
>> > the same image (/venv/a/, /venv/b/) that would be predefined in the
>> > processor/worker images (or in VMs if images are not used). The envs
>> > might have different sets of dependencies (providers and others)
>> > installed, and both DAG processor parsing and the "Worker" (in celery
>> > or a k8s Pod) would run using that environment. Initially AIP-67 also
>> > discussed defining dependencies in the bundle and installing them
>> > dynamically (like PythonVirtualenvOperator) - but personally I think
>> > having a predefined set of environments (like ExternalPythonOperator)
>> > rather than creating them dynamically has much better maintainability,
>> > stability and security properties.
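>> >
>> > To make it a bit more concrete, a hand-wavy sketch (all the names -
>> > environments, bundle_envs, interpreter_for_bundle - are invented just
>> > to illustrate the mapping):
>> >
>> >     # Predefined environments baked into the processor/worker images.
>> >     environments = {
>> >         "env_a": "/venv/a/bin/python",
>> >         "env_b": "/venv/b/bin/python",
>> >     }
>> >
>> >     # Each bundle declares which environment it runs in.
>> >     bundle_envs = {"bundle_a": "env_a", "bundle_b": "env_b"}
>> >
>> >     def interpreter_for_bundle(bundle_name: str) -> str:
>> >         # Both DAG parsing and task execution resolve the interpreter
>> >         # the same way, so a bundle always runs in its own env.
>> >         return environments[bundle_envs[bundle_name]]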
>> >
>> >
>> >> Also, what about the triggerer? Since the triggerer runs user code, the
>> >> original AIP-67 proposal required at least one triggerer per team. How
>> >> would that be handled under this new architecture?
>> >>
>> > That is an excellent question :). There are a few options, depending on
>> > how much of point 4) "isolating workload" we want to implement.
>> > Paradoxically - to be honest - for me, the Triggerer always had the
>> > potential of being less of a problem when it comes to isolation. Yes,
>> > all triggers are (currently) running not only in the same interpreter
>> > but also in the same event loop (which means that isolation goes out of
>> > the window), but it's also relatively easy to introduce isolation, and
>> > we've discussed options for it in the past as well. I see quite a few.
>> >
>> > Option 1) simplest operationally - We could add a mode in Airflow that
>> > would resemble Timetables. All Triggers would have to be exposed via
>> > the plugin interface (we could easily expose all triggers this way from
>> > all our providers in a bulk way). This means that the deployment
>> > manager will have control over what runs in the Triggerer - effectively
>> > limiting it similarly to Scheduler code today. That would prevent some
>> > of the other cases we discussed recently (such as sending serialized
>> > "notification" methods to the triggerer to execute) - but that's mostly
>> > an optimization, and they could be sent as worker tasks instead in this
>> > case.
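>> >
>> > To illustrate - note there is no "triggers" plugin field today, so
>> > this is purely hypothetical:
>> >
>> >     from airflow.plugins_manager import AirflowPlugin
>> >
>> >     class ApprovedTriggersPlugin(AirflowPlugin):
>> >         name = "approved_triggers"
>> >         # Hypothetical field: only triggers registered here would be
>> >         # allowed to run in the Triggerer, mirroring how Timetables
>> >         # are exposed via plugins today.
>> >         triggers = [
>> >             "airflow.providers.standard.triggers.temporal.DateTimeTrigger",
>> >         ]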
>> >
>> > Option 2) Semi-isolation - for a number of our users, just separating
>> > processes might be "enough" (especially if we add cgroups to isolate
>> > the processes - we had that in the past). Not "perfect" and it does not
>> > have all the security properties, but for a number of our users it
>> > might be "good enough", because they will trust their teams enough not
>> > to worry about potential "malicious actions". In this case a single
>> > Triggerer could run several event loops - one per bundle, each of them
>> > in a separate, isolated process - and the only change we would have to
>> > make is to route the triggers to the right loop based on bundle id.
>> > Almost no increase in operational complexity, but isolation is greatly
>> > improved. Again, following the bundle -> environment mapping, each of
>> > those processes could run using a specific "per-bundle" environment
>> > where all the necessary dependencies would be installed. And here the
>> > limit on arbitrary code execution coming from DAGs can be lifted.
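>> >
>> > A rough sketch of that routing (real Triggers yield events and need
>> > more plumbing, so a plain coroutine function stands in for a trigger
>> > here):
>> >
>> >     import asyncio
>> >     from multiprocessing import Process, Queue
>> >
>> >     def run_bundle_loop(queue: Queue) -> None:
>> >         # One isolated process per bundle (optionally cgroup-limited),
>> >         # each with its own event loop for that bundle's triggers only.
>> >         async def main() -> None:
>> >             while True:
>> >                 coro_fn = await asyncio.to_thread(queue.get)
>> >                 asyncio.create_task(coro_fn())
>> >         asyncio.run(main())
>> >
>> >     queues = {"bundle_a": Queue(), "bundle_b": Queue()}
>> >     for q in queues.values():
>> >         Process(target=run_bundle_loop, args=(q,), daemon=True).start()
>> >
>> >     def route_trigger(bundle_name: str, coro_fn) -> None:
>> >         # The only triggerer change needed: pick the loop by bundle id.
>> >         queues[bundle_name].put(coro_fn)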
>> >
>> > Option 3) Full isolation -> simply run one triggerer per bundle. That
>> > is a bit more like the original proposal, because we will then have an
>> > extra triggerer for each bundle/team (or group of bundles - it does not
>> > have to be a 1-to-1 mapping, it could be many-to-1). But it should
>> > provide full "security" properties with isolation and separation of
>> > workload, and each triggerer could run entirely in the environment
>> > defined in the bundle. It increases operational complexity - but just a
>> > bit. Rainbows and unicorns - we have it all.
>> >
>> > Also one more thing.
>> >
>> > We usually discuss technical aspects here on the dev list and rarely
>> > talk about "business". But I think this is in some cases wrong -
>> > including multi-team, which has the potential of either supporting or
>> > undermining some of the business our stakeholders do with Airflow.
>> >
>> > I would like to - really - make a collaborative effort to come up with
>> > a multi-team approach with all the stakeholders here - Amazon, Google,
>> > and Astronomer especially should all be on board with it. We know our
>> > users need it (the survey and the number of talks about
>> > multi-team/tenancy submitted for the Summit this year speak for
>> > themselves - we had ~10 sessions submitted about it, and 30% of survey
>> > respondents want it - though of course, as Ash correctly pointed out,
>> > many of those people have different expectations). Again, multi-team
>> > has the potential of either killing or supporting some of the business
>> > models our stakeholders might implement in their offerings. And while
>> > here we do not "care" too much about those models, we should care about
>> > our stakeholders' sustainability - as they are the ones who are fueling
>> > Airflow in many ways - so it would be stupid if we did not consider
>> > their expectations and needs and - yes - the sustainability of their
>> > businesses. Here in the community we mostly add features that can be
>> > used by everyone - whether in an "as a service" or "on-prem"
>> > environment. And we cannot "know" what business is being planned or is
>> > possible or good for our stakeholders. But we can collaboratively
>> > design a feature that is usable on-prem - and one that we know is good
>> > for everyone, so they can continue doing business (or, even better,
>> > provide better offerings to their users by building on top of it).
>> >
>> > Let's do it. If there are things we can improve here, I want to hear it
>> > - from everyone - Ash, Vikram, Raj, Vincent, Rafał, Michał - if there
>> > is any idea how to make it better for you as well, I think it's a good
>> > time to discuss it.
>> >
>> > J.
>> >