Also one small comment. Yes, MT is NEEDED. My talk from Monday at Berlin Buzzwords has just been published: https://www.youtube.com/watch?v=EyhZOnbwc-4&list=PLq-odUc2x7i8dTff006Wg2r0fsseSGrpJ&index=50 -> and if you watch it (I quite recommend it :) ), I only briefly mentioned that we are looking at Multi-Team. And the ONLY question I got was "When is MT going to be ready? We need it!" .... I answered - with the disclaimer that it might not be what they expect - but it's quite clear that the simpler the solution and the faster we get it into the hands of the users, the better it will be - because if we don't, we will never find out what they really need.
On Thu, Jun 19, 2025 at 9:58 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Thanks Jens, Niko,

> *Jens*:

> > took a long time digesting all the discussion thread. I think it would be good to rewrite the details into a new AIP so that it can be compared with the old AIP.

> Definitely - I am planning to update the AIP and re-cast a vote if we are broadly in support of this simplified version (and soon - see below).

> > I think this also could include the extension (or is this planned otherwise?) to link multiple Airflow instances via Pub/Sub such that dataset events can be externalized... in some cases it might be easier to host multiple instances.

> This is already really implemented with common.messaging and AIP-82 -> this is also what triggered Ash's original "Is it still needed after AIP-82?" question (a small, illustrative snippet of what this looks like in practice is a bit further down in this mail).

> > The different Envs as described below are looking good. Besides "Env" this could also map to dedicated "executor" profiles, e.g. instantiating a dedicated executor per team? I think with regard to multiple executors we always intended to allow instantiating the same executor multiple times. Then the Env could be mapped to an executor each.

> I would like to simplify things first and come up with a design that will require minimal changes. What Ash wrote indeed made me ask "is it worth complicating Airflow that much just to implement MT, even if we know that different people have different expectations and we do not know whether we respond to them?" - so I completely changed my assumptions. Rather than thinking far ahead, I thought (with this new design) - what is the minimal set of changes that will get "some" multi-team variant that we can give to our users quickly and get feedback on. So maybe yes - in the future - we might want separate executors, but I propose - let's start as simply as possible - ideally by 3.1, 3.2 at the latest.

> > Yeah, and in this regard the triggerer would also need a feature increment to be hosted with multiple instances. It might be worth considering that this is needed per executor anyway. I assume this is lagging a bit, also for Edge Executor there is no async coverage.... so my background, with the assumption that different providers and tools might be needed in the triggerer, would lean rather towards Option 3) as sketched below.

> In the AIP I will propose (again) a minimal set of changes to support the single "executor set" case. I propose we can revise it later in a follow-up AIP.

> *Niko:*

> > Revisiting the proposal for multi team sounds like a reasonable idea. It is the feature request that we receive the most from our users/customers and we certainly want to get it right (and in a way that serves both users and stakeholders, as Jarek mentioned). AIP-67 was written with some Airflow 3 features in mind (such as Task SDK), so it's not completely outdated, but some features like event-driven scheduling might be cause for reevaluation. But I honestly don't think it's all that outdated personally.

> See above - I re-evaluated that under the "simplicity first" principle. Previously, without the AIPs implemented for Airflow 3, we had to do a lot more, and the original design from Airflow 2 is still "leaking" into the current AIP. Maybe we will get there eventually with "more" complexity (multiple executor sets for example) - but I came to the conclusion that an incremental implementation with a much smaller scope should be good enough to get us off the ground and get feedback from the users.
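To illustrate the common.messaging / AIP-82 point above: this is roughly how one Airflow instance can have a DAG scheduled off asset events that another instance (or any producer) publishes to a shared queue. It is only a sketch - the import paths, the MessageQueueTrigger parameters and the queue URI below are assumptions that depend on the exact provider versions used:

# Sketch only - import paths and parameters may differ per provider version.
from airflow.providers.common.messaging.triggers.msg_queue import MessageQueueTrigger
from airflow.sdk import DAG, Asset, AssetWatcher

# The other Airflow instance (or any producer) publishes a message to this
# queue whenever the upstream dataset/asset changes.
trigger = MessageQueueTrigger(
    queue="https://sqs.eu-central-1.amazonaws.com/123456789012/team-a-asset-events",  # illustrative
)

external_orders = Asset(
    "s3://warehouse/orders",  # illustrative asset URI
    watchers=[AssetWatcher(name="orders_updated", trigger=trigger)],
)

with DAG(dag_id="consume_external_asset_events", schedule=[external_orders]):
    ...  # the downstream team's tasks go here

So the "link instances via Pub/Sub" case does not need anything multi-team specific - it is just event-driven scheduling pointed at a queue both instances can reach.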
> > As for using Dag Bundles as an approximation for teams, I don't think I'm fully sold, but looking forward to hearing/discussing more. Some concerns I have with it:

> > I don't think bundles are a great entity to represent team-ness for all Airflow components and features. They are only indirectly related to connections and variables, for example, or to configuration. Also, passing bundle ID(s) (presumably?) to the auth manager feels very strange to me and seems to be a very leaky abstraction for bundles. How do we handle a logical team that is made up of more than one bundle? Or when bundles are added/removed from that logical team? Users will be constantly chasing their tails to keep their "team" up to date in their auth manager.

> I think "bundle" is a great "first" approximation. But ... I hear you. It's too simple and not future-proof. So ... modifying my proposal: we can add team_id to the bundle definition, and immediately do a many-to-1 mapping of bundles to teams. And still pass the "team_id" to the auth manager rather than the bundle. That should respond to your concerns and yes - it removes the leaky "bundle" abstraction in the auth manager. You are quite right that it would be cumbersome.

> > When users provide configuration for their logical "team" do they specify their Dag bundle(s) ID in the config file or env variables that they use to set config? What would that look like concretely, especially if it's a complement of more than one bundle? Also, again, how does this remain stable over time if bundles are added and removed for a single logical team? Does that invalidate the config for a logical team that is composed of one or more bundles? Do we not want something more stable to represent a team? Or do we limit teams to just one bundle?

> I think all of that is addressed by adding "team_id" to the bundle definition - and it also allows all kinds of dynamic behaviours - moving bundles between teams, adding bundles to a team, etc. - if we do the mapping dynamically, this should be fine. For that we will need to make sure that the api-server, scheduler and triggerer all have access to the "bundle definition" (to perform the mapping), but that should be rather straightforward and we can share it via the DB. We only need to know which bundle a dag is coming from and then we can easily map it to the team it belongs to. That should be quite easy. (A tiny, purely illustrative sketch of this mapping is appended at the very bottom of this mail.)

> I would love to conclude that part of the discussion quickly and propose AIP modifications if this is something we feel is a good direction. So I will keep it open for a few days and if you have any comments/questions I am happy to follow up.

> Ash - particularly.

> You started this thread, which means you care (and I really appreciate your perspective). And I would like to avoid a situation where there are still "other" concerns that you have not yet formulated. I feel I responded to pretty much all of your concerns by simplifying the approach, going incrementally, and prioritising simplicity and incrementality over "big picture and complexity". If you have any more concerns - I think it would be great to formulate them now rather than after we update the AIP and start voting.

> J.

> On Tue, Jun 17, 2025 at 11:28 PM Oliveira, Niko <oniko...@amazon.com.invalid> wrote:

>> Hey folks,

>> I've been OOTO for the last week so I'm just catching up on this discussion.
>> Here are some thoughts:

>> Revisiting the proposal for multi team sounds like a reasonable idea. It is the feature request that we receive the most from our users/customers and we certainly want to get it right (and in a way that serves both users and stakeholders, as Jarek mentioned). AIP-67 was written with some Airflow 3 features in mind (such as Task SDK), so it's not completely outdated, but some features like event-driven scheduling might be cause for reevaluation. But I honestly don't think it's all that outdated personally.

>> As for using Dag Bundles as an approximation for teams, I don't think I'm fully sold, but looking forward to hearing/discussing more. Some concerns I have with it:

>> I don't think bundles are a great entity to represent team-ness for all Airflow components and features. They are only indirectly related to connections and variables, for example, or to configuration. Also, passing bundle ID(s) (presumably?) to the auth manager feels very strange to me and seems to be a very leaky abstraction for bundles. How do we handle a logical team that is made up of more than one bundle? Or when bundles are added/removed from that logical team? Users will be constantly chasing their tails to keep their "team" up to date in their auth manager.

>> When users provide configuration for their logical "team" do they specify their Dag bundle(s) ID in the config file or env variables that they use to set config? What would that look like concretely, especially if it's a complement of more than one bundle? Also, again, how does this remain stable over time if bundles are added and removed for a single logical team? Does that invalidate the config for a logical team that is composed of one or more bundles? Do we not want something more stable to represent a team? Or do we limit teams to just one bundle?

>> Overall, I think permeating bundles across Airflow as an approximation for team is not going to scale well, be very future-proof, or meet user/customer expectations. For the area of Dags and Dag execution it's not too bad, but for the items above (as well as the Triggerer, as discussed below), and as we continue to build on multi-team (make it more robust, add requested features from users, etc.), I think we're going to find that it doesn't serve the job very well. I think a more stable and concrete representation of teams in Airflow will be a much better platform to build features off of. It may require some large-ish changes, but we weren't afraid of making such changes for other recent capabilities in Airflow (and largely those changes went smoothly), and I think that was the right call, both then and for this case as well.

>> Cheers,
>> Niko

>> ________________________________
>> From: Jens Scheffler <j_scheff...@gmx.de.INVALID>
>> Sent: Sunday, June 15, 2025 11:13:48 AM
>> To: dev@airflow.apache.org
>> Subject: RE: [EXT] Discuss: AIP-67 (multi team) now that AIP-82 (External event driven dags) exists
>> Hi all,

>> took a long time digesting all the discussion thread. I think it would be good to rewrite the details into a new AIP so that it can be compared with the old AIP.

>> I think this also could include the extension (or is this planned otherwise?) to link multiple Airflow instances via Pub/Sub such that dataset events can be externalized... in some cases it might be easier to host multiple instances.

>> The different Envs as described below are looking good. Besides "Env" this could also map to dedicated "executor" profiles, e.g. instantiating a dedicated executor per team? I think with regard to multiple executors we always intended to allow instantiating the same executor multiple times. Then the Env could be mapped to an executor each.

>> Yeah, and in this regard the triggerer would also need a feature increment to be hosted with multiple instances. It might be worth considering that this is needed per executor anyway. I assume this is lagging a bit, also for Edge Executor there is no async coverage.... so my background, with the assumption that different providers and tools might be needed in the triggerer, would lean rather towards Option 3) as sketched below.

>> Jens

>> On 14.06.25 08:21, Jarek Potiuk wrote:
>> > On Fri, Jun 13, 2025 at 7:09 PM Vincent Beck <vincb...@apache.org> wrote:

>> >> Thanks, Jarek, for this proposal. Overall, I really like it—it significantly simplifies multi-team support and removes the need to deploy additional components per team, without compromising on the core needs of users (unless I'm missing something).

>> > Yep. I think with this "design" iteration, I put "simplicity" and "maintainability" as the primary goals. Separate configuration per team goes out the window, the ripple effect on the DB goes out the window; what's left is basically the same Airflow we already have with a few modifications.

>> >>> And if we do it and implement packaging and execution environments (say the ability to choose a predefined venv to parse and execute DAGs coming from a specific bundle_id) - the expectation 2) above can be handled well.

>> >> Could you elaborate on this part? I'm not entirely clear on how it would work in practice. For instance, how would it behave with two teams or bundles? Real-world examples would help clarify this, unless it's more implementation details that we can flesh out once there's agreement on the general approach.

>> > Currently, with the Bundle definition we **just** define where the DAGs are coming from. But we could (and that was even part of the original design) add extra "execution environment" configuration. For example, when we have bundle_a and bundle_b, each of them could have a separate "environment" specified (say env_a, env_b) and we could map such an environment to a specific image (image_a, image_b) or a virtualenv in the same image (/venv/a, /venv/b) that would be predefined in the processor/worker images (or in VMs if images are not used). The envs might have different sets of dependencies (providers and others) installed, and both DAG processor parsing and the "Worker" (in celery or a k8s Pod) would be run using that environment.
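To make the above a bit more concrete, here is a very rough, purely illustrative sketch of a per-bundle "environment" mapping. None of this exists today - the "environment" field, the ENVIRONMENTS map and the classpath shown are assumptions, not actual Airflow configuration:

# Purely illustrative - "environment" on a bundle definition and the
# ENVIRONMENTS map are hypothetical, not existing Airflow settings.
BUNDLE_DEFINITIONS = [
    {
        "name": "bundle_a",
        "classpath": "airflow.dag_processing.bundles.git.GitDagBundle",  # indicative only
        "kwargs": {},  # usual bundle settings (repo, tracking ref, ...) omitted
        "environment": "env_a",  # hypothetical new field
    },
    {
        "name": "bundle_b",
        "classpath": "airflow.dag_processing.bundles.git.GitDagBundle",
        "kwargs": {},
        "environment": "env_b",
    },
]

# Environments pre-provisioned in the processor/worker images (or VMs):
ENVIRONMENTS = {
    "env_a": {"python": "/venv/a/bin/python", "image": "registry.example.com/airflow-env-a:3.1"},
    "env_b": {"python": "/venv/b/bin/python", "image": "registry.example.com/airflow-env-b:3.1"},
}


def interpreter_for_bundle(bundle_name: str) -> str:
    """Return the predefined interpreter that DAG parsing and task execution
    for DAGs from the given bundle would use."""
    env = next(b["environment"] for b in BUNDLE_DEFINITIONS if b["name"] == bundle_name)
    return ENVIRONMENTS[env]["python"]

The same lookup could just as well return the image name for container-based deployments instead of a venv path - the point is only that the environments are predefined, not created on the fly.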
>> > Initially AIP-67 also discussed defining dependencies in the bundle and installing them dynamically (like the Python Venv Operator) - but personally I think that having a predefined set of environments (like the ExternalPythonOperator approach) rather than creating them dynamically has much better maintainability, stability and security properties.

>> >> Also, what about the triggerer? Since the triggerer runs user code, the original AIP-67 proposal required at least one triggerer per team. How would that be handled under this new architecture?

>> > That is an excellent question :) . There are a few options - depending on how much of point 4) "isolating workload" we want to implement. Paradoxically - to be honest - for me, the Triggerer always had the potential of being less of a problem when it comes to isolation. Yes, all triggers are (currently) running not only in the same interpreter but also in the same event loop (which means that isolation goes out of the window), but it is also relatively easy to introduce isolation, and we have discussed options for it in the past as well. I see quite a few.

>> > Option 1) simplest operationally - We could add a mode in Airflow that would resemble Timetables. All Triggers would have to be exposed via the plugin interface (we could easily expose all triggers this way from all our providers in a bulk way). This means that the deployment manager would have control over what is run in the Triggerer - effectively limiting it similarly to Scheduler code today. That would prevent some of the other cases we discussed recently (such as sending "notification" serialized methods to the triggerer to execute) - but that's mostly an optimization, and they could be sent as worker tasks instead in this case.

>> > Option 2) Semi-isolation - for a number of our users just separating processes might be "enough" (especially if we add cgroups to isolate the processes - we had that in the past). Not "perfect" and it does not have all the security properties, but for a number of our users it might be "good enough" because they will trust their teams enough not to worry about potential "malicious actions". In this case a single Triggerer could run several event loops - one per bundle, each of them in a separate, isolated process - and the only change we would have to make is to route the triggers to the right loop based on bundle id. Almost no increase in operational complexity, but isolation is greatly improved. Again, following the bundle -> environment mapping, each of those processes could be run using a specific "per-bundle" environment where all the necessary dependencies would be installed. And here the limit on arbitrary code execution coming from DAGs can be lifted.

>> > Option 3) Full isolation -> simply run one triggerer per bundle. That is a bit more like the original proposal, because we will then have an extra triggerer for each bundle/team (or group of bundles - it does not have to be a 1-to-1 mapping, it could be many-to-1). But it should provide full "security" properties with isolation and separation of workload; each triggerer could run entirely in the environment defined in the bundle. It increases operational complexity - but just a bit. Rainbows and unicorns - we have it all.

>> > Also one more thing.
>> > We usually discuss technical aspects here on the devlist and rarely talk about "business". But I think in some cases this is wrong - including multi-team, which has the potential of either supporting or undermining some of the business our stakeholders do with Airflow.

>> > I would like to - really - make a collaborative effort to come up with a multi-team approach with all the stakeholders here - Amazon, Google, Astronomer especially should all be on board with it. We know our users need it (the survey and the number of talks about multi-team/tenancy submitted for the Summit this year speak for themselves - we had ~10 sessions submitted about it, and 30% of survey respondents want it - though of course, as Ash correctly pointed out, many of those people have different expectations). Again, multi-team has the potential of either killing or supporting some of the business models our stakeholders might implement in their offerings. And while here we do not "care" too much about those models, we should care about our stakeholders' sustainability - as they are the ones who are fueling Airflow in many ways - so it would be stupid if we did not consider their expectations, their needs and - yes - the sustainability of their businesses. Here in the community we mostly add features that can be used by everyone - whether in an "as a service" or "on-prem" environment. And we cannot "know" what business is being planned, or what is possible or good for our stakeholders. But we can collaboratively design a feature that is usable on-prem - and one that we know is good for everyone, so they can continue doing business (or, even better, provide better offerings to their users, building on top of it).

>> > Let's do it. If there are things we can improve/make better here, I want to hear it - from everyone - Ash, Vikram, Raj, Vincent, Rafał, Michał - if there is any idea how to improve it and make it better also for you, I think it's a good time to discuss it.

>> > J.
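Coming back to the bundle -> team mapping proposed earlier in this thread (the idea of adding team_id to the bundle definition), here is the promised, very rough and purely illustrative sketch. The team_id field and the auth-manager call are assumptions - nothing here is implemented or decided:

# Purely illustrative sketch of the proposed bundle -> team mapping.
BUNDLE_DEFINITIONS = [
    {"name": "bundle_a", "team_id": "team_analytics"},  # hypothetical team_id field
    {"name": "bundle_b", "team_id": "team_analytics"},  # many-to-1: several bundles, one team
    {"name": "bundle_c", "team_id": "team_ml"},
]

BUNDLE_TO_TEAM = {b["name"]: b["team_id"] for b in BUNDLE_DEFINITIONS}


def team_for_bundle(bundle_name: str) -> str:
    """Resolve the team a DAG belongs to from the bundle it was parsed from.
    The api-server, scheduler and triggerer can all do this lookup, because the
    bundle definitions (and the dag -> bundle relation) are available in the DB."""
    return BUNDLE_TO_TEAM[bundle_name]


# The auth manager would then be asked about a *team*, not a bundle, e.g.
# (hypothetical call, not an existing signature):
#   auth_manager.is_authorized_dag(method="GET", user=user, team_id=team_for_bundle(bundle_name))

Moving a bundle to another team would then just be a change of its team_id - no chasing of individual bundles in the auth manager configuration.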