Also one small comment. Yes, MT is NEEDED. My talk from Monday at Berlin Buzzwords has just been published: https://www.youtube.com/watch?v=EyhZOnbwc-4&list=PLq-odUc2x7i8dTff006Wg2r0fsseSGrpJ&index=50 -> and if you watch it (I quite recommend it :) ), I only briefly mentioned that we are looking at Multi-Team. And the ONLY question I got was "When is MT going to be ready? We need it!" .... I answered - with the disclaimer that it might not be what they expect - but it's quite clear that the simpler the solution and the faster we get it into the hands of the users, the better it will be - because if we don't, we will never find out what they really need.
On Thu, Jun 19, 2025 at 9:58 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Thanks Jens, Niko,

> *Jens*:

> > took a long time digesting all the discussion thread. I think it would be good to rewrite the details into a new AIP so that it can be compared with the old AIP.

> Definitely - I am planning to update the AIP and re-cast a vote if we are broadly in support of this simplified version (and soon - see below).

> > I think this also could include the extension (or is this planned otherwise?) to link multiple Airflow instances via Pub/Sub such that dataset events can be externalized... in some cases it might be easier to host multiple instances.

> This is already really implemented with common.messaging and AIP-82 -> this is also what triggered Ash's original "Is it still needed after AIP-82?" question (a small, illustrative snippet of what this looks like in practice is a bit further down in this mail).

> > The different Envs as described below are looking good. Besides "Env" this could also map to dedicated "executor" profiles, e.g. instantiating a dedicated executor per team? I think with regard to multiple executors we always intended to allow instantiating the same executor multiple times. Then the Env could be mapped to an executor each.

> I would like to simplify things first and come up with a design that will require minimal changes. What Ash wrote indeed made me ask "is it worth complicating Airflow that much just to implement MT, even if we know that different people have different expectations and we do not know whether we respond to them?" - so I completely changed my assumptions. Rather than thinking far ahead, I thought (with this new design) - what is the minimal set of changes that will get "some" multi-team variant that we can give to our users quickly and get feedback on. So maybe yes - in the future - we might want separate executors, but I propose - let's start as simply as possible - ideally by 3.1, 3.2 at the latest.

> > Yeah, and in this regard the triggerer would also need a feature increment to be hosted with multiple instances. It might be worth considering that this is needed per executor anyway. I assume this is lagging a bit, also for Edge Executor there is no async coverage.... so my background, with the assumption that different providers and tools might be needed in the triggerer, would lean rather towards Option 3) as sketched below.

> In the AIP I will propose (again) a minimal set of changes to support the single "executor set" case. I propose we can revise it later in a follow-up AIP.

> *Niko:*

> > Revisiting the proposal for multi team sounds like a reasonable idea. It is the feature request that we receive the most from our users/customers and we certainly want to get it right (and in a way that serves both users and stakeholders, as Jarek mentioned). AIP-67 was written with some Airflow 3 features in mind (such as Task SDK), so it's not completely outdated, but some features like event-driven scheduling might be cause for reevaluation. But I honestly don't think it's all that outdated personally.

> See above - I re-evaluated that under the "simplicity first" principle. Previously, without the AIPs implemented for Airflow 3, we had to do a lot more, and the original design from Airflow 2 is still "leaking" into the current AIP. Maybe we will get there eventually with "more" complexity (multiple executor sets for example) - but I came to the conclusion that an incremental implementation with a much smaller scope should be good enough to get us off the ground and get feedback from the users.
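To illustrate the common.messaging / AIP-82 point above: this is roughly how one Airflow instance can have a DAG scheduled off asset events that another instance (or any producer) publishes to a shared queue. It is only a sketch - the import paths, the MessageQueueTrigger parameters and the queue URI below are assumptions that depend on the exact provider versions used:

# Sketch only - import paths and parameters may differ per provider version.
from airflow.providers.common.messaging.triggers.msg_queue import MessageQueueTrigger
from airflow.sdk import DAG, Asset, AssetWatcher

# The other Airflow instance (or any producer) publishes a message to this
# queue whenever the upstream dataset/asset changes.
trigger = MessageQueueTrigger(
    queue="https://sqs.eu-central-1.amazonaws.com/123456789012/team-a-asset-events",  # illustrative
)

external_orders = Asset(
    "s3://warehouse/orders",  # illustrative asset URI
    watchers=[AssetWatcher(name="orders_updated", trigger=trigger)],
)

with DAG(dag_id="consume_external_asset_events", schedule=[external_orders]):
    ...  # the downstream team's tasks go here

So the "link instances via Pub/Sub" case does not need anything multi-team specific - it is just event-driven scheduling pointed at a queue both instances can reach.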
> > As for using Dag Bundles as an approximation for teams, I don't think I'm fully sold, but looking forward to hearing/discussing more. Some concerns I have with it:

> > I don't think bundles are a great entity to represent team-ness for all Airflow components and features. They are only indirectly related to connections and variables, for example, or to configuration. Also, passing bundle ID(s) (presumably?) to the auth manager feels very strange to me and seems to be a very leaky abstraction for bundles. How do we handle a logical team that is made up of more than one bundle? Or when bundles are added/removed from that logical team? Users will be constantly chasing their tails to keep their "team" up to date in their auth manager.

> I think "bundle" is a great "first" approximation. But ... I hear you. It's too simple and not future-proof. So ... modifying my proposal: we can add team_id to the bundle definition, and immediately do a many-to-1 mapping of bundles to teams. And still pass the "team_id" to the auth manager rather than the bundle. That should respond to your concerns and yes - it removes the leaky "bundle" abstraction in the auth manager. You are quite right that it would be cumbersome.

> > When users provide configuration for their logical "team" do they specify their Dag bundle(s) ID in the config file or env variables that they use to set config? What would that look like concretely, especially if it's a complement of more than one bundle? Also, again, how does this remain stable over time if bundles are added and removed for a single logical team? Does that invalidate the config for a logical team that is composed of one or more bundles? Do we not want something more stable to represent a team? Or do we limit teams to just one bundle?

> I think all of that is addressed by adding "team_id" to the bundle definition - and it also allows all kinds of dynamic behaviours - moving bundles between teams, adding bundles to a team, etc. - if we do the mapping dynamically, this should be fine. For that we will need to make sure that the api-server, scheduler and triggerer all have access to the "bundle definition" (to perform the mapping), but that should be rather straightforward and we can share it via the DB. We only need to know which bundle a dag is coming from and then we can easily map it to the team it belongs to. That should be quite easy. (A tiny, purely illustrative sketch of this mapping is appended at the very bottom of this mail.)

> I would love to conclude that part of the discussion quickly and propose AIP modifications if this is something we feel is a good direction. So I will keep it open for a few days and if you have any comments/questions I am happy to follow up.

> Ash - particularly.

> You started this thread, which means you care (and I really appreciate your perspective). And I would like to avoid a situation where there are still "other" concerns that you have not yet formulated. I feel I responded to pretty much all of your concerns by simplifying the approach, going incrementally, and prioritising simplicity and incrementality over "big picture and complexity". If you have any more concerns - I think it would be great to formulate them now rather than after we update the AIP and start voting.

> J.

> On Tue, Jun 17, 2025 at 11:28 PM Oliveira, Niko <oniko...@amazon.com.invalid> wrote:

>> Hey folks,

>> I've been OOTO for the last week so I'm just catching up on this discussion.
>> Here are some thoughts:

>> Revisiting the proposal for multi team sounds like a reasonable idea. It is the feature request that we receive the most from our users/customers and we certainly want to get it right (and in a way that serves both users and stakeholders, as Jarek mentioned). AIP-67 was written with some Airflow 3 features in mind (such as Task SDK), so it's not completely outdated, but some features like event-driven scheduling might be cause for reevaluation. But I honestly don't think it's all that outdated personally.

>> As for using Dag Bundles as an approximation for teams, I don't think I'm fully sold, but looking forward to hearing/discussing more. Some concerns I have with it:

>> I don't think bundles are a great entity to represent team-ness for all Airflow components and features. They are only indirectly related to connections and variables, for example, or to configuration. Also, passing bundle ID(s) (presumably?) to the auth manager feels very strange to me and seems to be a very leaky abstraction for bundles. How do we handle a logical team that is made up of more than one bundle? Or when bundles are added/removed from that logical team? Users will be constantly chasing their tails to keep their "team" up to date in their auth manager.

>> When users provide configuration for their logical "team" do they specify their Dag bundle(s) ID in the config file or env variables that they use to set config? What would that look like concretely, especially if it's a complement of more than one bundle? Also, again, how does this remain stable over time if bundles are added and removed for a single logical team? Does that invalidate the config for a logical team that is composed of one or more bundles? Do we not want something more stable to represent a team? Or do we limit teams to just one bundle?

>> Overall, I think permeating bundles across Airflow as an approximation for team is not going to scale well, be very future-proof, or meet user/customer expectations. For the area of Dags and Dag execution it's not too bad, but for the items above (as well as the Triggerer, as discussed below), and as we continue to build on multi-team (make it more robust, add requested features from users, etc.), I think we're going to find that it doesn't serve the job very well. I think a more stable and concrete representation of teams in Airflow will be a much better platform to build features off of. It may require some large-ish changes, but we weren't afraid of making such changes for other recent capabilities in Airflow (and largely those changes went smoothly), and I think that was the right call, both then and for this case as well.

>> Cheers,
>> Niko

>> ________________________________
>> From: Jens Scheffler <j_scheff...@gmx.de.INVALID>
>> Sent: Sunday, June 15, 2025 11:13:48 AM
>> To: dev@airflow.apache.org
>> Subject: RE: [EXT] Discuss: AIP-67 (multi team) now that AIP-82 (External event driven dags) exists
>> Hi all,

>> took a long time digesting all the discussion thread. I think it would be good to rewrite the details into a new AIP so that it can be compared with the old AIP.

>> I think this also could include the extension (or is this planned otherwise?) to link multiple Airflow instances via Pub/Sub such that dataset events can be externalized... in some cases it might be easier to host multiple instances.

>> The different Envs as described below are looking good. Besides "Env" this could also map to dedicated "executor" profiles, e.g. instantiating a dedicated executor per team? I think with regard to multiple executors we always intended to allow instantiating the same executor multiple times. Then the Env could be mapped to an executor each.

>> Yeah, and in this regard the triggerer would also need a feature increment to be hosted with multiple instances. It might be worth considering that this is needed per executor anyway. I assume this is lagging a bit, also for Edge Executor there is no async coverage.... so my background, with the assumption that different providers and tools might be needed in the triggerer, would lean rather towards Option 3) as sketched below.

>> Jens

>> On 14.06.25 08:21, Jarek Potiuk wrote:
>> > On Fri, Jun 13, 2025 at 7:09 PM Vincent Beck <vincb...@apache.org> wrote:

>> >> Thanks, Jarek, for this proposal. Overall, I really like it—it significantly simplifies multi-team support and removes the need to deploy additional components per team, without compromising on the core needs of users (unless I'm missing something).

>> > Yep. I think with this "design" iteration, I put "simplicity" and "maintainability" as the primary goals. Separate configuration per team goes out the window, the ripple effect on the DB goes out the window; what's left is basically the same Airflow we already have with a few modifications.

>> >>> And if we do it and implement packaging and execution environments (say the ability to choose a predefined venv to parse and execute DAGs coming from a specific bundle_id) - the expectation 2) above can be handled well.

>> >> Could you elaborate on this part? I'm not entirely clear on how it would work in practice. For instance, how would it behave with two teams or bundles? Real-world examples would help clarify this, unless it's more implementation details that we can flesh out once there's agreement on the general approach.

>> > Currently, with the Bundle definition we **just** define where the DAGs are coming from. But we could (and that was even part of the original design) add extra "execution environment" configuration. For example, when we have bundle_a and bundle_b, each of them could have a separate "environment" specified (say env_a, env_b) and we could map such an environment to a specific image (image_a, image_b) or a virtualenv in the same image (/venv/a, /venv/b) that would be predefined in the processor/worker images (or in VMs if images are not used). The envs might have different sets of dependencies (providers and others) installed, and both DAG processor parsing and the "Worker" (in celery or a k8s Pod) would be run using that environment.
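To make the above a bit more concrete, here is a very rough, purely illustrative sketch of a per-bundle "environment" mapping. None of this exists today - the "environment" field, the ENVIRONMENTS map and the classpath shown are assumptions, not actual Airflow configuration:

# Purely illustrative - "environment" on a bundle definition and the
# ENVIRONMENTS map are hypothetical, not existing Airflow settings.
BUNDLE_DEFINITIONS = [
    {
        "name": "bundle_a",
        "classpath": "airflow.dag_processing.bundles.git.GitDagBundle",  # indicative only
        "kwargs": {},  # usual bundle settings (repo, tracking ref, ...) omitted
        "environment": "env_a",  # hypothetical new field
    },
    {
        "name": "bundle_b",
        "classpath": "airflow.dag_processing.bundles.git.GitDagBundle",
        "kwargs": {},
        "environment": "env_b",
    },
]

# Environments pre-provisioned in the processor/worker images (or VMs):
ENVIRONMENTS = {
    "env_a": {"python": "/venv/a/bin/python", "image": "registry.example.com/airflow-env-a:3.1"},
    "env_b": {"python": "/venv/b/bin/python", "image": "registry.example.com/airflow-env-b:3.1"},
}


def interpreter_for_bundle(bundle_name: str) -> str:
    """Return the predefined interpreter that DAG parsing and task execution
    for DAGs from the given bundle would use."""
    env = next(b["environment"] for b in BUNDLE_DEFINITIONS if b["name"] == bundle_name)
    return ENVIRONMENTS[env]["python"]

The same lookup could just as well return the image name for container-based deployments instead of a venv path - the point is only that the environments are predefined, not created on the fly.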
>> > Initially AIP-67 also discussed defining dependencies in the bundle and installing them dynamically (like the Python Venv Operator) - but personally I think that having a predefined set of environments (like the ExternalPythonOperator approach) rather than creating them dynamically has much better maintainability, stability and security properties.

>> >> Also, what about the triggerer? Since the triggerer runs user code, the original AIP-67 proposal required at least one triggerer per team. How would that be handled under this new architecture?

>> > That is an excellent question :) . There are a few options - depending on how much of point 4) "isolating workload" we want to implement. Paradoxically - to be honest - for me, the Triggerer always had the potential of being less of a problem when it comes to isolation. Yes, all triggers are (currently) running not only in the same interpreter but also in the same event loop (which means that isolation goes out of the window), but it is also relatively easy to introduce isolation, and we have discussed options for it in the past as well. I see quite a few.

>> > Option 1) simplest operationally - We could add a mode in Airflow that would resemble Timetables. All Triggers would have to be exposed via the plugin interface (we could easily expose all triggers this way from all our providers in a bulk way). This means that the deployment manager would have control over what is run in the Triggerer - effectively limiting it similarly to Scheduler code today. That would prevent some of the other cases we discussed recently (such as sending "notification" serialized methods to the triggerer to execute) - but that's mostly an optimization, and they could be sent as worker tasks instead in this case.

>> > Option 2) Semi-isolation - for a number of our users just separating processes might be "enough" (especially if we add cgroups to isolate the processes - we had that in the past). Not "perfect" and it does not have all the security properties, but for a number of our users it might be "good enough" because they will trust their teams enough not to worry about potential "malicious actions". In this case a single Triggerer could run several event loops - one per bundle, each of them in a separate, isolated process - and the only change we would have to make is to route the triggers to the right loop based on bundle id. Almost no increase in operational complexity, but isolation is greatly improved. Again, following the bundle -> environment mapping, each of those processes could be run using a specific "per-bundle" environment where all the necessary dependencies would be installed. And here the limit on arbitrary code execution coming from DAGs can be lifted.

>> > Option 3) Full isolation -> simply run one triggerer per bundle. That is a bit more like the original proposal, because we will then have an extra triggerer for each bundle/team (or group of bundles - it does not have to be a 1-to-1 mapping, it could be many-to-1). But it should provide full "security" properties with isolation and separation of workload; each triggerer could run entirely in the environment defined in the bundle. It increases operational complexity - but just a bit. Rainbows and unicorns - we have it all.

>> > Also one more thing.
>> > We usually discuss technical aspects here on the devlist and rarely talk about "business". But I think in some cases this is wrong - including multi-team, which has the potential of either supporting or undermining some of the business our stakeholders do with Airflow.

>> > I would like to - really - make a collaborative effort to come up with a multi-team approach with all the stakeholders here - Amazon, Google, Astronomer especially should all be on board with it. We know our users need it (the survey and the number of talks about multi-team/tenancy submitted for the Summit this year speak for themselves - we had ~10 sessions submitted about it, and 30% of survey respondents want it - though of course, as Ash correctly pointed out, many of those people have different expectations). Again, multi-team has the potential of either killing or supporting some of the business models our stakeholders might implement in their offerings. And while here we do not "care" too much about those models, we should care about our stakeholders' sustainability - as they are the ones who are fueling Airflow in many ways - so it would be stupid if we did not consider their expectations, their needs and - yes - the sustainability of their businesses. Here in the community we mostly add features that can be used by everyone - whether in an "as a service" or "on-prem" environment. And we cannot "know" what business is being planned, or what is possible or good for our stakeholders. But we can collaboratively design a feature that is usable on-prem - and one that we know is good for everyone, so they can continue doing business (or, even better, provide better offerings to their users, building on top of it).

>> > Let's do it. If there are things we can improve/make better here, I want to hear it - from everyone - Ash, Vikram, Raj, Vincent, Rafał, Michał - if there is any idea how to improve it and make it better also for you, I think it's a good time to discuss it.

>> > J.
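Coming back to the bundle -> team mapping proposed earlier in this thread (the idea of adding team_id to the bundle definition), here is the promised, very rough and purely illustrative sketch. The team_id field and the auth-manager call are assumptions - nothing here is implemented or decided:

# Purely illustrative sketch of the proposed bundle -> team mapping.
BUNDLE_DEFINITIONS = [
    {"name": "bundle_a", "team_id": "team_analytics"},  # hypothetical team_id field
    {"name": "bundle_b", "team_id": "team_analytics"},  # many-to-1: several bundles, one team
    {"name": "bundle_c", "team_id": "team_ml"},
]

BUNDLE_TO_TEAM = {b["name"]: b["team_id"] for b in BUNDLE_DEFINITIONS}


def team_for_bundle(bundle_name: str) -> str:
    """Resolve the team a DAG belongs to from the bundle it was parsed from.
    The api-server, scheduler and triggerer can all do this lookup, because the
    bundle definitions (and the dag -> bundle relation) are available in the DB."""
    return BUNDLE_TO_TEAM[bundle_name]


# The auth manager would then be asked about a *team*, not a bundle, e.g.
# (hypothetical call, not an existing signature):
#   auth_manager.is_authorized_dag(method="GET", user=user, team_id=team_for_bundle(bundle_name))

Moving a bundle to another team would then just be a change of its team_id - no chasing of individual bundles in the auth manager configuration.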