I realized that I owe Niko an explanation of the configuration changes. Again, following the philosophy above: a minimal set of changes to "airflow internals", the minimum set that will work. The change I propose below makes **no** changes to how the current "shared" configuration feature works. It only changes how executors retrieve their configuration when they are configured "per-team", and we can 100% bank on the existing multi-executor support. I believe this will absolutely minimise the set of changes needed to implement multi-team, so that we can get it "faster" and with "far lower risk" of impacting Airflow code and, say, the 3.1 or 3.2 delivery.
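To make the scheme in this message concrete (team prefix separated by ":", teams separated by ";", and "___" standing in for ":" in environment variables), here is a minimal Python sketch of how the values could be parsed. The function names `parse_team_executors` and `team_env_var` are hypothetical and not part of Airflow; the sketch assumes every entry carries a team prefix and does not handle the existing executor alias syntax.

```python
def parse_team_executors(value: str) -> dict[str, list[str]]:
    """Split 'team1:ExecA,ExecB;team2:ExecC' into {team: [executors]}.

    Hypothetical helper: assumes every ';'-separated entry has a 'team:' prefix.
    """
    teams: dict[str, list[str]] = {}
    for entry in value.split(";"):
        # Only the first ':' separates the team name from its executor list.
        team, sep, executors = entry.strip().partition(":")
        if not sep or not team:
            raise ValueError(f"entry without team prefix: {entry!r}")
        teams[team] = [name.strip() for name in executors.split(",") if name.strip()]
    return teams


def team_env_var(section: str, key: str) -> str:
    """Build the env var name, replacing ':' in the section with '___'."""
    return f"AIRFLOW__{section.replace(':', '___').upper()}__{key.upper()}"


executors = parse_team_executors(
    "team1:KubernetesExecutor,my.custom.module.ExecutorClass;team2:CeleryExecutor"
)
env_name = team_env_var("team1:kubernetes_executor", "api_client_retry_configuration")
print(executors)
print(env_name)
```

This is only an illustration of the naming rules; how Airflow would actually resolve per-team sections is left to the implementation.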
The existing multi-executor configuration will be extended to include a team prefix. The prefix will be separated from the executor list with ":", and entries for different teams will be separated with ";":

    [core]
    executor = team1:KubernetesExecutor,my.custom.module.ExecutorClass;team2:CeleryExecutor

The configuration of executors will also be prefixed with the same team:

    [team1:kubernetes_executor]
    api_client_retry_configuration = { "total": 3, "backoff_factor": 0.5 }

The environment variables holding configuration will use "___" (three underscores) to replace ":". For example:

    AIRFLOW__TEAM1___KUBERNETES_EXECUTOR__API_CLIENT_RETRY_CONFIGURATION

J.

On Thu, Jul 3, 2025 at 8:47 AM Jarek Potiuk <ja...@potiuk.com> wrote: > > The direction this one is taking is interesting. If you're really just > trying to make the feature barely possible and mostly targeted towards > managed providers to implement the rest, then I suppose this hits the mark. > > Well, actually, by taking the direction I took, it's not "mostly for managed > providers" - I see it as being equally for managed providers and on-prem > users, but also, following the open-source spirit, philosophically, I think > in Airflow, any such change should be done with those things in mind, > because we are at the stage where we are already "established" and by > innovating on top of what we have we sometimes have more to lose than to gain > - so I feel with "deployment" features we should be very careful to > distinguish "enabling things" vs. "doing things". My focus with this > iteration was to remove all the roadblocks that make it impossible (or > extremely difficult) to implement "real" multi-team and separation without > modifying airflow core. I thought: what is the minimal set of features that > will make it "possible" for someone motivated to deploy a single airflow > for multiple teams? 
> > * minimise maintenance effort increase > * do not "spoil" the "simple case" - we do not want to add features that > make "simple" implementation more complex the current `docker run -it > apache/airflow standalone` - should be simple and straightforward to run > * if there is anything that involves complex deployment, we should not aim > to make a "turn-key" solution that we will have to support - similarly like > we do with our configuration parameters, we have 100s knobs to turn, and as > long as default settings are reasonable and someone "motivated" can > configure and fine-tune - this configuration and fine-tuning should be left > to them - regardless if they are on-prem or managed. And both should be > able to do it. > > I think it's not only smart technically (we support the low-level basic > features and when someone puts them together and makes it more of a > turn-key solution they are responsible for designing and implementing it - > so we have less maintenance effort. But also it's good from a simple > "open-source business model point of view" - i.e. it's a smart product > decision we should make. > > Why airflow is #18 in OSS rank - of course we have a huge community and > people contributing in their free time, completely voluntarily. And we > cherish, support and encourage it. But let's be honest - if not all those > that make business on top of airflow did not invest literally millions of > dollars (in terms of engineering salary, sponsoring Airflow Summit, > supporting people like me (some smart stakeholders at least who understand > the value of it) who can be good "community spirit" - Airflow would have > order of magnitude less activity, reach, Airflow 3 would not be simply > possible. And this is a good thing that we have those stakeholders that are > interested and make money by turning Airflow into a "turn-key" solution. > This is a fantastic, symbiotic relationship. 
> > So - what my thinking is - we should NOT make things that make airflow > more turn-key for those complex cases. We should leave it up to those who > want to make it and want to charge money for it. This is cool and great > that they can do - and we should not do it "for them" - but on the other > hand - we should make it possible that those who want to turn airflow into > more complex (say multi-team solution) to make it happen - by providing > them with minimal set of features that make it possible. > > And that also - in a way - keeps the balance between on-prem and managed > implementation. > > Something that I've learned as a rule of thumb is that making a feature > "generic" compared to custom implementation is 3x-10x more expensive (both > in implementation and maintenance). And it means that if an on-prem user > wants to implement something for them (say turn-key multi-team solution for > their case) it will cost `x` , but when a managed provider wants to > implement a generic multi-team it will cost `10x`. But also managed > providers can spread the cost over the premium they will charge to their > users so that they don't have to manage Airflow on their own and pay `x` > for this mult-team feature to develop on their own. And this is a "fair" > choice to make by on-prem users. They might choose what they want to do > then. Also it's fair for managed provider - yes they need to invest more, > but also they have a chance to shine on promoting it and making it more > optimised at scale etc. etc. > > That is my line of thinking. > > > J. > > > On Thu, Jul 3, 2025 at 1:41 AM Oliveira, Niko <oniko...@amazon.com.invalid> > wrote: > >> Hey Jarek, >> >> >> The direction this one is taking is interesting. If you're really just >> trying to make the feature barely possible and mostly targeted towards >> managed providers to implement the rest, then I suppose this hits the mark. 
>> >> But this is not something we're asking for at Amazon and personally I >> think we should make the feature reasonably usable for those running >> self-managed OSS Airflow as well. There are many users running an on-prem >> Airflow. Getting too hyper-fixated on an implementation that's so >> simplified that it's obtuse and difficult to use by most users seems like >> the wrong approach to me. But you and I have already discussed this at >> length and I haven't convinced you so far, so if I'm the only one with this >> thinking then I'm happy to disagree and commit as we say at Amazon :) >> >> >> > So I would be rather strong on **not** touching the current >> configuration and >> >> simply adding configuration for per-team executors in executor config - >> even if it is uglier and more "low-level". >> >> Can you explain what "adding configuration for per-team executors in >> executor config" would look like? I don't have a concrete sense of what you >> mean by this. >> >> Thanks for your efforts on trying to get this feature agreed to and voted >> on. Looking forward to working on the project in the coming weeks! >> >> Cheers, >> Niko >> >> ________________________________ >> From: Jarek Potiuk <ja...@potiuk.com> >> Sent: Tuesday, July 1, 2025 10:26:55 PM >> To: dev@airflow.apache.org >> Subject: RE: [EXT] Discuss: AIP-67 (multi team) now that AIP-82 (External >> event driven dags) exists >> >> >> >> Any last comments ? 
There is a long weekend coming up in the US, so I will >> likely start voting on the updated AIP on Monday 7th. >> >> On Fri, Jun 27, 2025 at 12:41 PM Jarek Potiuk <ja...@potiuk.com> wrote: >> >> > I'd really love to finalise discussion and put it up to a vote some time >> > after the recording from the last dev call is posted - so that more >> > context, details and the LONG discussion we had on it. There is no >> *huge* >> > hurry - we have strong dependency on Task Isolation and it seems that >> it >> > will still take a bit of time to complete, so I'd say I would love to >> start >> > voting in about a week time - so that maybe at the next dev call we can >> > "seal" the subject. Happy to see any more comments - especially from >> those >> > who have opinions but they had no opportunity to express them. >> > >> > I am personally very happy with the direction it took - simplification >> and >> > "MVP" kind of approach - also I invite the stakeholders of ours to take >> a >> > close look at the scope and what we really propose - I have a feeling >> that >> > we can balance it out - there is something we can make to make it not >> > "worse" for the offerings they have. I think we have a really good >> > symbiotic relationship here, and I would love to leverage that. For one >> - >> > my goal here is to have a minimum number of changes that are impacting >> > maintainability of the open-source airflow - but mostly "opening up some >> > possibilities" - rather than provide turn-key solutions. And mostly >> because >> > this is good for all sides - less maintenance and complexity for OSS >> > maintainers, but more opportunities to make it into "turn-key" >> solutions by >> > the stakeholders, while also allowing the "on-prem" users - if they are >> > highly motivated - to use those features by adding the "turn-key" layer >> on >> > their own. Also adding multi-team should not be at the expense of >> "simple" >> > installations - they should be virtually unaffected. 
>> > >> > One example of applying this is cutting on "separate config files". I >> > think it moves us closer to a "turn-key" solution but it is not really >> > necessary to achieve the three goals above - that's why in the current >> > proposal this part is completely removed - Sorry Niko, but I still think >> > it's one of the things that falls into this bucket. We can easily remove >> > it, they complicate code, documentation and options the users have, and >> > even if it is a "little" more complex to manage configuration by >> motivated >> > users, it's also an opportunity for "turn-key" option that stakeholders >> can >> > build in their products - and we do not have to maintain it in the >> > open-source. So I would be rather strong on **not** touching the current >> > configuration and simply adding configuration for per-team executors in >> > executor config - even if it is uglier and more "low-level". >> > >> > So if there are some constructive ideas on what can be done to make it >> > "simpler" and less "turn-key" in that respect - I would highly value >> such >> > ideas and comments. If we can cut down something more that is not >> > "necessary" for the three primary goals I came up with - I am more than >> > happy to do it. >> > >> > Just to remind - those are the "extracted" goals. I slightly updated >> them >> > and added to the preamble of the AIP: >> > >> > * less operational overhead for managing multi-team (once AIP-72 is >> > complete) where separate execution environments are important >> > * virtual assets sharing between teams >> > * ability of having "admin" and "team sharing" capability where dags >> from >> > multiple teams can be seen in a single Airflow UI (requires custom >> RBAC an >> > AIP-56 implementation of Auth Manager - with KeyCloak Auth Manager >> being a >> > reference implementation) >> > >> > J. 
>> > >> > >> > On Thu, Jun 26, 2025 at 10:53 AM Jarek Potiuk <ja...@potiuk.com> wrote: >> > >> >> >> >>> One technical observation: Now that the dag table no longer has a >> >>> team_id in it, what would the behaviour be when a DAG is attempted to >> move >> >>> between bundles? How do we detect this? (I’m not all convinced that we >> >>> correctly detect duplicate dag ids across bundles today, so I wouldn’t >> >>> assume or rely on the current behaviour.) >> >>> >> >> >> >> Of course - yes, I realise that - that problem was also not handled in >> >> the previous iteration to be honest. That is something that dag bundle >> >> solution allows to solve eventually - but I do not think it's a >> blocker for >> >> the proposed implementation. We will have to eventually add some way of >> >> blocking dags to jump between bundles, we might tackle this >> separately. I >> >> already wanted to propose a separate update to that - but I did not >> want to >> >> complicate the current proposal. One thing at a time. I can, however - >> if >> >> you consider that as a blocker, extend the current AIP with it. Not a >> big >> >> problem. This is however a bit independent from the team_id >> introduction. >> >> >> >> Overall, I am still unconvinced this proposal has enough real user >> >>> benefit over actually separate deployments, and on balance of the >> added >> >>> complexity and maintenance burden I do not think it is worth it. >> >>> >> >> >> >> That makes me sad, I thought that over the course of the discussion I >> >> addressed all the concerns (in this case the concern was "is it worth >> with >> >> the cost and little benefit", but when I did it and heavily limited the >> >> impact, now the concern is "is it worth at all as changes are really >> >> minimal" - and surely, anyone can change and adapt their concerns, over >> >> time, but that one seems like ever-moving target. 
I hoped at least for >> >> some acknowledgment of some concerns (complexity in this case) is >> >> addressed, but it seems that you are deeply convinced that we do not >> need >> >> multi-team at all (which is in stark contrast with at least a dozen of >> >> bigger and smaller users of Airflow who submitted talks to Airflow >> summit >> >> (including about 5 or 6 submissions for Airflow 2025) on how they spent >> >> their engineering effort, time and money on trying to achieve something >> >> similar - they assessed that it's worth, you assess that it's not >> worth. >> >> Somehow I trust our users that they were not spending the money, time >> and >> >> engineering effort to achieve this because they wanted to spend more >> money. >> >> I think they assessed it's worth it. So I want to make it a bit easier >> and >> >> more "proper" way for them to do that. >> >> >> >>> >> >>> Upgrades: it is not easier to upgrade under this multi team proposal, >> >>> but much much harder. This is based on hard earned experience from >> helping >> >>> Astronomer users — having to coordinate upgrades between multiple >> teams >> >>> turns in to a months long slog of the hardest kind of work — people >> work: >> >>> getting other teams to agree to do things that they don’t directly >> care >> >>> about — “It’s working for me, I don’t care about upgrading, we’ll get >> to it >> >>> next quarter” is a refrain I’ve heard many times. >> >>> >> >> >> >> Yes. absolutely - this is why we deferred it until we knew what shape >> >> task isolation and other AIPs we depend on take on. Because it is clear >> >> that pretty much all the problem you explain above are going to be >> solved >> >> with task isolation. And it's not just my opinion. If you want to argue >> >> with it, you likely need to argue with yourself: >> >> https://github.com/apache/airflow/issues/51545#issuecomment-2980038478 >> . 
>> >> Let me quote what you wrote there last week: >> >> >> >> Ash Berlin Taylor wrote: >> >> >> >> > A tight coupling between task-sdk and any "server side" component is >> >> the opposite to one of the goals of AIP-72 (I'm not sure we ever >> explicitly >> >> said this, but the first point of motivation for the AIP says >> "Dependency >> >> conflicts for administrators supporting data teams using different >> versions >> >> of providers, libraries, or python packages") >> >> > In short, my goal with TaskSDK, and the reason for introducing CalVer >> >> and Cadwyn with the execution API is to end up in a world where you can >> >> upgrade the Airflow Scheduler/API server interdependently of any worker >> >> nodes (with the exception that the server must be at least as new as >> the >> >> clients) >> >> > This ability to have version-skew is pretty much non-negotiable to me >> >> and is (other than other languages) one of primary benefits of AIP-72 >> >> >> >> If you read your own quote, it basically means "it will be >> easy >> >> to upgrade airflow independently of workers". So I am a bit confused >> here. >> >> Yes, I agree it was difficult, but you yourself explain that once >> AIP-72 >> >> (which, since AIP-67 was accepted, has always been a prerequisite of >> it) >> >> is complete, it will be "easy". So I am not sure why you are bringing it up now. >> We >> >> assume AIP-72 will be completed and this problem will be gone. Let's >> not >> >> mention it any more please. >> >> >> >> The true separation from TaskSDK will likely only land in about 3.2 >> time >> >>> frame. We are actively working on it, but it’s a slow process of >> untangling >> >>> lots of assumptions made in the code base over the years. Maybe once >> we >> >>> have that my view would be different, but right now I think this >> makes the >> >>> proposal a non-starter. Especially as you are saying that most teams >> will >> >>> have unique connections. 
If they’ve got those already, then having an >> asset >> >>> trigger use those conns to watch/poll for activity is a much easier >> >>> solution to operate and crucially, to scale and upgrade. >> >>> >> >> >> >> Yes. I perfectly understand that and I am fully aware of potentially >> 3.2 >> >> time-frame. And that's fine. Actually I heartily invite you to listen >> to >> >> the part of my talk from Berlin Buzzwords when I was asked for the >> timeline >> >> - https://youtu.be/EyhZOnbwc-4?t=2226 - this link leads to the exact >> >> timeline in my talk . My answer was basically - "3.1" or "3.2", and I >> >> sincerely hope "3.1" but we might not be able to complete it because we >> >> have other things to do (other - is indeed the Task Isolation work >> that you >> >> are leading). And that's perfectly fine. And it absolutely does not >> prevent >> >> us from voting on the AIP now - similarly as we voted on the previous >> >> version of the AIP - knowing that it has some prerequisites a few >> months >> >> ago. Especially that we know that the feature we need from task >> isolation >> >> is "non-negotiable". I.e. it WILL happen. We don't hope for it, we >> know it >> >> will be there. Those are your own words. >> >> >> >> >> >>> > I think we can’t compare AIP-82 to sharing virtual assets due to >> >>> complexity of it. >> >>> >> >>> Virtual Assets was a mistake, and not how users actually want to use >> >>> them. Mea culpa >> >>> >> >> >> >> This is the first time I hear this - certainly you never raised this >> >> concern on the devlist. So if you have some concerns about virtual >> assets I >> >> think you should raise it on the devlist, because I think everyone >> here is >> >> missing some conversation (or maybe it's just your private opinion that >> >> you never shared with anyone, but maybe it's worth). I would be >> >> interested to hear how the feature that was absolutely most successful >> >> feature of airflow 2 was a mistake. 
According to the 2024 survey >> >> https://airflow.apache.org/blog/airflow-survey-2024/ - 48% of Airflow >> >> users have been using it, even if it was added as one of the last >> >> big features of Airflow 2. It's the MOST used feature out of all the >> >> features out there. I would be really curious to see how it was a >> mistake >> >> (but please start a separate thread explaining why you think it was a >> >> mistake, what are your data points and what do you think should be >> fixed. >> >> Just dropping "virtual assets were a mistake" in the middle of >> multi-team >> >> conversation seems completely unjustified without knowing what you are >> >> talking about. So I think, until we know more, this argument has no >> base. >> >> >> >> >> >>> >> >>> S >> >>> To restate my points: >> >>> >> >>> - Sharing a deployment between teams today/in 3.1 is operationally >> more >> >>> complex (both scaling, and upgrades) — this is a con, not a plus. >> >>> >> >> >> >> Surely. But it will be easier when AIP-72 is complete (which I am >> >> definitely looking forward to and as clearly explained in AIP-82, is a >> >> prerequisite of it). Nothing changed here. >> >> >> >> >> >>> - The main user benefit appears to be “allow teams’ DAGs to >> communicate >> >>> via Assets”, in which case we can do that today by putting more work >> in to >> >>> AIP-82’s Asset triggers >> >>> >> >> >> >> No. Lower operational complexity for multi-teams (providing that we >> >> deliver AIP-72) is another benefit. Virtual assets is another, and >> since >> >> there is no ground in "virtual assets is a mistake" statement (not >> until >> >> you explain what you mean by that in a separate discussion) - this is >> also >> >> still a very valid point. >> >> >> >> >> >>> Soon, we will have then be asked about cross-team governance, policy >> >>> enforcement, and potentially unbounded edge cases (e.g., team-specific >> >>> secrets, roles, quotas). 
Again, you get this for free with truly >> separate >> >>> deployments already >> >>> allow different teams to use different executors (including multiple >> >>> executors per-team following AIP-61) >> >>> >> >> >> >> Not really. We very explicitly say in the AIP that this is not a goal >> and >> >> that we have no plans for. And yes, using separate executors per team >> is >> >> actually back in the AIP-82 in case you did not notice (and the code >> needed >> >> for it is even implemented and merged already in main by Vincent). >> >> >> >> >> >>> Provably not true right now, and until ~3.2 delivers the full Task >> >>> SDK/Core dependency separation this would be _more_ work to upgrade, >> not >> >>> less, and that work is not shared but still on a central team. >> >>> >> >> >> >> Absolutely - we will wait for AIP-72 completion. I do not want to say >> 3.1 >> >> or 3.2 directly - because there are - as you said - a lot of moving >> pieces. >> >> So my target for multi-team is "After AIP-72 is completed". Full stop. >> But >> >> there is nothing wrong with accepting the AIP now and doing preparatory >> >> work in parallel. Just as there is no way to have a baby in 1 >> month by >> >> 9 women, there is no way adding more effort to task-sdk isolation will >> >> speed it up - we already have not only 3 people (you leading it, Kaxil >> and >> >> Amog) but also all the help from me and even 10s of different >> contributors >> >> (for example with the recent db_test cleanup that I took leadership >> on) - >> >> and there are people who wish to work on adding multi-team features. 
>> Since >> >> the design heavily limits impact on airflow codebase and interactions >> with >> >> task-sdk implementation, there is nothing wrong with starting >> >> implementation in parallel either- amazon team is keen to move it >> forward - >> >> they even already implemented SQS trigger for assets, and we are >> working >> >> together on FAB removal, Keycloak authentication manager - and they >> seem to >> >> still have capacity and drive to progress multi-team. So I am not sure >> if >> >> we are trading off something. There is no "if we work on more on task >> sdk >> >> and drop multi-team things will be faster". Generally in open source >> people >> >> work in the area where they feel they can provide best value - such as >> you >> >> working on task-sdk, me on CI,dev env, they will deliver more value on >> >> multi-team >> >> >> >> >> >>> >> >>> So please, as succinctly as possible, please tell me what the direct >> >>> benefit to users this proposal is over us putting this effort in to >> writing >> >>> better Asset triggers instead? >> >>> >> >> >> >> >> >> * less operational overhead for managing multi-team (once AIP-72 is >> >> complete) where separate execution environments are important >> >> * virtual assets sharing >> >> * ability of having "admin" and "team sharing" capability where dags >> from >> >> multiple teams can be seen in a single Airflow UI (requires custom >> RBAC) >> >> >> >> None of this can be done via beter asset triggers >> >> >> >> >> >>> >> >>> > On 23 Jun 2025, at 10:57, Jarek Potiuk <ja...@potiuk.com> wrote: >> >>> > >> >>> > My counter-points: >> >>> > >> >>> > >> >>> >> 1. 
Managing a multi team deployment is not materially different >> from >> >>> >> managing a deployment per team >> >>> >> >> >>> > >> >>> > It's a bit easier - especially when it comes to upgrades (especially >> >>> in the >> >>> > case we are targeting, when we are not targeting multi-tenant, but >> >>> several >> >>> > relatively closely cooperating teams with different dependency >> >>> requirements >> >>> > and isolation needs). >> >>> > >> >>> > 2. The database changes were quite wide-reaching >> >>> >> >> >>> > >> >>> > Yes, that is addressed. >> >>> > >> >>> > >> >>> >> 3. I don’t believe the original AIP (again, I haven’t read the >> updated >> >>> >> proposal or recent messages on the thread yet) will meet what many >> >>> users >> >>> >> want out of a multiteam solution >> >>> >> >> >>> > >> >>> > I think we will only see when we try. A lot of people think they >> would, >> >>> > even if they are warned. I know at least one user (Wealthsimple) who >> >>> > definitely wants to use it and they got a very detailed explanation >> of >> >>> the >> >>> > idea and understand it well. So I am sure that **some** users would. >> >>> But we >> >>> > do not know how many. >> >>> > >> >>> > >> >>> >> To expand on those points a bit more >> >>> >> >> >>> >> On 1. The only components that are shared are, I think, the >> scheduler >> >>> and >> >>> >> the API server, and it’s arguable if that is actually a good idea >> >>> given >> >>> >> those are likely to be the most performance sensitive components >> >>> anyway. >> >>> >> >> >>> >> Additionally the fact that the scheduler is a shared component >> makes >> >>> >> upgrading it almost a non starter as you would likely need buy-in, >> >>> changes, >> >>> >> and testing from ALL teams using it. I’d argue that this is a huge >> >>> negative >> >>> >> until we finish off the version independence work of AIP-72. 
>> >>> >> >> >>> > >> >>> > I quite disagree here - especially as our target is that task-sdk is >> >>> > supposed to provide all isolation that is needed. There should be 0 >> >>> changes >> >>> > in the dags needed to upgrade scheduler, api_server, triggerer - >> >>> precisely >> >>> > because we introduced backwards-compatible task-sdk. >> >>> > >> >>> > On 3 my complaint is essentially that this doesn’t go nearly far >> >>> enough. It >> >>> >> doesn’t allow read only views to other teams dags. I don’t think it >> >>> allows >> >>> >> you to be in multiple teams at once. You can’t share a connection >> >>> between >> >>> >> teams but only allow certain specified dags to access it, but would >> >>> have to >> >>> >> either be globally usable, or duplicated-and-kept-in-sync between >> >>> teams. In >> >>> >> short I think it falls short of being useful. >> >>> >> >> >>> > >> >>> > Oh absolutely all that is possible (except sharing single >> connections >> >>> > between multiple teams - which is a very niche use case and >> >>> duplication >> >>> > here is perfectly ok as first approximation - and if we need more we >> >>> can >> >>> > add it later). >> >>> > >> >>> > Auth manager RBAC and access is abstracted away, and the KeyCloak >> >>> Manager >> >>> > implemented by Vincent allows managing completely independent and >> >>> separate >> >>> > RBAC based on arguments and resources provided by Airflow. There is >> >>> nothing >> >>> > to prevent the user who configures KeyCloak RBAC from defining it in the >> >>> way: >> >>> > >> >>> > if group a > allow to read a and write b >> >>> > if group b > allow to write b but not a >> >>> > >> >>> > and any other combinations. KeyCloak implementation - pretty >> advanced >> >>> > already - (and design of auth manager) completely abstracts away >> both >> >>> > authentication and authorization to KeyCloak and KeyCloak has RBAC >> >>> > management built in. 
Also any of the users can write their own - >> even >> >>> > hard-coded authentication manager to do the same if they do not >> want to >> >>> > have configurable KeyCloak. Even SimpleAuthManager could be >> hard-coded >> >>> to >> >>> > provide those features. >> >>> > >> >>> > >> >>> >> >> >>> >> So on the surface, I’m no more in favour of using dag bundle as a >> >>> >> replacement for team id as I think most of the above points still >> >>> stand. >> >>> >> >> >>> > >> >>> > We disagree here. >> >>> > >> >>> >> >> >>> >> My counter proposal: We do _nothing_ to core airflow. We work on >> >>> improving >> >>> >> the event-based trigger of dags (write more triggers for read/check >> >>> remote >> >>> >> Assets etc) so that teams can have 100% isolated deployments but >> still >> >>> >> trigger dags based on asset events from other teams. >> >>> >> >> >>> > >> >>> > That does not solve any of the other design goals - it only allows triggering on >> >>> >> >>> > assets a bit more easily (but also it's not entirely solved by >> AIP-82 >> >>> > because it does not solve virtual assets - only ones that have a defined >> >> >>> > triggerer and "something" to listen on - which is way more complex >> than >> >>> > just defining an asset in a Dag and using it in another). I think we >> can't >> >>> > compare AIP-82 to sharing virtual assets due to complexity of it. I >> >>> > explained it in the doc. >> >>> > >> >>> > >> >>> > I will now go and catch up with the long thread and updated proposal >> >>> and >> >>> >> come back. >> >>> >> >> >>> > >> >>> > Please. I hope the above explanation will help in better >> >>> understanding of >> >>> > the proposal, because I think you had some assumptions that do not >> >>> hold any >> >>> > more with the new proposal. >> >>> > >> >>> > J. 
>> >>> > >> >>> > >> >>> >> >> >>> >>> On 23 Jun 2025, at 05:54, Jarek Potiuk <ja...@potiuk.com> wrote: >> >>> >>> >> >>> >>> Just to clarify the relation - I updated the AIP now to refer to >> >>> AIP-82 >> >>> >> and >> >>> >>> to explain relation between the "cross-team" and "cross-airflow" >> >>> asset >> >>> >>> triggering - this is what I added: >> >>> >>> >> >>> >>> Note that there is a relation between AIP-82 ("External Driven >> >>> >> Scheduling") >> >>> >>> and this part of the functionality. When you have multiple >> instances >> >>> of >> >>> >>> Airflow, you can use shared datasets - "Physical datasets" - that >> >>> several >> >>> >>> Airflow Instances can use - for example there could be an S3 >> object >> >>> that >> >>> >> is >> >>> >>> produced by one airflow instance, and consumed by another. That >> >>> requires >> >>> >>> deferred trigger to monitor for such datasets, and appropriate >> >>> >> permissions >> >>> >>> to the external dataset, and you could achive similar result to >> >>> >> cross-team >> >>> >>> dataset triggering (but cross airflow). However the feature of >> >>> sharing >> >>> >>> datasets between the teams also works for virtual assets, that do >> not >> >>> >> have >> >>> >>> physically shared "objects" and trigger that is monitoring for >> >>> changes in >> >>> >>> such asset. >> >>> >>> >> >>> >>> J. >> >>> >>> >> >>> >>> >> >>> >>> On Mon, Jun 23, 2025 at 6:38 AM Jarek Potiuk <ja...@potiuk.com> >> >>> wrote: >> >>> >>> >> >>> >>>>> From a quick glance, the updated AIP didn't seem to have any >> >>> reference >> >>> >> to >> >>> >>>>> AIP-82, which surprised me, but will take a more detailed read >> >>> through. >> >>> >>>> >> >>> >>>> Yep. It did not - because I did not think it was needed or even >> very >> >>> >>>> important after the simplifications. AIP-82 has a different >> scope, >> >>> >> really. 
>> >>> >>>> It only helps when the Assets are "real" data files which we have physical triggers for, it's slightly related - sharing datasets between teams (including those that do not require physical files and triggers) is still possible in the design we have now, but it's not (and never was) the **only** reason for having multi-team. There always was (and still is) the possibility of having distinct environments (i.e. dependencies and providers) per team, the possibility of having connections and variables that are only accessible to one team and not the other, and isolating workload execution (all that while allowing to manage multiple teams and schedule things with single deployment). That did not change. What changed a lot is that it is now way simpler, something that we can implement without heavy changes to the codebase - and give it to our users, so that they can assess if this is something they need without too much risk and effort.
>> >>> >>>>
>> >>> >>>> This was - I believe - the main concern, that the value we get from it is not dramatic, but the required changes are huge. This "redesign" changes the equation - the value is still unchanged, but the cost of implementing it and impact on the Airflow codebase is much smaller. I still have not heard back from Ash if my proposal responds to his original concern though, so I am mostly guessing (also based on the positive impact of others) that yes it does.
>> >>> >>>> But to be honest I am not sure and I would love to hear back. I decided to update the AIP to reflect it regardless, because I think the simplification I proposed keeps the original goals, but is indeed way simpler.
>> >>> >>>>
>> >>> >>>>> This is a very difficult thread to catch up on.
>> >>> >>>>
>> >>> >>>> Valid point. Let me summarize what is the result:
>> >>> >>>>
>> >>> >>>> * I significantly simplified the implementation proposal compared to the original version
>> >>> >>>> * main simplification is very limited impact on existing database - without "ripple effect" that would require us to change a lot of tables, including their primary keys, and heavily impact the UI
>> >>> >>>> * this is now more of an incremental change that can be implemented way faster and with far less risk
>> >>> >>>> * updated idea is based on leveraging bundles (already part of our data model) to map them (many-to-one) to a team - which requires to just extend the data model with bundle mapping and add team_id to connections and variables. Those are all needed DB changes.
>> >>> >>>>
>> >>> >>>> The AIP is updated - in one single big change so it should be easy to compare the changes:
>> >>> >>>> https://cwiki.apache.org/confluence/pages/viewpreviousversions.action?pageId=294816378
>> >>> >>>> -> I even named the version appropriately "Simplified multi-team AIP" - you can select and compare v.65 with v.66 to see the exact differences I proposed.
>> >>> >>>>
>> >>> >>>> I hope it will be helpful to catch up and for those who did not follow, to be able to make up their minds about it.
>> >>> >>>>
>> >>> >>>> J.
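To illustrate the last bullet of the summary above, here is a minimal sketch of a many-to-one bundle -> team lookup and a team_id check on a connection. It is not Airflow code: the names BUNDLE_TO_TEAM, Connection, team_for_bundle and can_use_connection are hypothetical, the in-memory dict only stands in for the proposed mapping table, and treating a missing team_id as "shared by all teams" is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the proposed "bundle -> team" mapping table.
# Several bundles may map to the same team (many-to-one).
BUNDLE_TO_TEAM: dict[str, str] = {
    "team1_dags": "team1",
    "team1_ml": "team1",
    "team2_dags": "team2",
}


@dataclass(frozen=True)
class Connection:
    """Illustrative connection record carrying an optional team_id."""

    conn_id: str
    # Assumption for this sketch: None means the connection is shared by all teams.
    team_id: Optional[str] = None


def team_for_bundle(bundle_name: str) -> Optional[str]:
    """Resolve the team that owns a bundle, or None if the bundle is unmapped."""
    return BUNDLE_TO_TEAM.get(bundle_name)


def can_use_connection(bundle_name: str, conn: Connection) -> bool:
    """Allow access when the connection is shared or belongs to the bundle's team."""
    if conn.team_id is None:
        return True
    return team_for_bundle(bundle_name) == conn.team_id
```

The point of the sketch is that the team is derived from the bundle at lookup time, so nothing on the Dag side needs a new key; only the mapping and the team_id on connections (and, analogously, variables) are added.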
>> >>> >>>>
>> >>> >>>> On Mon, Jun 23, 2025 at 4:35 AM Vikram Koka <vik...@astronomer.io.invalid> wrote:
>> >>> >>>>
>> >>> >>>>> This is a very difficult thread to catch up on.
>> >>> >>>>> I will take a detailed look at the AIP update to try to figure out the changes in the proposal.
>> >>> >>>>>
>> >>> >>>>> From a quick glance, the updated AIP didn't seem to have any reference to AIP-82, which surprised me, but will take a more detailed read through.
>> >>> >>>>>
>> >>> >>>>> Vikram
>> >>> >>>>>
>> >>> >>>>> On Sun, Jun 22, 2025 at 1:44 AM Pavankumar Gopidesu <gopidesupa...@gmail.com> wrote:
>> >>> >>>>>
>> >>> >>>>>> Thanks Jarek, that's a great update on this AIP, now it's much more slimmed down.
>> >>> >>>>>>
>> >>> >>>>>> Left a minor comment. :) Overall looking great.
>> >>> >>>>>>
>> >>> >>>>>> Pavan
>> >>> >>>>>>
>> >>> >>>>>> On Sat, Jun 21, 2025 at 3:10 PM Jens Scheffler <j_scheff...@gmx.de.invalid> wrote:
>> >>> >>>>>>
>> >>> >>>>>>> Thanks for the rework/update of the AIP-67!
>> >>> >>>>>>>
>> >>> >>>>>>> Just a few small comments but overall I like it as it is much leaner than originally planned and is in a level of complexity that it really seems to be a benefit to close the gap as described.
>> >>> >>>>>>>
>> >>> >>>>>>> On 21.06.25 14:52, Jarek Potiuk wrote:
>> >>> >>>>>>>> I updated the AIP - including architecture images and reviewed it (again) and corrected any ambiguities and places where it needed to be changed.
>> >>> >>>>>>>>
>> >>> >>>>>>>> I think the current state
>> >>> >>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
>> >>> >>>>>>>> - nicely describes the proposal.
>> >>> >>>>>>>>
>> >>> >>>>>>>> Comparing to the previous one:
>> >>> >>>>>>>>
>> >>> >>>>>>>> 1. The DB changes are far less intrusive - no ripple effect on Airflow
>> >>> >>>>>>>> 2. There is no need to merge configurations and provide different set of configs per team - we can add it later but I do not see why we need it in this simplified version
>> >>> >>>>>>>> 3. We can still configure a different set of executors per team - that is already implemented (we just need to wire it to the bundle -> team mapping).
>> >>> >>>>>>>>
>> >>> >>>>>>>> I think it will be way simpler and faster to implement this way and it should serve as MVMT -> Minimum Viable Multi Team that we can give our users so that they can provide feedback.
>> >>> >>>>>>>>
>> >>> >>>>>>>> J.
>> >>> >>>>>>>>
>> >>> >>>>>>>> On Fri, Jun 20, 2025 at 8:33 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>> >>> >>>>>>>>
>> >>> >>>>>>>>>> I like this iteration a bit more now for sure, thanks for being receptive to feedback! :)
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>> This now becomes quite close to what I was proposing before, we now again have a team ID (which I think is really needed here, glad to see it back) and it will be used for auth management, configuration specification, etc but will be carried by Bundle instead of the dag model. Which as you say "For that we will need to make sure that both api-server, scheduler and triggerer have access to the "bundle definition" (to perform the mapping)" which honestly doesn’t feel too much different from the original proposal we had last week of adding it to Dag table and ensuring it’s available everywhere. But either way I’m happy to meet in the middle and keep it on Bundle if everyone else feels that’s a more suitable location.
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>> I think the big difference is the "ripple effect" that was discussed in
>> >>> >>>>>>>>> https://lists.apache.org/thread/78vndnybgpp705j6sm77l1t6xbrtnt5c (and I believe - correct me if I am wrong Ash - important trigger for the discussion). So far what we wanted is to extend the primary key and it would ripple through all the pieces of Airflow -> models, API, UI etc. ... However - we already have "bundle_name" and "bundle_version" in the Dag model.
>> >>> >>>>>>>>> So I think when we add a separate table where we map the bundle to the team, the "ripple effect" will be almost 0. We do not want to change primary key, we do not want to change UI in any way (except filtering of DAGs available based on your team - but that will be handled in Auth Manager and will not impact UI in any way). I think that's a huge simplification of the implementation, and if we agree to it - I think it should speed up the implementation significantly. There are only a limited number of times where you need to look up the team_id - so having the bundle -> team mapping in a separate table and having to look them up should not be a problem. And it has much less complexity and "ripple-effect" through the codebase (for example I could imagine hundreds or thousands of already written tests that would have to be adapted if we changed the primary key - where there will be pretty much zero impact on existing tests if we just add bundle -> team lookup table).
>> >>> >>>>>>>>>
>> >>> >>>>>>>>>> One other thing I’d point out is that I think including executors per team is a very easy win and quite possible without much work. I already have much of the code written.
>> >>> >>>>>>>>>> Executors are already aware of Teams that own them (merged), I have a PR open to have configuration per team (with a quite simple and isolated approach, I believe you approved Jarek). The last piece is updating the scheduling logic to route tasks from a particular Bundle to the correct executor, which shouldn’t be much work (though it would be easier if the Task models had a column for the team they belong to, rather than having to look up the Dag and Bundle to get the team). I have a branch where I was experimenting with this logic already.
>> >>> >>>>>>>>>> Anywho, long story short, I don’t think we necessarily need to remove this piece from the project's scope if it is already partly done and not too difficult.
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>> Yeah. I hear you here again. Certainly I would not want to just **remove** it from the code. And, yep I totally forgot we have it in. And if we can make it in, easily (which it seems we can) - we can also include it in the first iteration. What I wanted to avoid really (from the original design) - again trying to simplify it, limit the changes, and speed up implementation. And there is one "complexity" that I wanted to avoid specifically - having to have separate, additional configuration per team.
>> >>> >>>>>>>>> Not only because it complicates already complex configuration handling (I know we have PR for that) but mostly because if it is not needed, we can simplify documentation and explain to our users easier what they need to do to have their own multi-team setup. And I am quite open to keeping multiple-executors if we can avoid complicating configuration.
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> But I think some details of that and whether we really need separate configuration might also come as a result of updating the AIP - I am not quite sure now if we need it, but we can discuss it when we iterate on the AIP.
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> J.
>> >>> >>>>>>>
>> >>> >>>>>>> ---------------------------------------------------------------------
>> >>> >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> >>> >>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org