Any last comments? There is a long weekend coming up in the US, so I will likely start voting on the updated AIP on Monday the 7th.
On Fri, Jun 27, 2025 at 12:41 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> I'd really love to finalise the discussion and put it up to a vote some time after the recording from the last dev call is posted - so that more context, details and the LONG discussion we had on it are available. There is no *huge* hurry - we have a strong dependency on Task Isolation and it seems that it will still take a bit of time to complete, so I'd say I would love to start voting in about a week's time - so that maybe at the next dev call we can "seal" the subject. Happy to see any more comments - especially from those who have opinions but have not had the opportunity to express them.
>
> I am personally very happy with the direction it took - simplification and an "MVP" kind of approach - also I invite our stakeholders to take a close look at the scope and what we really propose - I have a feeling that we can balance it out - there is something we can do to make it not "worse" for the offerings they have. I think we have a really good symbiotic relationship here, and I would love to leverage that. For one - my goal here is to have a minimum number of changes that impact the maintainability of open-source airflow - but mostly "opening up some possibilities" - rather than providing turn-key solutions. And mostly because this is good for all sides - less maintenance and complexity for OSS maintainers, but more opportunities to make it into "turn-key" solutions by the stakeholders, while also allowing the "on-prem" users - if they are highly motivated - to use those features by adding the "turn-key" layer on their own. Also, adding multi-team should not be at the expense of "simple" installations - they should be virtually unaffected.
>
> One example of applying this is cutting out "separate config files". I think it moves us closer to a "turn-key" solution but it is not really necessary to achieve the three goals above - that's why in the current proposal this part is completely removed - sorry Niko, but I still think it's one of the things that falls into this bucket. We can easily remove it - separate config files complicate the code, documentation and options the users have, and even if it is a "little" more complex to manage configuration by motivated users, it's also an opportunity for a "turn-key" option that stakeholders can build into their products - and we do not have to maintain it in the open-source. So I would be rather strong on **not** touching the current configuration and simply adding configuration for per-team executors in the executor config - even if it is uglier and more "low-level".
>
> So if there are some constructive ideas on what can be done to make it "simpler" and less "turn-key" in that respect - I would highly value such ideas and comments. If we can cut down something more that is not "necessary" for the three primary goals I came up with - I am more than happy to do it.
>
> Just to remind - those are the "extracted" goals. I slightly updated them and added them to the preamble of the AIP:
>
> * less operational overhead for managing multi-team (once AIP-72 is complete) where separate execution environments are important
> * virtual assets sharing between teams
> * ability of having "admin" and "team sharing" capability where dags from multiple teams can be seen in a single Airflow UI (requires custom RBAC and an AIP-56 implementation of the Auth Manager - with the KeyCloak Auth Manager being a reference implementation)
>
> J.
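For anyone catching up, the "virtual assets sharing between teams" goal above boils down to something like this at the dag level - a purely illustrative sketch (the dag and asset names are made up); the point is that both dags live in one deployment, just in different bundles/teams, so no physical object and no AIP-82 watcher is needed for the hand-off:

    # Illustrative only - this is the plain Airflow 3 Asset API; nothing multi-team
    # specific appears in the code itself. The multi-team part is that each dag would
    # sit in a different team's bundle of the same deployment.
    from airflow.sdk import Asset, dag, task

    reports_ready = Asset("team_a_reports_ready")  # "virtual" asset - no backing file

    @dag(schedule=None)  # would live in team A's bundle
    def team_a_producer():
        @task(outlets=[reports_ready])
        def publish():
            ...  # emitting the asset event on success is enough - nothing to poll

        publish()

    @dag(schedule=[reports_ready])  # would live in team B's bundle
    def team_b_consumer():
        @task
        def consume():
            ...

        consume()

    team_a_producer()
    team_b_consumer()

Doing the same across two fully separate deployments (the AIP-82 route discussed further down) would instead need a physical object plus a watcher/trigger and credentials on the consuming side.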
> On Thu, Jun 26, 2025 at 10:53 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> One technical observation: Now that the dag table no longer has a team_id in it, what would the behaviour be when a DAG is attempted to move between bundles? How do we detect this? (I'm not at all convinced that we correctly detect duplicate dag ids across bundles today, so I wouldn't assume or rely on the current behaviour.)
>>
>> Of course - yes, I realise that - that problem was also not handled in the previous iteration, to be honest. That is something that the dag bundle solution allows us to solve eventually - but I do not think it's a blocker for the proposed implementation. We will have to eventually add some way of blocking dags from jumping between bundles; we might tackle this separately. I already wanted to propose a separate update to that - but I did not want to complicate the current proposal. One thing at a time. I can, however - if you consider that a blocker - extend the current AIP with it. Not a big problem. This is however a bit independent of the team_id introduction.
>>
>>> Overall, I am still unconvinced this proposal has enough real user benefit over actually separate deployments, and on balance of the added complexity and maintenance burden I do not think it is worth it.
>>
>> That makes me sad. I thought that over the course of the discussion I addressed all the concerns (in this case the concern was "is it worth it, given the cost and little benefit"), but when I did and heavily limited the impact, now the concern is "is it worth it at all, as the changes are really minimal" - and surely, anyone can change and adapt their concerns over time, but that one seems like an ever-moving target. I hoped at least for some acknowledgment that some concerns (complexity in this case) were addressed, but it seems that you are deeply convinced that we do not need multi-team at all (which is in stark contrast with at least a dozen bigger and smaller users of Airflow who submitted talks to Airflow Summit - including about 5 or 6 submissions for Airflow Summit 2025 - on how they spent their engineering effort, time and money on trying to achieve something similar). They assessed that it's worth it, you assess that it's not. Somehow I trust our users that they were not spending the money, time and engineering effort to achieve this because they wanted to spend more money. I think they assessed it's worth it. So I want to make it a bit easier and give them a more "proper" way to do that.
>>
>>> Upgrades: it is not easier to upgrade under this multi team proposal, but much much harder. This is based on hard earned experience from helping Astronomer users — having to coordinate upgrades between multiple teams turns into a months-long slog of the hardest kind of work — people work: getting other teams to agree to do things that they don't directly care about — "It's working for me, I don't care about upgrading, we'll get to it next quarter" is a refrain I've heard many times.
>>
>> Yes, absolutely - this is why we deferred it until we knew what shape task isolation and the other AIPs we depend on would take. Because it is clear that pretty much all the problems you explain above are going to be solved with task isolation. And it's not just my opinion. If you want to argue with it, you likely need to argue with yourself: https://github.com/apache/airflow/issues/51545#issuecomment-2980038478.
>> Let me quote what you wrote there last week:
>>
>> Ash Berlin Taylor wrote:
>>
>> > A tight coupling between task-sdk and any "server side" component is the opposite to one of the goals of AIP-72 (I'm not sure we ever explicitly said this, but the first point of motivation for the AIP says "Dependency conflicts for administrators supporting data teams using different versions of providers, libraries, or python packages")
>> > In short, my goal with TaskSDK, and the reason for introducing CalVer and Cadwyn with the execution API, is to end up in a world where you can upgrade the Airflow Scheduler/API server independently of any worker nodes (with the exception that the server must be at least as new as the clients)
>> > This ability to have version-skew is pretty much non-negotiable to me and is (other than other languages) one of the primary benefits of AIP-72
>>
>> If you read your own words in that quote, it basically means "it will be easy to upgrade airflow independently of workers". So I am a bit confused here. Yes, I agree it was difficult, but you yourself explained that once AIP-72 (which, since AIP-67 was accepted, has always been a prerequisite of it) is complete, it will be "easy". So I am not sure why you are bringing it up now. We assume AIP-72 will be completed and this problem will be gone. Let's not mention it any more please.
>>
>>> The true separation from TaskSDK will likely only land in about the 3.2 time frame. We are actively working on it, but it's a slow process of untangling lots of assumptions made in the code base over the years. Maybe once we have that my view would be different, but right now I think this makes the proposal a non-starter. Especially as you are saying that most teams will have unique connections. If they've got those already, then having an asset trigger use those conns to watch/poll for activity is a much easier solution to operate and crucially, to scale and upgrade.
>>
>> Yes. I perfectly understand that and I am fully aware of the potential 3.2 time-frame. And that's fine. Actually I heartily invite you to listen to the part of my talk from Berlin Buzzwords where I was asked about the timeline - https://youtu.be/EyhZOnbwc-4?t=2226 - this link leads to the exact timeline in my talk. My answer was basically "3.1" or "3.2", and I sincerely hope "3.1", but we might not be able to complete it because we have other things to do (the "other" is indeed the Task Isolation work that you are leading). And that's perfectly fine. And it absolutely does not prevent us from voting on the AIP now - similarly to how we voted on the previous version of the AIP a few months ago, knowing that it had some prerequisites. Especially since we know that the feature we need from task isolation is "non-negotiable". I.e. it WILL happen. We don't hope for it, we know it will be there. Those are your own words.
>>
>>> > I think we can't compare AIP-82 to sharing virtual assets due to the complexity of it.
>>>
>>> Virtual Assets was a mistake, and not how users actually want to use them. Mea culpa
>>
>> This is the first time I hear this - certainly you never raised this concern on the devlist. So if you have some concerns about virtual assets I think you should raise them on the devlist, because I think everyone here is missing some conversation (or maybe it's just your private opinion that you never shared with anyone, but maybe it's worth sharing).
>> I would be interested to hear how the absolutely most successful feature of Airflow 2 was a mistake. According to the 2024 survey https://airflow.apache.org/blog/airflow-survey-2024/ - 48% of Airflow users have been using it, even though it was added as one of the last big features of Airflow 2. It's the MOST used feature out of all the features out there. I would be really curious to see how it was a mistake (but please start a separate thread explaining why you think it was a mistake, what your data points are and what you think should be fixed). Just dropping "virtual assets were a mistake" in the middle of a multi-team conversation seems completely unjustified without knowing what you are talking about. So I think, until we know more, this argument has no basis.
>>
>>> To restate my points:
>>>
>>> - Sharing a deployment between teams today/in 3.1 is operationally more complex (both scaling, and upgrades) — this is a con, not a plus.
>>
>> Surely. But it will be easier when AIP-72 is complete (which I am definitely looking forward to and which, as clearly explained in AIP-67, is a prerequisite of it). Nothing changed here.
>>
>>> - The main user benefit appears to be "allow teams' DAGs to communicate via Assets", in which case we can do that today by putting more work into AIP-82's Asset triggers
>>
>> No. Lower operational complexity for multi-teams (provided that we deliver AIP-72) is another benefit. Virtual assets are another, and since there are no grounds for the "virtual assets are a mistake" statement (not until you explain what you mean by that in a separate discussion) - this is also still a very valid point.
>>
>>> Soon, we will then be asked about cross-team governance, policy enforcement, and potentially unbounded edge cases (e.g., team-specific secrets, roles, quotas). Again, you get this for free with truly separate deployments already
>>> allow different teams to use different executors (including multiple executors per-team following AIP-61)
>>
>> Not really. We very explicitly say in the AIP that this is not a goal and that we have no plans for it. And yes, using separate executors per team is actually back in AIP-67 in case you did not notice (and the code needed for it is even implemented and merged already in main by Vincent).
>>
>>> Provably not true right now, and until ~3.2 delivers the full Task SDK/Core dependency separation this would be _more_ work to upgrade, not less, and that work is not shared but still on a central team.
>>
>> Absolutely - we will wait for AIP-72 completion. I do not want to say 3.1 or 3.2 directly - because there are - as you said - a lot of moving pieces. So my target for multi-team is "After AIP-72 is completed". Full stop. But there is nothing wrong with accepting the AIP now and doing preparatory work in parallel. Just as there is no way to have a baby in 1 month with 9 women, there is no way that adding more effort to task-sdk isolation will speed it up - we already have not only 3 people (you leading it, Kaxil and Amog) but also all the help from me and even 10s of different contributors (for example with the recent db_test cleanup that I took leadership on) - and there are people who wish to work on adding multi-team features.
>> Since the design heavily limits the impact on the airflow codebase and interactions with the task-sdk implementation, there is nothing wrong with starting implementation in parallel either - the Amazon team is keen to move it forward - they even already implemented an SQS trigger for assets, and we are working together on FAB removal and the Keycloak authentication manager - and they seem to still have capacity and drive to progress multi-team. So I am not sure that we are trading off anything. There is no "if we work more on task-sdk and drop multi-team, things will be faster". Generally in open source people work in the area where they feel they can provide the best value - such as you working on task-sdk, me on CI and dev env - and they will deliver more value on multi-team.
>>
>>> So please, as succinctly as possible, tell me what the direct benefit to users of this proposal is over us putting this effort into writing better Asset triggers instead?
>>
>> * less operational overhead for managing multi-team (once AIP-72 is complete) where separate execution environments are important
>> * virtual assets sharing
>> * ability of having "admin" and "team sharing" capability where dags from multiple teams can be seen in a single Airflow UI (requires custom RBAC)
>>
>> None of this can be done via better asset triggers.
>>
>>> On 23 Jun 2025, at 10:57, Jarek Potiuk <ja...@potiuk.com> wrote:
>>> >
>>> > My counter-points:
>>> >
>>> >> 1. Managing a multi team deployment is not materially different from managing a deployment per team
>>> >
>>> > It's a bit easier - especially when it comes to upgrades (especially in the case we are targeting, where we are not targeting multi-tenant, but several relatively closely cooperating teams with different dependency requirements and isolation needs).
>>> >
>>> >> 2. The database changes were quite wide-reaching
>>> >
>>> > Yes, that is addressed.
>>> >
>>> >> 3. I don't believe the original AIP (again, I haven't read the updated proposal or recent messages on the thread yet) will meet what many users want out of a multi-team solution
>>> >
>>> > I think we will only see when we try. A lot of people think it would, even if they are warned. I know at least one user (Wealthsimple) who definitely wants to use it - they got a very detailed explanation of the idea and understand it well. So I am sure that **some** users would. But we do not know how many.
>>> >
>>> >> To expand on those points a bit more
>>> >>
>>> >> On 1. The only components that are shared are, I think, the scheduler and the API server, and it's arguable if that is actually a good idea given those are likely to be the most performance sensitive components anyway.
>>> >>
>>> >> Additionally the fact that the scheduler is a shared component makes upgrading it almost a non-starter as you would likely need buy-in, changes, and testing from ALL teams using it. I'd argue that this is a huge negative until we finish off the version independence work of AIP-72.
>>> >
>>> > I quite disagree here - especially since our target is that task-sdk is supposed to provide all the isolation that is needed. There should be 0 changes in the dags needed to upgrade the scheduler, api_server and triggerer - precisely because we introduced a backwards-compatible task-sdk.
>>> >> On 3 my complaint is essentially that this doesn't go nearly far enough. It doesn't allow read-only views of other teams' dags. I don't think it allows you to be in multiple teams at once. You can't share a connection between teams but only allow certain specified dags to access it; it would have to either be globally usable, or duplicated-and-kept-in-sync between teams. In short I think it falls short of being useful.
>>> >
>>> > Oh absolutely, all that is possible (except sharing single connections between multiple teams - which is a very niche use case, and duplication here is perfectly ok as a first approximation - and if we need more we can add it later).
>>> >
>>> > Auth manager RBAC and access are abstracted away, and the KeyCloak Manager implemented by Vincent allows managing completely independent and separate RBAC based on arguments and resources provided by Airflow. There is nothing to prevent the user who configures KeyCloak RBAC from defining it this way:
>>> >
>>> > if group a > allow to read a and write b
>>> > if group b > allow to write b but not a
>>> >
>>> > and any other combinations. The KeyCloak implementation - pretty advanced already - (and the design of the auth manager) completely abstracts away both authentication and authorization to KeyCloak, and KeyCloak has RBAC management built in. Also, any of the users can write their own - even hard-coded - authentication manager to do the same if they do not want to have configurable KeyCloak. Even SimpleAuthManager could be hard-coded to provide those features.
>>> >
>>> >> So on the surface, I'm no more in favour of using the dag bundle as a replacement for team id as I think most of the above points still stand.
>>> >
>>> > We disagree here.
>>> >
>>> >> My counter proposal: We do _nothing_ to core airflow. We work on improving the event-based triggering of dags (write more triggers to read/check remote Assets etc) so that teams can have 100% isolated deployments but still trigger dags based on asset events from other teams.
>>> >
>>> > That does not solve any of the other design goals - it only allows triggering assets a bit more easily (but also it's not entirely solved by AIP-82, because it does not solve virtual assets - only ones that have a defined triggerer and "something" to listen on - which is way more complex than just defining an asset in a Dag and using it in another). I think we can't compare AIP-82 to sharing virtual assets due to the complexity of it. I explained it in the doc.
>>> >
>>> >> I will now go and catch up with the long thread and updated proposal and come back.
>>> >
>>> > Please. I hope the above explanation will help in better understanding the proposal, because I think you had some assumptions that no longer hold with the new proposal.
>>> >
>>> > J.
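To make the "group a / group b" rules above concrete for those skimming: a minimal, hard-coded sketch of the kind of mapping an auth manager implementation could encode (purely illustrative - the group, team and bundle names are made up, and the function below is a simplification, not the real BaseAuthManager interface, which hides all of this behind its is_authorized_* methods):

    # Illustrative only: group -> per-team permissions, resolved via the bundle -> team
    # association described in the AIP. A real implementation would live inside an auth
    # manager (e.g. the KeyCloak one), not in a free-standing function like this.
    GROUP_RULES = {
        "group_a": {"team_a": {"read"}, "team_b": {"read", "write"}},  # read a, write b
        "group_b": {"team_b": {"read", "write"}},                      # write b, no access to a
    }

    BUNDLE_TO_TEAM = {"bundle_analytics": "team_a", "bundle_ml": "team_b"}  # made-up mapping

    def is_dag_action_allowed(user_groups: set[str], bundle_name: str, action: str) -> bool:
        """Return True if any of the user's groups grants `action` on the dag's team."""
        team = BUNDLE_TO_TEAM.get(bundle_name)
        if team is None:
            return False
        return any(action in GROUP_RULES.get(group, {}).get(team, set()) for group in user_groups)

    # e.g. is_dag_action_allowed({"group_b"}, "bundle_analytics", "read") -> False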
>>> >> On 23 Jun 2025, at 05:54, Jarek Potiuk <ja...@potiuk.com> wrote:
>>> >>>
>>> >>> Just to clarify the relation - I updated the AIP now to refer to AIP-82 and to explain the relation between the "cross-team" and "cross-airflow" asset triggering - this is what I added:
>>> >>>
>>> >>> Note that there is a relation between AIP-82 ("External Event Driven Scheduling") and this part of the functionality. When you have multiple instances of Airflow, you can use shared datasets - "Physical datasets" - that several Airflow Instances can use - for example there could be an S3 object that is produced by one airflow instance, and consumed by another. That requires a deferred trigger to monitor for such datasets, and appropriate permissions to the external dataset, and you could achieve a similar result to cross-team dataset triggering (but cross-airflow). However, the feature of sharing datasets between the teams also works for virtual assets, which do not have physically shared "objects" or a trigger monitoring for changes in such an asset.
>>> >>>
>>> >>> J.
>>> >>>
>>> >>> On Mon, Jun 23, 2025 at 6:38 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> >>>
>>> >>>>> From a quick glance, the updated AIP didn't seem to have any reference to AIP-82, which surprised me, but will take a more detailed read through.
>>> >>>>
>>> >>>> Yep. It did not - because I did not think it was needed or even very important after the simplifications. AIP-82 has a different scope, really. It only helps when the Assets are "real" data files which we have physical triggers for; it's slightly related - sharing datasets between teams (including those that do not require physical files and triggers) is still possible in the design we have now, but it's not (and never was) the **only** reason for having multi-team. There always was (and still is) the possibility of having common, distinct environments (i.e. dependencies and providers) per team, the possibility of having connections and variables that are only accessible to one team and not the other, and isolating workload execution (all that while allowing management of multiple teams and scheduling with a single deployment). That did not change. What changed a lot is that it is now way simpler, something that we can implement without heavy changes to the codebase - and give it to our users, so that they can assess if this is something they need without too much risk and effort.
>>> >>>>
>>> >>>> This was - I believe - the main concern: that the value we get from it is not dramatic, but the required changes are huge. This "redesign" changes the equation - the value is still unchanged, but the cost of implementing it and the impact on the Airflow codebase is much smaller. I still have not heard back from Ash if my proposal responds to his original concern though, so I am mostly guessing (also based on the positive feedback from others) that yes, it does.
>>>> But to be honest I am not sure, and I would love to hear back. I decided to update the AIP to reflect it regardless, because I think the simplification I proposed keeps the original goals, but is indeed way simpler.
>>>>
>>>>> This is a very difficult thread to catch up on.
>>>>
>>>> Valid point. Let me summarize what the result is:
>>>>
>>>> * I significantly simplified the implementation proposal compared to the original version
>>>> * the main simplification is the very limited impact on the existing database - without the "ripple effect" that would require us to change a lot of tables, including their primary keys, and heavily impact the UI
>>>> * this is now more of an incremental change that can be implemented way faster and with far less risk
>>>> * the updated idea is based on leveraging bundles (already part of our data model) and mapping them (many-to-one) to a team - which requires just extending the data model with a bundle mapping and adding team_id to connections and variables. Those are all the needed DB changes.
>>>>
>>>> The AIP is updated - in one single big change, so it should be easy to compare the changes:
>>>> https://cwiki.apache.org/confluence/pages/viewpreviousversions.action?pageId=294816378
>>>> -> I even named the version appropriately "Simplified multi-team AIP" - you can select and compare v.65 with v.66 to see the exact differences I proposed.
>>>>
>>>> I hope it will be helpful for catching up and, for those who did not follow, to be able to make up their minds about it.
>>>>
>>>> J.
>>>>
>>>> On Mon, Jun 23, 2025 at 4:35 AM Vikram Koka <vik...@astronomer.io.invalid> wrote:
>>>>
>>>>> This is a very difficult thread to catch up on.
>>>>> I will take a detailed look at the AIP update to try to figure out the changes in the proposal.
>>>>>
>>>>> From a quick glance, the updated AIP didn't seem to have any reference to AIP-82, which surprised me, but will take a more detailed read through.
>>>>>
>>>>> Vikram
>>>>>
>>>>> On Sun, Jun 22, 2025 at 1:44 AM Pavankumar Gopidesu <gopidesupa...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Jarek, that's a great update on this AIP, now it's much more slimmed down.
>>>>>>
>>>>>> Left a minor comment. :) Overall looking great.
>>>>>>
>>>>>> Pavan
>>>>>>
>>>>>> On Sat, Jun 21, 2025 at 3:10 PM Jens Scheffler <j_scheff...@gmx.de.invalid> wrote:
>>>>>>
>>>>>>> Thanks for the rework/update of the AIP!
>>>>>>>
>>>>>>> Just a few small comments, but overall I like it as it is much leaner than originally planned and is at a level of complexity where it really seems to be a benefit to close the gap as described.
>>>>>>>
>>>>>>> On 21.06.25 14:52, Jarek Potiuk wrote:
>>>>>>>> I updated the AIP - including architecture images - and reviewed it (again) and corrected any ambiguities and places where it needed to be changed.
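For those making up their minds, the DB footprint Jarek summarises above ("bundle mapping plus team_id on connections and variables") is small enough to sketch in a few lines - purely illustrative, with made-up table and column names; the real models obviously carry more fields:

    # Illustrative sketch of the proposed data-model delta only: a team table, a
    # many-to-one bundle -> team mapping, and a nullable team_id on connections and
    # variables. The scheduler/api-server would resolve a dag's team by looking up its
    # bundle_name in the mapping table. Names here are assumptions, not the final schema.
    from sqlalchemy import Column, ForeignKey, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Team(Base):
        __tablename__ = "team"
        id = Column(String(50), primary_key=True)

    class BundleTeamMapping(Base):
        __tablename__ = "bundle_team_mapping"
        bundle_name = Column(String(250), primary_key=True)                   # many bundles ...
        team_id = Column(String(50), ForeignKey("team.id"), nullable=False)   # ... map to one team

    class Connection(Base):  # stand-in for the existing model, showing only the new column
        __tablename__ = "connection"
        conn_id = Column(String(250), primary_key=True)
        team_id = Column(String(50), ForeignKey("team.id"), nullable=True)    # NULL = not team-scoped

    class Variable(Base):  # stand-in for the existing model, showing only the new column
        __tablename__ = "variable"
        key = Column(String(250), primary_key=True)
        team_id = Column(String(50), ForeignKey("team.id"), nullable=True)    # NULL = not team-scoped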
>>>>>>>> I think the current state
>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
>>>>>>>> nicely describes the proposal.
>>>>>>>>
>>>>>>>> Comparing to the previous one:
>>>>>>>>
>>>>>>>> 1. The DB changes are far less intrusive - no ripple effect on Airflow
>>>>>>>> 2. There is no need to merge configurations and provide a different set of configs per team - we can add it later, but I do not see why we need it in this simplified version
>>>>>>>> 3. We can still configure a different set of executors per team - that is already implemented (we just need to wire it to the bundle -> team mapping).
>>>>>>>>
>>>>>>>> I think it will be way simpler and faster to implement this way and it should serve as an MVMT -> Minimum Viable Multi-Team - that we can give our users so that they can provide feedback.
>>>>>>>>
>>>>>>>> J.
>>>>>>>>
>>>>>>>> On Fri, Jun 20, 2025 at 8:33 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>>>
>>>>>>>>>> I like this iteration a bit more now for sure, thanks for being receptive to feedback! :)
>>>>>>>>>>
>>>>>>>>>> This now becomes quite close to what was proposed before; we now again have a team ID (which I think is really needed here, glad to see it back) and it will be used for auth management, configuration specification, etc, but will be carried by Bundle instead of the dag model. Which, as you say, "For that we will need to make sure that both api-server, scheduler and triggerer have access to the 'bundle definition' (to perform the mapping)" - which honestly doesn't feel too much different from the original proposal we had last week of adding it to the Dag table and ensuring it's available everywhere. But either way I'm happy to meet in the middle and keep it on Bundle if everyone else feels that's a more suitable location.
>>>>>>>>>
>>>>>>>>> I think the big difference is the "ripple effect" that was discussed in https://lists.apache.org/thread/78vndnybgpp705j6sm77l1t6xbrtnt5c (and I believe - correct me if I am wrong, Ash - an important trigger for the discussion). So far what we wanted was to extend the primary key, and it would ripple through all the pieces of Airflow -> models, API, UI etc. ... However - we already have `bundle_name` and `bundle_version` in the Dag model. So I think when we add a separate table where we map the bundle to the team, the "ripple effect" will be almost 0.
>>>>>>>>> We do not want to change the primary key, and we do not want to change the UI in any way (except filtering of the DAGs available based on your team - but that will be handled in the Auth Manager and will not impact the UI in any way). I think that's a huge simplification of the implementation, and if we agree to it - I think it should speed up the implementation significantly. There are only a limited number of times where you need to look up the team_id - so having the bundle -> team mapping in a separate table and having to look it up should not be a problem. And it has much less complexity and "ripple-effect" through the codebase (for example, I could imagine 100s or thousands of already written tests that would have to be adapted if we changed the primary key - whereas there will be pretty much zero impact on existing tests if we just add a bundle -> team lookup table).
>>>>>>>>>
>>>>>>>>>> One other thing I'd point out is that I think including executors per team is a very easy win and quite possible without much work. I already have much of the code written. Executors are already aware of Teams that own them (merged), and I have a PR open to have configuration per team (with a quite simple and isolated approach - I believe you approved it, Jarek). The last piece is updating the scheduling logic to route tasks from a particular Bundle to the correct executor, which shouldn't be much work (though it would be easier if the Task models had a column for the team they belong to, rather than having to look up the Dag and Bundle to get the team). I have a branch where I was experimenting with this logic already.
>>>>>>>>>> Anyhow, long story short, I don't think we necessarily need to remove this piece from the project's scope if it is already partly done and not too difficult.
>>>>>>>>>
>>>>>>>>> Yeah, I hear you here again. Certainly I would not want to just **remove** it from the code. And yep, I totally forgot we have it in. And if we can make it in easily (which it seems we can) - we can also include it in the first iteration. What I really wanted to avoid (from the original design) - again, trying to simplify it, limit the changes and speed up the implementation - is one "complexity" specifically: having to have separate, additional configuration per team.
>>>>>>>>> Not only because it complicates the already complex configuration handling (I know we have a PR for that), but mostly because, if it is not needed, we can simplify the documentation and explain to our users more easily what they need to do to have their own multi-team setup. And I am quite open to keeping multiple executors if we can avoid complicating configuration.
>>>>>>>>>
>>>>>>>>> But I think some details of that, and whether we really need separate configuration, might also come as a result of updating the AIP - I am not quite sure now if we need it, but we can discuss it when we iterate on the AIP.
>>>>>>>>>
>>>>>>>>> J.
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> For additional commands, e-mail: dev-h...@airflow.apache.org