I'd really love to finalise the discussion and put it up to a vote some time after the recording from the last dev call is posted - so that more context, details and the LONG discussion we had on it are available. There is no *huge* hurry - we have a strong dependency on Task Isolation and it seems that it will still take a bit of time to complete, so I'd say I would love to start voting in about a week's time - so that maybe at the next dev call we can "seal" the subject. Happy to see any more comments - especially from those who have opinions but had no opportunity to express them.
I am personally very happy with the direction it took - simplification and an "MVP" kind of approach. I also invite our stakeholders to take a close look at the scope and what we really propose - I have a feeling that we can balance it out - there is something we can do to make it not "worse" for the offerings they have. I think we have a really good symbiotic relationship here, and I would love to leverage that.

For one - my goal here is to have a minimum number of changes that impact the maintainability of open-source Airflow - mostly "opening up some possibilities" rather than providing turn-key solutions. And mostly because this is good for all sides - less maintenance and complexity for OSS maintainers, but more opportunities for the stakeholders to build "turn-key" solutions into their products, while also allowing "on-prem" users - if they are highly motivated - to use those features by adding the "turn-key" layer on their own. Also, adding multi-team should not come at the expense of "simple" installations - they should be virtually unaffected.

One example of applying this is cutting the "separate config files". I think that feature moves us closer to a "turn-key" solution but it is not really necessary to achieve the three goals above - that's why in the current proposal this part is completely removed. Sorry Niko, but I still think it's one of the things that falls into this bucket. We can easily remove it - separate config files complicate the code, the documentation and the options the users have, and even if it is a "little" more complex for motivated users to manage configuration, it's also an opportunity for a "turn-key" option that stakeholders can build into their products - and we do not have to maintain it in the open source. So I would be rather strong on **not** touching the current configuration and simply adding configuration for per-team executors in the executor config - even if it is uglier and more "low-level".
So if there are some constructive ideas on what can be done to make it "simpler" and less "turn-key" in that respect - I would highly value such ideas and comments. If we can cut down something more that is not "necessary" for the three primary goals I came up with - I am more than happy to do it. Just to remind - those are the "extracted" goals. I slightly updated them and added them to the preamble of the AIP:

* less operational overhead for managing multi-team (once AIP-72 is complete) where separate execution environments are important
* virtual assets sharing between teams
* ability of having "admin" and "team sharing" capability where dags from multiple teams can be seen in a single Airflow UI (requires custom RBAC and an AIP-56 implementation of Auth Manager - with KeyCloak Auth Manager being a reference implementation)

J.

On Thu, Jun 26, 2025 at 10:53 AM Jarek Potiuk <ja...@potiuk.com> wrote:

>
>> One technical observation: Now that the dag table no longer has a team_id
>> in it, what would the behaviour be when a DAG is attempted to move between
>> bundles? How do we detect this? (I'm not at all convinced that we correctly
>> detect duplicate dag ids across bundles today, so I wouldn't assume or rely
>> on the current behaviour.)
>>
>
> Of course - yes, I realise that - that problem was also not handled in the
> previous iteration, to be honest. That is something that the dag bundle solution
> allows us to solve eventually - but I do not think it's a blocker for the
> proposed implementation. We will eventually have to add some way of
> blocking dags from jumping between bundles; we might tackle this separately. I
> already wanted to propose a separate update for that - but I did not want to
> complicate the current proposal. One thing at a time. I can, however - if
> you consider that a blocker - extend the current AIP with it. Not a big
> problem. This is, however, a bit independent of the team_id introduction.
>
>> Overall, I am still unconvinced this proposal has enough real user benefit
>> over actually separate deployments, and on balance of the added complexity
>> and maintenance burden I do not think it is worth it.
>>
>
> That makes me sad. I thought that over the course of the discussion I
> addressed all the concerns (in this case the concern was "is it worth the
> cost for the little benefit", but when I did address it and heavily limited the
> impact, the concern became "is it worth it at all, as the changes are really
> minimal") - and surely, anyone can change and adapt their concerns over
> time, but that one seems like an ever-moving target. I hoped at least for
> some acknowledgment that some concerns (complexity in this case) were
> addressed, but it seems that you are deeply convinced that we do not need
> multi-team at all (which is in stark contrast with at least a dozen
> bigger and smaller users of Airflow who submitted talks to Airflow Summit
> (including about 5 or 6 submissions for 2025) on how they spent
> their engineering effort, time and money on trying to achieve something
> similar - they assessed that it's worth it, you assess that it's not.
> Somehow I trust that our users were not spending the money, time and
> engineering effort to achieve this because they wanted to spend more money.
> I think they assessed it's worth it. So I want to make it a bit easier and
> more "proper" for them to do that.
>
>>
>> Upgrades: it is not easier to upgrade under this multi team proposal, but
>> much much harder. This is based on hard-earned experience from helping
>> Astronomer users — having to coordinate upgrades between multiple teams
>> turns into a months-long slog of the hardest kind of work — people work:
>> getting other teams to agree to do things that they don’t directly care
>> about — “It’s working for me, I don’t care about upgrading, we’ll get to it
>> next quarter” is a refrain I’ve heard many times.
>>
>
> Yes.
> absolutely - this is why we deferred it until we knew what shape task
> isolation and the other AIPs we depend on would take. Because it is clear that
> pretty much all the problems you explain above are going to be solved with
> task isolation. And it's not just my opinion. If you want to argue with it,
> you likely need to argue with yourself:
> https://github.com/apache/airflow/issues/51545#issuecomment-2980038478.
> Let me quote what you wrote there last week:
>
> Ash Berlin Taylor wrote:
>
> > A tight coupling between task-sdk and any "server side" component is the
> opposite of one of the goals of AIP-72 (I'm not sure we ever explicitly
> said this, but the first point of motivation for the AIP says "Dependency
> conflicts for administrators supporting data teams using different versions
> of providers, libraries, or python packages")
>
> In short, my goal with TaskSDK, and the reason for introducing CalVer
> and Cadwyn with the execution API, is to end up in a world where you can
> upgrade the Airflow Scheduler/API server independently of any worker
> nodes (with the exception that the server must be at least as new as the
> clients)
>
> This ability to have version-skew is pretty much non-negotiable to me
> and is (other than other languages) one of the primary benefits of AIP-72
>
> If you read your own quote, it basically means "it will be easy
> to upgrade airflow independently of workers". So I am a bit confused here.
> Yes, I agree it was difficult, but you yourself explained, when you wrote about
> AIP-72 (which, since AIP-67 was accepted, has always been a prerequisite of it),
> that it will be "easy". So I am not sure why you are bringing it up now. We
> assume AIP-72 will be completed and this problem will be gone. Let's not
> mention it any more please.
>
>> The true separation from TaskSDK will likely only land in about the 3.2 time
>> frame.
>> We are actively working on it, but it’s a slow process of untangling
>> lots of assumptions made in the code base over the years. Maybe once we
>> have that my view would be different, but right now I think this makes the
>> proposal a non-starter. Especially as you are saying that most teams will
>> have unique connections. If they’ve got those already, then having an asset
>> trigger use those conns to watch/poll for activity is a much easier
>> solution to operate and, crucially, to scale and upgrade.
>>
>
> Yes. I perfectly understand that and I am fully aware of the potential 3.2
> time-frame. And that's fine. Actually, I heartily invite you to listen to
> the part of my talk from Berlin Buzzwords where I was asked about the timeline
> - https://youtu.be/EyhZOnbwc-4?t=2226 - this link leads to the exact
> timeline answer in my talk. My answer was basically "3.1" or "3.2", and I
> sincerely hope "3.1", but we might not be able to complete it because we
> have other things to do (the "other" is indeed the Task Isolation work that you
> are leading). And that's perfectly fine. And it absolutely does not prevent
> us from voting on the AIP now - similarly to how we voted on the previous
> version of the AIP a few months ago, knowing that it had some prerequisites.
> Especially since we know that the feature we need from task isolation
> is "non-negotiable", i.e. it WILL happen. We don't hope for it, we know it
> will be there. Those are your own words.
>
>
>> > I think we can’t compare AIP-82 to sharing virtual assets due to the
>> complexity of it.
>>
>> Virtual Assets was a mistake, and not how users actually want to use
>> them. Mea culpa
>>
>
> This is the first time I hear this - certainly you never raised this
> concern on the devlist.
> So if you have some concerns about virtual assets, I
> think you should raise them on the devlist, because I think everyone here is
> missing some conversation (or maybe it's just your private opinion that
> you never shared with anyone, but maybe it's worth sharing). I would be
> interested to hear how the feature that was absolutely the most successful
> feature of Airflow 2 was a mistake. According to the 2024 survey -
> https://airflow.apache.org/blog/airflow-survey-2024/ - 48% of Airflow
> users have been using it, even though it was added as one of the last
> big features of Airflow 2. It's the MOST used feature out of all the
> features out there. I would be really curious to see how it was a mistake
> (but please start a separate thread explaining why you think it was a
> mistake, what your data points are and what you think should be fixed).
> Just dropping "virtual assets were a mistake" in the middle of a multi-team
> conversation seems completely unjustified without knowing what you are
> talking about. So I think, until we know more, this argument has no basis.
>
>
>> To restate my points:
>>
>> - Sharing a deployment between teams today/in 3.1 is operationally more
>> complex (both scaling, and upgrades) — this is a con, not a plus.
>>
>
> Surely. But it will be easier when AIP-72 is complete (which I am
> definitely looking forward to and, as clearly explained in the AIP, is a
> prerequisite of it). Nothing changed here.
>
>
>> - The main user benefit appears to be “allow teams’ DAGs to communicate
>> via Assets”, in which case we can do that today by putting more work into
>> AIP-82’s Asset triggers
>>
>
> No. Lower operational complexity for multi-teams (provided that we
> deliver AIP-72) is another benefit. Virtual assets sharing is another, and since
> there is no ground for the "virtual assets is a mistake" statement (not until
> you explain what you mean by that in a separate discussion) - this is also
> still a very valid point.
>
>
>> Soon, we will then be asked about cross-team governance, policy
>> enforcement, and potentially unbounded edge cases (e.g., team-specific
>> secrets, roles, quotas). Again, you get this for free with truly separate
>> deployments already
>>
>> allow different teams to use different executors (including multiple
>> executors per-team following AIP-61)
>>
>
> Not really. We very explicitly say in the AIP that this is not a goal and
> that we have no plans for it. And yes, using separate executors per team is
> actually back in the AIP in case you did not notice (and the code needed
> for it is even implemented and merged already in main by Vincent).
>
>
>> Provably not true right now, and until ~3.2 delivers the full Task
>> SDK/Core dependency separation this would be _more_ work to upgrade, not
>> less, and that work is not shared but still on a central team.
>>
>
> Absolutely - we will wait for AIP-72 completion. I do not want to say 3.1
> or 3.2 directly - because there are, as you said, a lot of moving pieces.
> So my target for multi-team is "after AIP-72 is completed". Full stop. But
> there is nothing wrong with accepting the AIP now and doing preparatory
> work in parallel. Just as there is no way for 9 women to have a baby in 1
> month, there is no way adding more effort to task-sdk isolation will
> speed it up - we already have not only 3 people (you leading it, Kaxil and
> Amogh) but also all the help from me and even tens of different contributors
> (for example with the recent db_test cleanup that I took leadership of) -
> and there are people who wish to work on adding multi-team features.
> Since
> the design heavily limits the impact on the airflow codebase and interactions with
> the task-sdk implementation, there is nothing wrong with starting
> implementation in parallel either - the Amazon team is keen to move it forward -
> they have even already implemented the SQS trigger for assets, and we are working
> together on FAB removal and the Keycloak authentication manager - and they seem to
> still have capacity and drive to progress multi-team. So I am not sure if
> we are trading off something. There is no "if we work more on task sdk
> and drop multi-team, things will be faster". Generally, in open source people
> work in the area where they feel they can provide the best value - such as you
> working on task-sdk, me on CI and dev env - and they will deliver more value on
> multi-team.
>
>
>>
>> So please, as succinctly as possible, please tell me what the direct
>> benefit to users this proposal has over us putting this effort into writing
>> better Asset triggers instead?
>>
>
>
> * less operational overhead for managing multi-team (once AIP-72 is
> complete) where separate execution environments are important
> * virtual assets sharing
> * ability of having "admin" and "team sharing" capability where dags from
> multiple teams can be seen in a single Airflow UI (requires custom RBAC)
>
> None of this can be done via better asset triggers
>
>
>>
>> > On 23 Jun 2025, at 10:57, Jarek Potiuk <ja...@potiuk.com> wrote:
>> >
>> > My counter-points:
>> >
>> >
>> >> 1. Managing a multi team deployment is not materially different from
>> >> managing a deployment per team
>> >>
>> >
>> > It's a bit easier - especially when it comes to upgrades, especially in the
>> > case we are targeting, where we are not targeting multi-tenant, but several
>> > relatively closely cooperating teams with different dependency requirements
>> > and isolation needs.
>> >
>> >> 2. The database changes were quite wide-reaching
>> >>
>> >
>> > Yes. That is addressed.
>> >
>> >
>> >> 3.
>> >> I don’t believe the original AIP (again, I haven’t read the updated
>> >> proposal or recent messages on the thread yet) will meet what many users
>> >> want out of a multiteam solution
>> >>
>> >
>> > I think we will only see when we try. A lot of people think they would,
>> > even if they are warned. I know at least one user (Wealthsimple) who
>> > definitely wants to use it and they got a very detailed explanation of the
>> > idea and understand it well. So I am sure that **some** users would. But we
>> > do not know how many.
>> >
>> >
>> >> To expand on those points a bit more
>> >>
>> >> On 1. The only components that are shared are, I think, the scheduler and
>> >> the API server, and it’s arguable if that is actually a good idea given
>> >> those are likely to be the most performance-sensitive components anyway.
>> >>
>> >> Additionally, the fact that the scheduler is a shared component makes
>> >> upgrading it almost a non-starter as you would likely need buy-in, changes,
>> >> and testing from ALL teams using it. I’d argue that this is a huge negative
>> >> until we finish off the version independence work of AIP-72.
>> >>
>> >
>> > Quite disagree here - especially given that our target is that task-sdk is
>> > supposed to provide all the isolation that is needed. There should be 0 changes
>> > in the dags needed to upgrade the scheduler, api_server or triggerer - precisely
>> > because we introduced a backwards-compatible task-sdk.
>> >
>> >> On 3 my complaint is essentially that this doesn’t go nearly far enough. It
>> >> doesn’t allow read-only views of other teams’ dags. I don’t think it allows
>> >> you to be in multiple teams at once. You can’t share a connection between
>> >> teams but only allow certain specified dags to access it; it would have to
>> >> either be globally usable, or duplicated-and-kept-in-sync between teams. In
>> >> short I think it falls short of being useful.
>> >>
>> >
>> > Oh, absolutely all of that is possible (except sharing single connections
>> > between multiple teams - which is a very niche use case, and duplication
>> > here is perfectly ok as a first approximation - and if we need more we can
>> > add it later).
>> >
>> > Auth manager RBAC and access is abstracted away, and the Keycloak Manager
>> > implemented by Vincent allows managing completely independent and separate
>> > RBAC based on arguments and resources provided by Airflow. There is nothing
>> > to prevent the user who configures KeyCloak RBAC from defining it in the way:
>> >
>> > if group a > allow to read a and write b
>> > if group b > allow to write b but not a
>> >
>> > and any other combinations. The KeyCloak implementation - pretty advanced
>> > already - (and the design of the auth manager) completely abstracts away both
>> > authentication and authorization to KeyCloak, and KeyCloak has RBAC
>> > management built in. Also, any of the users can write their own - even
>> > hard-coded - authentication manager to do the same if they do not want to
>> > have configurable KeyCloak. Even SimpleAuthManager could be hard-coded to
>> > provide those features.
>> >
>> >
>> >>
>> >> So on the surface, I’m no more in favour of using dag bundle as a
>> >> replacement for team id as I think most of the above points still stand.
>> >>
>> >
>> > We disagree here.
>> >
>> >>
>> >> My counter proposal: We do _nothing_ to core airflow. We work on improving
>> >> the event-based triggering of dags (write more triggers to read/check remote
>> >> Assets etc) so that teams can have 100% isolated deployments but still
>> >> trigger dags based on asset events from other teams.
>> >>
>> >
>> > That does not solve any of the other design goals - it only allows triggering
>> > assets a bit more easily (and it's not entirely solved by AIP-82 either,
>> > because it does not solve virtual assets - only ones that have a defined
>> > triggerer and "something" to listen on - which is way more complex than
>> > just defining an asset in a Dag and using it in another). I think we can't
>> > compare AIP-82 to sharing virtual assets due to the complexity of it. I
>> > explained it in the doc.
>> >
>> >
>> >> I will now go and catch up with the long thread and updated proposal and
>> >> come back.
>> >>
>> >
>> > Please. I hope the above explanation will help in better understanding of
>> > the proposal, because I think you had some assumptions that do not hold any
>> > more with the new proposal.
>> >
>> > J.
>> >
>> >
>> >>
>> >>> On 23 Jun 2025, at 05:54, Jarek Potiuk <ja...@potiuk.com> wrote:
>> >>>
>> >>> Just to clarify the relation - I updated the AIP now to refer to AIP-82
>> >>> and to explain the relation between the "cross-team" and "cross-airflow"
>> >>> asset triggering - this is what I added:
>> >>>
>> >>> Note that there is a relation between AIP-82 ("External Driven Scheduling")
>> >>> and this part of the functionality. When you have multiple instances of
>> >>> Airflow, you can use shared datasets - "Physical datasets" - that several
>> >>> Airflow instances can use - for example there could be an S3 object that is
>> >>> produced by one airflow instance, and consumed by another. That requires
>> >>> a deferred trigger to monitor for such datasets, and appropriate permissions
>> >>> on the external dataset, and you could achieve a similar result to cross-team
>> >>> dataset triggering (but cross-airflow).
>> >>> However, the feature of sharing
>> >>> datasets between the teams also works for virtual assets, that do not have
>> >>> physically shared "objects" and a trigger that is monitoring for changes in
>> >>> such an asset.
>> >>>
>> >>> J.
>> >>>
>> >>>
>> >>> On Mon, Jun 23, 2025 at 6:38 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>> >>>
>> >>>>> From a quick glance, the updated AIP didn't seem to have any reference to
>> >>>>> AIP-82, which surprised me, but will take a more detailed read through.
>> >>>>
>> >>>> Yep. It did not - because I did not think it was needed, or even very
>> >>>> important after the simplifications. AIP-82 has a different scope, really.
>> >>>> It only helps when the Assets are "real" data files which we have physical
>> >>>> triggers for. It's slightly related - sharing datasets between teams
>> >>>> (including those that do not require physical files and triggers) is still
>> >>>> possible in the design we have now, but it's not (and never was) the
>> >>>> **only** reason for having multi-team. There always was (and still is) the
>> >>>> possibility of having common, distinct environments (i.e. dependencies
>> >>>> and providers) per team, the possibility of having connections and
>> >>>> variables that are only accessible to one team and not the other, and
>> >>>> isolating workload execution (all that while allowing to manage multiple
>> >>>> teams and schedule things with a single deployment). That did not change. What
>> >>>> changed a lot is that it is now way simpler - something that we can
>> >>>> implement without heavy changes to the codebase - and give to our users,
>> >>>> so that they can assess if this is something they need without too much
>> >>>> risk and effort.
>> >>>>
>> >>>> This was, I believe, the main concern: that the value we get from it is
>> >>>> not dramatic, but the required changes are huge.
>> >>>> This "redesign" changes
>> >>>> the equation - the value is still unchanged, but the cost of implementing
>> >>>> it and the impact on the Airflow codebase is much smaller. I still have not
>> >>>> heard back from Ash on whether my proposal responds to his original concern,
>> >>>> though, so I am mostly guessing (also based on the positive reactions of
>> >>>> others) that yes, it does. But to be honest I am not sure and I would love
>> >>>> to hear back. I decided to update the AIP to reflect it regardless, because
>> >>>> I think the simplification I proposed keeps the original goals, but is
>> >>>> indeed way simpler.
>> >>>>
>> >>>>> This is a very difficult thread to catch up on.
>> >>>>
>> >>>> Valid point. Let me summarize what the result is:
>> >>>>
>> >>>> * I significantly simplified the implementation proposal compared to the
>> >>>> original version
>> >>>> * the main simplification is the very limited impact on the existing database -
>> >>>> without the "ripple effect" that would require us to change a lot of tables,
>> >>>> including their primary keys, and heavily impact the UI
>> >>>> * this is now more of an incremental change that can be implemented way
>> >>>> faster and with far less risk
>> >>>> * the updated idea is based on leveraging bundles (already part of our data
>> >>>> model) to map them (many-to-one) to a team - which requires just extending
>> >>>> the data model with the bundle mapping and adding team_id to connections and
>> >>>> variables. Those are all the needed DB changes.
>> >>>>
>> >>>> The AIP is updated - in one single big change, so it should be easy to
>> >>>> compare the changes:
>> https://cwiki.apache.org/confluence/pages/viewpreviousversions.action?pageId=294816378
>> >>>> -> I even named the version appropriately "Simplified multi-team AIP" - you
>> >>>> can select and compare v.65 with v.66 to see the exact differences I
>> >>>> proposed.
>> >>>>
>> >>>> I hope it will be helpful for catching up and, for those who did not follow,
>> >>>> to be able to make up their minds about it.
>> >>>>
>> >>>> J.
>> >>>>
>> >>>>
>> >>>> On Mon, Jun 23, 2025 at 4:35 AM Vikram Koka <vik...@astronomer.io.invalid>
>> >>>> wrote:
>> >>>>
>> >>>>> This is a very difficult thread to catch up on.
>> >>>>> I will take a detailed look at the AIP update to try to figure out the
>> >>>>> changes in the proposal.
>> >>>>>
>> >>>>> From a quick glance, the updated AIP didn't seem to have any reference to
>> >>>>> AIP-82, which surprised me, but will take a more detailed read through.
>> >>>>>
>> >>>>> Vikram
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Sun, Jun 22, 2025 at 1:44 AM Pavankumar Gopidesu <
>> >>>>> gopidesupa...@gmail.com> wrote:
>> >>>>>
>> >>>>>> Thanks Jarek, that's a great update on this AIP, now it's much more
>> >>>>>> slimmed down.
>> >>>>>>
>> >>>>>> Left a minor comment. :) Overall looking great.
>> >>>>>>
>> >>>>>> Pavan
>> >>>>>>
>> >>>>>> On Sat, Jun 21, 2025 at 3:10 PM Jens Scheffler <j_scheff...@gmx.de.invalid>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Thanks for the rework/update of the AIP-72!
>> >>>>>>>
>> >>>>>>> Just a few small comments, but overall I like it as it is much leaner
>> >>>>>>> than originally planned and is at a level of complexity where it really
>> >>>>>>> seems to be a benefit to close the gap as described.
>> >>>>>>>
>> >>>>>>> On 21.06.25 14:52, Jarek Potiuk wrote:
>> >>>>>>>> I updated the AIP - including architecture images - and reviewed it
>> >>>>>>>> (again) and corrected any ambiguities and places where it needed to be
>> >>>>>>>> changed.
>> >>>>>>>>
>> >>>>>>>> I think the current state
>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
>> >>>>>>>> - nicely describes the proposal.
>> >>>>>>>>
>> >>>>>>>> Compared to the previous one:
>> >>>>>>>>
>> >>>>>>>> 1. The DB changes are far less intrusive - no ripple effect on Airflow
>> >>>>>>>> 2. There is no need to merge configurations and provide a different set of
>> >>>>>>>> configs per team - we can add it later, but I do not see why we need it in
>> >>>>>>>> this simplified version
>> >>>>>>>> 3. We can still configure a different set of executors per team - that is
>> >>>>>>>> already implemented (we just need to wire it to the bundle -> team mapping).
>> >>>>>>>>
>> >>>>>>>> I think it will be way simpler and faster to implement this way, and it
>> >>>>>>>> should serve as an MVMT -> Minimum Viable Multi-Team that we can give our
>> >>>>>>>> users so that they can provide feedback.
>> >>>>>>>>
>> >>>>>>>> J.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Fri, Jun 20, 2025 at 8:33 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>> >>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>> I like this iteration a bit more now for sure, thanks for being receptive
>> >>>>>>>>>> to feedback! :)
>> >>>>>>>>>>
>> >>>>>>>>>> This now becomes quite close to what was proposed before; we now again
>> >>>>>>>>>> have a team ID (which I think is really needed here, glad to see it back)
>> >>>>>>>>>> and it will be used for auth management, configuration specification, etc.,
>> >>>>>>>>>> but will be carried by Bundle instead of the dag model.
>> >>>>>>>>>> Which, as you say,
>> >>>>>>>>>> means "we will need to make sure that both api-server, scheduler and
>> >>>>>>>>>> triggerer have access to the 'bundle definition' (to perform the mapping)",
>> >>>>>>>>>> which honestly doesn’t feel too different from the original proposal
>> >>>>>>>>>> we had last week of adding it to the Dag table and ensuring it’s available
>> >>>>>>>>>> everywhere. But either way I’m happy to meet in the middle and keep it on
>> >>>>>>>>>> Bundle if everyone else feels that’s a more suitable location.
>> >>>>>>>>>>
>> >>>>>>>>> I think the big difference is the "ripple effect" that was discussed in
>> >>>>>>>>> https://lists.apache.org/thread/78vndnybgpp705j6sm77l1t6xbrtnt5c (and was,
>> >>>>>>>>> I believe - correct me if I am wrong, Ash - an important trigger for the
>> >>>>>>>>> discussion). So far what we wanted was to extend the primary key, and it
>> >>>>>>>>> would ripple through all the pieces of Airflow -> models, API, UI etc. ...
>> >>>>>>>>> However - we already have "bundle_name" and "bundle_version" in the Dag
>> >>>>>>>>> model. So I think when we add a separate table where we map the bundle to
>> >>>>>>>>> the team, the "ripple effect" will be almost 0. We do not want to change
>> >>>>>>>>> the primary key, and we do not want to change the UI in any way (except
>> >>>>>>>>> filtering the DAGs available based on your team - but that will be handled
>> >>>>>>>>> in the Auth Manager and will not impact the UI in any way). I think that's
>> >>>>>>>>> a huge simplification of the implementation, and if we agree to it I think
>> >>>>>>>>> it should speed up the implementation significantly.
>> >>>>>>>>> There are only a limited
>> >>>>>>>>> number of places where you need to look up the team_id - so having the
>> >>>>>>>>> bundle -> team mapping in a separate table and having to look it up
>> >>>>>>>>> should not be a problem. And it has much less complexity and
>> >>>>>>>>> "ripple-effect" through the codebase (for example, I could imagine hundreds
>> >>>>>>>>> or thousands of already-written tests that would have to be adapted if we
>> >>>>>>>>> changed the primary key - whereas there will be pretty much zero impact on
>> >>>>>>>>> existing tests if we just add a bundle -> team lookup table).
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>> One other thing I’d point out is that I think including executors per
>> >>>>>>>>>> team is a very easy win and quite possible without much work. I already
>> >>>>>>>>>> have much of the code written. Executors are already aware of Teams that
>> >>>>>>>>>> own them (merged), and I have a PR open to have configuration per team
>> >>>>>>>>>> (with a quite simple and isolated approach - I believe you approved it,
>> >>>>>>>>>> Jarek). The last piece is updating the scheduling logic to route tasks
>> >>>>>>>>>> from a particular Bundle to the correct executor, which shouldn’t be much
>> >>>>>>>>>> work (though it would be easier if the Task models had a column for the
>> >>>>>>>>>> team they belong to, rather than having to look up the Dag and Bundle to
>> >>>>>>>>>> get the team). I have a branch where I was experimenting with this logic
>> >>>>>>>>>> already.
>> >>>>>>>>>> Anyhow, long story short, I don’t think we necessarily need to remove
>> >>>>>>>>>> this piece from the project's scope if it is already partly done and not
>> >>>>>>>>>> too difficult.
>> >>>>>>>>>>
>> >>>>>>>>> Yeah. I hear you here again.
>> >>>>>>>>> Certainly I would not want to just
>> >>>>>>>>> **remove** it from the code. And, yep, I totally forgot we have it in. And
>> >>>>>>>>> if we can make it in easily (which it seems we can) - we can also include
>> >>>>>>>>> it in the first iteration. What I really wanted to avoid (from the original
>> >>>>>>>>> design) - again trying to simplify it, limit the changes, and speed up
>> >>>>>>>>> implementation - is one "complexity" specifically: having to have a
>> >>>>>>>>> separate, additional configuration per team. Not only because it
>> >>>>>>>>> complicates the already complex configuration handling (I know we have a PR
>> >>>>>>>>> for that) but mostly because, if it is not needed, we can simplify the
>> >>>>>>>>> documentation and explain more easily to our users what they need to do
>> >>>>>>>>> to have their own multi-team setup. And I am quite open to keeping
>> >>>>>>>>> multiple executors if we can avoid complicating configuration.
>> >>>>>>>>>
>> >>>>>>>>> But I think some details of that, and whether we really need separate
>> >>>>>>>>> configuration, might also come as a result of updating the AIP - I am not
>> >>>>>>>>> quite sure now if we need it, but we can discuss it when we iterate on the
>> >>>>>>>>> AIP.
>> >>>>>>>>>
>> >>>>>>>>> J.
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>> ---------------------------------------------------------------------
>> >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> >>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org