Re: Discuss: AIP-67 (multi team) now that AIP-82 (External event driven dags) exists

Jarek Potiuk Mon, 23 Jun 2025 03:21:14 -0700

My counter-points:


> 1. Managing a multi team deployment is not materially different from
> managing a deployment per team
>

It's a bit easier - especially when it comes to upgrades, and especially in
the case we are targeting: not multi-tenancy, but several relatively closely
cooperating teams with different dependency requirements and isolation
needs.

> 2. The database changes were quite wide-reaching
>

Yes. That is addressed.


> 3. I don’t believe the original AIP (again, I haven’t read the updated
> proposal or recent messages on the thread yet) will meet what many users
> want out of a multiteam solution
>

I think we will only see when we try. A lot of people think they would,
even if they are warned. I know at least one user (Wealthsimple) who
definitely wants to use it; they got a very detailed explanation of the
idea and understand it well. So I am sure that **some** users would. But we
do not know how many.


> To expand on those points a bit more
>
> On 1. The only components that are shared are, I think, the scheduler and
> the API server, and it’s arguable if that is actually a good idea given
> those are likely to be the most performance sensitive components anyway.
>
> Additionally the fact that the scheduler is a shared component makes
> upgrading it almost a non starter as you would likely need buy-in, changes,
> and testing from ALL teams using it. I’d argue that this is a huge negative
> until we finish off the version independence work of AIP-72.
>

I quite disagree here - especially since our target is that task-sdk
provides all the isolation that is needed. There should be 0 changes needed
in the dags to upgrade scheduler, api_server, triggerer - precisely because
we introduced a backwards-compatible task-sdk.

> On 3 my complaint is essentially that this doesn’t go nearly far enough. It
> doesn’t allow read only views to other teams' dags. I don’t think it allows
> you to be in multiple teams at once. You can’t share a connection between
> teams but only allow certain specified dags to access it, but would have to
> either be globally usable, or duplicated-and-kept-in-sync between teams. In
> short I think it falls short of being useful.
>

Oh absolutely all that is possible (except sharing single connections
between multiple teams - which is a very niche use case, and duplication
here is perfectly ok as a first approximation - and if we need more we can
add it later).

Auth manager RBAC and access is abstracted away, and the KeyCloak Manager
implemented by Vincent allows managing completely independent and separate
RBAC based on arguments and resources provided by Airflow. There is nothing
to prevent the user who configures KeyCloak RBAC from defining it this way:

if group a > allow to read a and write b
if group b > allow to write b but not a

and any other combinations. The KeyCloak implementation - pretty advanced
already - (and the design of the auth manager) completely delegates both
authentication and authorization to KeyCloak, and KeyCloak has RBAC
management built in. Also any of the users can write their own - even
hard-coded - auth manager to do the same if they do not want to
have configurable KeyCloak. Even SimpleAuthManager could be hard-coded to
provide those features.
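
To make the two rules above concrete, here is a minimal sketch of such a
hard-coded policy in plain Python. It is only an illustration and does not
use the Airflow auth manager interface or KeyCloak: `POLICY`,
`is_authorized`, and the group and team names are all hypothetical.

```python
# Hypothetical hard-coded policy: group -> team -> allowed actions.
# Mirrors the two rules above: group a may read a and write b,
# group b may write b but has no access to a.
POLICY: dict[str, dict[str, frozenset[str]]] = {
    "group_a": {
        "team_a": frozenset({"read"}),
        "team_b": frozenset({"write"}),
    },
    "group_b": {
        "team_b": frozenset({"write"}),
    },
}


def is_authorized(groups: list[str], team: str, action: str) -> bool:
    """Return True if any of the user's groups allows `action` on `team`."""
    return any(
        action in POLICY.get(group, {}).get(team, frozenset())
        for group in groups
    )


# A user may belong to several groups; permissions are the union.
print(is_authorized(["group_a"], "team_a", "read"))
print(is_authorized(["group_b"], "team_a", "write"))
print(is_authorized(["group_a", "group_b"], "team_b", "write"))
```

Because the check is a union over the user's groups, being in multiple
teams at once and having read-only access to another team's dags are both
just entries in the table.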


>
> So on the surface, I’m no more in favour of using dag bundle as a
> replacement for team id as I think most of the above points still stand.
>

We disagree here.

>
> My counter proposal: We do _nothing_ to core airflow. We work on improving
> the event-based trigger of dags (write more triggers for read/check remote
> Assets etc) so that teams can have 100% isolated deployments but still
> trigger dags based on asset events from other teams.
>

That does not solve any of the other design goals - it only makes it a bit
easier to trigger on assets (and even that is not entirely solved by AIP-82,
because it does not cover virtual assets - only ones that have a defined
trigger and "something" to listen on - which is way more complex than
just defining an asset in a Dag and using it in another). I think we can't
compare AIP-82 to sharing virtual assets because of that complexity. I
explained it in the doc.


> I will now go and catch up with the long thread and updated proposal and
> come back.
>

Please. I hope the above explanation will help in better understanding
the proposal, because I think you had some assumptions that do not hold any
more with the new proposal.

J.


>
> > On 23 Jun 2025, at 05:54, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > Just to clarify the relation - I updated the AIP now to refer to AIP-82
> and
> > to explain relation between the "cross-team" and "cross-airflow" asset
> > triggering - this is what I added:
> >
> > Note that there is a relation between AIP-82 ("External Driven
> Scheduling")
> > and this part of the functionality. When you have multiple instances of
> > Airflow, you can use shared datasets - "Physical datasets" - that several
> > Airflow Instances can use - for example there could be an S3 object that
> is
> > produced by one airflow instance, and consumed by another. That requires
> > deferred trigger to monitor for such datasets, and appropriate
> permissions
> > to the external dataset, and you could achieve a similar result to
> cross-team
> > dataset triggering (but cross airflow). However the feature of sharing
> > datasets between the teams also works for virtual assets, that do not
> have
> > physically shared "objects" and trigger that is monitoring for changes in
> > such asset.
> >
> > J.
> >
> >
> > On Mon, Jun 23, 2025 at 6:38 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> >>> From a quick glance, the updated AIP didn't seem to have any reference
> to
> >>> AIP-82, which surprised me, but will take a more detailed read through.
> >>
> >> Yep. It did not - because I did not think it was needed or even very
> >> important after the simplifications. AIP-82 has a different scope,
> really.
> >> It only helps when the Assets are "real" data files which we have
> physical
> >> triggers for, it's slightly related - sharing datasets between teams
> >> (including those that do not require physical files and triggers) is
> still
> >> possible in the design we have now, but it's not (and never was) the
> >> **only** reason for having multi-team. There always was (and still is)
> the
> >> possibility of having a common, distinct environments (i.e. dependencies
> >> and providers) per team, the possibility of having connections and
> >> variables that are only accessible to one team and not the other, and
> >> isolating workload execution (all that while allowing to manage multiple
> >> team and schedule things with single deployment). That did not change.
> What
> >> changed a lot is that it is now way simpler, something that we can
> >> implement without heavy changes to the codebase - and give it to our
> users,
> >> so that they can assess if this is something they need without too much
> >> risk and effort.
> >>
> >> This was - I believe the main concern, that the value we get from it is
> >> not dramatic, but the required changes are huge. This "redesign" changes
> >> the equation - the value is still unchanged, but the cost of
> implementing
> >> it and impact on the Airflow codebase is much smaller. I still have not
> >> heard back from Ash if my proposal responds to his original concern
> though,
> >> so I am mostly guessing (also based on the positive impact of others)
> that
> >> yes it does. But to be honest I am not sure and I would love to hear
> back,
> >> I decided to update the AIP to reflect it - regardless, because I think
> the
> >> simplification I proposed keeps the original goals, but is indeed way
> >> simpler.
> >>
> >>> This is a very difficult thread to catch up on.
> >>
> >> Valid point. Let me summarize the result:
> >>
> >> * I significantly simplified the implementation proposal compared to
> the
> >> original version
> >> * main simplification is very limited impact on existing database -
> >> without "ripple effect" that would require us to change a lot of tables,
> >> including their primary keys, and heavily impact the UI
> >> * this is now more of an incremental change that can be implemented way
> >> faster and with far less risk
> >> * updated idea is based on leveraging bundles (already part of our data
> >> model) to map them (many-to-one) to a team - which requires to just
> extend
> >> the data model with bundle mapping and add team_id to connections and
> >> variables. Those are all the DB changes needed.
> >>
> >> The AIP is updated - in one single big change so it should be easy to
> >> compare the changes:
> >>
> https://cwiki.apache.org/confluence/pages/viewpreviousversions.action?pageId=294816378
> >> -> I even named the version appropriately "Simplified multi-team AIP" -
> you
> >> can select and compare v.65 with v.66 to see the exact differences I
> >> proposed.
> >>
> >> I hope it will be helpful to catch up and for those who did not follow,
> to
> >> be able to make up their minds about it.
> >>
> >> J.
> >>
> >>
> >>
> >> On Mon, Jun 23, 2025 at 4:35 AM Vikram Koka
> <vik...@astronomer.io.invalid>
> >> wrote:
> >>
> >>> This is a very difficult thread to catch up on.
> >>> I will take a detailed look at the AIP update to try to figure out the
> >>> changes in the proposal.
> >>>
> >>> From a quick glance, the updated AIP didn't seem to have any reference
> to
> >>> AIP-82, which surprised me, but will take a more detailed read through.
> >>>
> >>> Vikram
> >>>
> >>>
> >>>
> >>> On Sun, Jun 22, 2025 at 1:44 AM Pavankumar Gopidesu <
> >>> gopidesupa...@gmail.com>
> >>> wrote:
> >>>
> >>>> Thanks Jarek, that's a great update on this AIP, now it's much more
> slim
> >>>> down.
> >>>>
> >>>> left a minor comment. :) Overall looking great.
> >>>>
> >>>> Pavan
> >>>>
> >>>> On Sat, Jun 21, 2025 at 3:10 PM Jens Scheffler
> >>> <j_scheff...@gmx.de.invalid
> >>>>>
> >>>> wrote:
> >>>>
> >>>>> Thanks for the rework/update of the AIP-67!
> >>>>>
> >>>>> Just a few small comments but overall I like it as it is much leaner
> >>>>> than originally planned and is in a level of complexity that it
> really
> >>>>> seems to be a benefit to close the gap as described.
> >>>>>
> >>>>> On 21.06.25 14:52, Jarek Potiuk wrote:
> >>>>>> I updated the AIP - including architecture images and reviewed it
> >>>> (again)
> >>>>>> and corrected any ambiguities and places where it needed to be
> >>> changed.
> >>>>>>
> >>>>>> I think the current state
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> >>>>>> - nicely describes the proposal.
> >>>>>>
> >>>>>> Comparing to the previous one:
> >>>>>>
> >>>>>> 1. The DB changes are far less intrusive - no ripple effect on
> >>> Airflow
> >>>>>> 2. There is no need to merge configurations and provide different
> >>> set
> >>>> of
> >>>>>> configs per team - we can add it later but I do not see why we need
> >>> it
> >>>> in
> >>>>>> this simplified version
> >>>>>> 3. We can still configure a different set of executors per team -
> >>> that
> >>>> is
> >>>>>> already implemented (we just need to wire it to the bundle -> team
> >>>>> mapping).
> >>>>>>
> >>>>>> I think it will be way simpler and faster to implement this way and
> >>> it
> >>>>>> should serve as MVMT -> Minimum Viable Multi Team that we can give
> >>> our
> >>>>>> users so that they can provide feedback.
> >>>>>>
> >>>>>> J.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Jun 20, 2025 at 8:33 AM Jarek Potiuk <ja...@potiuk.com>
> >>> wrote:
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> I like this iteration a bit more now for sure, thanks for being
> >>>>> receptive
> >>>>>>>> to feedback! :)
> >>>>>>>>
> >>>>>>>
> >>>>>>>> This now becomes quite close to what was proposed before, we now
> >>>> again
> >>>>>>>> have a team ID (which I think is really needed here, glad to see
> >>> it
> >>>>> back)
> >>>>>>>> and it will be used for auth management, configuration
> >>> specification,
> >>>>> etc
> >>>>>>>> but will be carried by Bundle instead of the dag model. Which as
> >>> you
> >>>>> say
> >>>>>>>> “For that we will need to make sure that both api-server,
> >>> scheduler
> >>>> and
> >>>>>>>> triggerer have access to the "bundle definition" (to perform the
> >>>>> mapping)"
> >>>>>>>> which honestly doesn’t feel too much different from the original
> >>>>> proposal
> >>>>>>>> we had last week of adding it to Dag table and ensuring it’s
> >>>> available
> >>>>>>>> everywhere. but either way I’m happy to meet in the middle and
> >>> keep
> >>>> it
> >>>>> on
> >>>>>>>> Bundle if everyone else feels that’s a more suitable location.
> >>>>>>>>
> >>>>>>> I think the big difference is the "ripple effect" that was
> >>> discussed
> >>>> in
> >>>>>>> https://lists.apache.org/thread/78vndnybgpp705j6sm77l1t6xbrtnt5c
> >>>> (and I
> >>>>>>> believe - correct me if I am wrong Ash - important trigger for the
> >>>>>>> discussion) so far what we wanted is to extend the primary key and
> >>> it
> >>>>> would
> >>>>>>> ripple through all the pieces of Airflow -> models, API, UI etc.
> >>> ...
> >>>>>>> However - we already have "bundle_name" and "bundle_version" in the
> >>>> Dag
> >>>>>>> model. So I think when we add a separate table where we map the
> >>> bundle
> >>>>> to
> >>>>>>> the team, the "ripple effect" will be almost 0. We do not want to
> >>>> change
> >>>>>>> primary key, we do not want to change UI in any way (except filtering of
> >>>>>>> DAGs available based on your team - but that will be handled in Auth
> >>>>>>> Manager and will not impact UI in any way). I think that's a huge
> >>>>>>> simplification of the implementation, and if we agree to it - I
> >>> think
> >>>> it
> >>>>>>> should speed up the implementation significantly. There are only a
> >>>>> limited
> >>>>>>> number of times where you need to look up the team_id - so having
> >>> the
> >>>>>>> bundle -> team mapping in a separate table and having to look them
> >>> up
> >>>>>>> should not be a problem. And it has much less complexity and
> >>>>>>> "ripple-effect" through the codebase (for example I could imagine
> >>> 100s
> >>>>> or
> >>>>>>> thousands already written tests that would have to be adapted if we
> >>>>> changed
> >>>>>>> the primary key - where there will be pretty much zero impact on
> >>>>> existing
> >>>>>>> tests if we just add bundle -> team lookup table.
> >>>>>>>
> >>>>>>>
> >>>>>>>> One other thing I’d point out is that I think including executors
> >>> per
> >>>>>>>> team is a very easy win and quite possible without much work. I
> >>>> already
> >>>>>>>> have much of the code written. Executors are already aware of
> >>> Teams
> >>>>> that
> >>>>>>>> own them (merged), I have a PR open to have configuration per team
> >>>>> (with a
> >>>>>>>> quite simple and isolated approach, I believe you approved Jarek).
> >>>> The
> >>>>> last
> >>>>>>>> piece is updating the scheduling logic to route tasks from a
> >>>> particular
> >>>>>>>> Bundle to the correct executor, which shouldn’t be much work
> >>> (though
> >>>> it
> >>>>>>>> would be easier if the Task models had a column for the team they
> >>>>> belong
> >>>>>>>> to, rather than having to look up the Dag and Bundle to get the
> >>>> team) I
> >>>>>>>> have a branch where I was experimenting with this logic already.
> >>>>>>>> Any who, long story short, I don’t think we necessarily need to
> >>>> remove
> >>>>>>>> this piece from the project's scope if it is already partly done
> >>> and
> >>>>> not
> >>>>>>>> too difficult.
> >>>>>>>>
> >>>>>>> Yeah. I hear you here again. Certainly I would not want to just
> >>>>>>> **remove** it from the code. And, yep I totally forgot we have it
> >>> in.
> >>>>> And
> >>>>>>> if we can make it in, easily (which it seems we can) - we can also
> >>>>> include
> >>>>>>> it in the first iteration. What I wanted to avoid really (from the
> >>>>> original
> >>>>>>> design) - again trying to simplify it, limit the changes, and
> >>> speed up
> >>>>>>> implementation. And there is one "complexity" that I wanted to
> >>> avoid
> >>>>>>> specifically - having to have separate , additional configuration
> >>> per
> >>>>> team.
> >>>>>>> Not only because it complicates already complex configuration
> >>> handling
> >>>>> (I
> >>>>>>> know we have PR for that) but mostly because if it is not needed,
> >>> we
> >>>> can
> >>>>>>> simplify documentation and explain to our users easier what they
> >>> need
> >>>>> to do
> >>>>>>> to have their own multi-team setup. And I am quite open to keeping
> >>>>>>> multiple-executors if we can avoid complicating configuration.
> >>>>>>>
> >>>>>>> But I think some details of that and whether we really need
> >>> separate
> >>>>>>> configuration might also come as a result of updating the AIP - I
> >>> am
> >>>> not
> >>>>>>> quite sure now if we need it, but we can discuss it when we
> >>> iterate on
> >>>>> the
> >>>>>>> AIP.
> >>>>>>>
> >>>>>>> J.
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
>
