Re: Discuss: AIP-67 (multi team) now that AIP-82 (External event driven dags) exists

Jarek Potiuk Wed, 02 Jul 2025 23:47:56 -0700

> The direction this one is taking is interesting. If you're really just
> trying to make the feature barely possible and mostly targeted towards
> managed providers to implement the rest, then I suppose this hits the mark.


Well, actually, the direction I took is not "mostly for managed
providers" - I see it as equally for managed providers and on-prem
users. Following the open-source spirit, I think philosophically that in
Airflow any such change should be done with both in mind. We are at a
stage where we are already "established", and by innovating on top of
what we have we sometimes have more to lose than to gain - so I feel that
with "deployment" features we should be very careful to distinguish
"enabling things" from "doing things". My focus with this iteration was
to remove all the roadblocks that make it impossible (or extremely
difficult) to implement "real" multi-team and separation without
modifying Airflow core. I asked myself: what is the minimal set of
features that will make it "possible" for someone motivated to deploy a
single Airflow for multiple teams? The constraints I kept in mind:

* minimise the increase in maintenance effort
* do not "spoil" the "simple case" - we do not want to add features that
make the "simple" setup more complex; the current `docker run -it
apache/airflow standalone` should stay simple and straightforward to run
* if there is anything that involves complex deployment, we should not aim
to make a "turn-key" solution that we will have to support - just as with
our configuration parameters, where we have hundreds of knobs to turn: as
long as the default settings are reasonable and someone "motivated" can
configure and fine-tune them, that configuration and fine-tuning should be
left to them, regardless of whether they are on-prem or managed. Both
should be able to do it.

I think it's not only smart technically (we support the low-level basic
features, and when someone puts them together and makes it more of a
turn-key solution they are responsible for designing and implementing it,
so we have less maintenance effort), but it's also good from a simple
"open-source business model" point of view - i.e. it's a smart product
decision we should make.

Why is Airflow #18 in OSS rank? Of course we have a huge community and
people contributing in their free time, completely voluntarily. And we
cherish, support and encourage it. But let's be honest - if all those
who do business on top of Airflow had not invested literally millions of
dollars (in engineering salaries, sponsoring Airflow Summit, and
supporting people like me who can foster good "community spirit" - some
smart stakeholders, at least, understand the value of it), Airflow would
have an order of magnitude less activity and reach, and Airflow 3 would
simply not be possible. And it is a good thing that we have those
stakeholders who are interested in, and make money by, turning Airflow
into a "turn-key" solution. This is a fantastic, symbiotic relationship.

So my thinking is that we should NOT make things that make Airflow more
turn-key for those complex cases. We should leave it up to those who want
to build it and want to charge money for it. It is cool and great that
they can do so - and we should not do it "for them". On the other hand,
we should make it possible for those who want to turn Airflow into a more
complex (say multi-team) solution to make it happen, by providing them
with the minimal set of features that make it possible.

And that also - in a way - keeps the balance between on-prem and managed
implementation.

Something that I've learned as a rule of thumb is that making a feature
"generic" is 3x-10x more expensive than a custom implementation (both
in implementation and maintenance). It means that if an on-prem user
wants to implement something for themselves (say a turn-key multi-team
solution for their case) it will cost `x`, but when a managed provider
wants to implement a generic multi-team solution it will cost `10x`.
However, managed providers can spread that cost over the premium they
charge their users, who in return don't have to manage Airflow on their
own or pay `x` to develop this multi-team feature themselves. And this
is a "fair" choice for on-prem users to make: they can choose what they
want to do. It's also fair for managed providers - yes, they need to
invest more, but they also have a chance to shine by promoting it,
making it more optimised at scale, etc.
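
A rough sketch of that arithmetic, assuming the 10x upper bound of the
rule of thumb above and purely hypothetical customer counts (the function
name and the numbers are illustrative only, not measured data):

```python
def per_customer_cost(custom_cost: float, generic_multiplier: float, customers: int) -> float:
    """Share of the generic implementation cost carried by each customer."""
    return custom_cost * generic_multiplier / customers


x = 1.0  # cost of one custom on-prem implementation, in arbitrary units

# Hypothetical customer counts: the more users share the generic feature,
# the smaller the share each of them effectively pays through the premium.
for customers in (1, 10, 100):
    shared = per_customer_cost(x, 10, customers)
    print(f"{customers:>3} customers: {shared:.2f}x each, vs 1.00x to build it on-prem")
```

Under these assumptions the generic version only becomes cheaper per user
than building it yourself once more than ten customers share it.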

That is my line of thinking.


J.


On Thu, Jul 3, 2025 at 1:41 AM Oliveira, Niko <oniko...@amazon.com.invalid>
wrote:

> Hey Jarek,
>
>
> The direction this one is taking is interesting. If you're really just
> trying to make the feature barely possible and mostly targeted towards
> managed providers to implement the rest, then I suppose this hits the mark.
>
> But this is not something we're asking for at Amazon and personally I
> think we should make the feature reasonably usable for those running
> self-managed OSS Airflow as well. There are many users running an on-prem
> Airflow. Getting too hyper-fixated on an implementation that's so
> simplified that it's obtuse and difficult to use by most users seems like
> the wrong approach to me. But you and I have already discussed this at
> length and I haven't convinced you so far, so if I'm the only one with this
> thinking then I'm happy to disagree and commit as we say at Amazon :)
>
>
> > So I would be rather strong on **not** touching the current
> > configuration and simply adding configuration for per-team executors in
> > executor config - even if it is uglier and more "low-level".
>
> Can you explain what "adding configuration for per-team executors in
> executor config" would look like? I don't have a concrete sense of what you
> mean by this.
>
> Thanks for your efforts on trying to get this feature agreed to and voted
> on. Looking forward to working on the project in the coming weeks!
>
> Cheers,
> Niko
>
> ________________________________
> From: Jarek Potiuk <ja...@potiuk.com>
> Sent: Tuesday, July 1, 2025 10:26:55 PM
> To: dev@airflow.apache.org
> Subject: RE: [EXT] Discuss: AIP-67 (multi team) now that AIP-82 (External
> event driven dags) exists
>
> Any last comments ? There is a long weekend coming up in the US, so I will
> likely start voting on the updated AIP on Monday 7th.
>
> On Fri, Jun 27, 2025 at 12:41 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > I'd really love to finalise discussion and put it up to a vote some time
> > after the recording from the last dev call is posted - so that more
> > context, details and the LONG discussion we had on it are available.
> > There is no *huge* hurry - we have a strong dependency on Task Isolation
> > and it seems that it will still take a bit of time to complete, so I
> > would love to start voting in about a week's time - so that maybe at the
> > next dev call we can "seal" the subject. Happy to see any more comments -
> > especially from those who have opinions but had no opportunity to
> > express them.
> >
> > I am personally very happy with the direction it took - simplification
> and
> > "MVP" kind of approach - also I invite the stakeholders of ours to take a
> > close look at the scope and what we really propose - I have a feeling
> that
> > we can balance it out - there is something we can make to make it not
> > "worse" for the offerings they have. I think we have a really good
> > symbiotic relationship here, and I would love to leverage that. For one -
> > my goal here is to have a minimum number of changes that are impacting
> > maintainability of the open-source airflow - but mostly "opening up some
> > possibilities" - rather than provide turn-key solutions. And mostly
> because
> > this is good for all sides - less maintenance and complexity for OSS
> > maintainers, but more opportunities to make it into "turn-key" solutions
> by
> > the stakeholders, while also allowing the "on-prem" users - if they are
> > highly motivated - to use those features by adding the "turn-key" layer
> on
> > their own. Also adding multi-team should not be at the expense of
> "simple"
> > installations - they should be virtually unaffected.
> >
> > One example of applying this is cutting "separate config files". I
> > think they move us closer to a "turn-key" solution but are not really
> > necessary to achieve the three goals above - that's why in the current
> > proposal this part is completely removed. Sorry Niko, but I still think
> > it's one of the things that falls into this bucket. We can easily remove
> > them: they complicate code, documentation and the options the users
> > have, and even if it is a "little" more complex for motivated users to
> > manage configuration, it's also an opportunity for a "turn-key" option
> > that stakeholders can build into their products - and we do not have to
> > maintain it in the open-source. So I would be rather strong on **not**
> > touching the current configuration and simply adding configuration for
> > per-team executors in executor config - even if it is uglier and more
> > "low-level".
> >
> > So if there are some constructive ideas on what can be done to make it
> > "simpler" and less "turn-key" in that respect - I would highly value such
> > ideas and comments. If we can cut down something more that is not
> > "necessary" for the three primary goals I came up with - I am more than
> > happy to do it.
> >
> > Just to remind - those are the "extracted" goals. I slightly updated them
> > and added to the preamble of the AIP:
> >
> > * less operational overhead for managing multi-team (once AIP-72 is
> > complete) where separate execution environments are important
> > * virtual assets sharing between teams
> > * ability of having "admin" and "team sharing" capability where dags from
> > multiple teams can be seen in a single Airflow UI (requires custom RBAC
> > and an AIP-56 implementation of Auth Manager - with KeyCloak Auth Manager
> > being a reference implementation)
> >
> > J.
> >
> >
> > On Thu, Jun 26, 2025 at 10:53 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> >>
> >>> One technical observation: Now that the dag table no longer has a
> >>> team_id in it, what would the behaviour be when a DAG is attempted to
> move
> >>> between bundles? How do we detect this? (I’m not all convinced that we
> >>> correctly detect duplicate dag ids across bundles today, so I wouldn’t
> >>> assume or rely on the current behaviour.)
> >>>
> >>
> >> Of course - yes, I realise that - that problem was also not handled in
> >> the previous iteration, to be honest. That is something that the dag
> >> bundle solution will eventually allow us to solve - but I do not think
> >> it's a blocker for the proposed implementation. We will eventually have
> >> to add some way of preventing dags from jumping between bundles; we
> >> might tackle this separately. I already wanted to propose a separate
> >> update for that - but I did not want to complicate the current
> >> proposal. One thing at a time. I can, however - if you consider that a
> >> blocker - extend the current AIP with it. Not a big problem. This is,
> >> however, a bit independent from the team_id introduction.
> >>
> >>> Overall, I am still unconvinced this proposal has enough real user
> >>> benefit over actually separate deployments, and on balance of the added
> >>> complexity and maintenance burden I do not think it is worth it.
> >>>
> >>
> >> That makes me sad. I thought that over the course of the discussion I
> >> addressed all the concerns. In this case the concern was "is it worth
> >> the cost given the little benefit", but when I addressed it and heavily
> >> limited the impact, the concern became "is it worth it at all, as the
> >> changes are really minimal". Surely, anyone can change and adapt their
> >> concerns over time, but this one seems like an ever-moving target. I
> >> hoped at least for some acknowledgment that some concerns (complexity
> >> in this case) are addressed, but it seems that you are deeply convinced
> >> that we do not need multi-team at all. That is in stark contrast with
> >> at least a dozen bigger and smaller users of Airflow who submitted
> >> talks to Airflow Summit (including about 5 or 6 submissions for Airflow
> >> Summit 2025) on how they spent their engineering effort, time and money
> >> on trying to achieve something similar - they assessed that it's worth
> >> it, you assess that it's not. Somehow I trust that our users were not
> >> spending the money, time and engineering effort to achieve this because
> >> they wanted to spend more money. I think they assessed it's worth it.
> >> So I want to give them a bit easier and more "proper" way to do that.
> >>
> >>>
> >>> Upgrades: it is not easier to upgrade under this multi team proposal,
> >>> but much much harder. This is based on hard earned experience from
> helping
> >>> Astronomer users — having to coordinate upgrades between multiple teams
> >>> turns in to a months long slog of the hardest kind of work —  people
> work:
> >>> getting other teams to agree to do things that they don’t directly care
> >>> about — “It’s working for me, I don’t care about upgrading, we’ll get
> to it
> >>> next quarter” is a refrain I’ve heard many times.
> >>>
> >>
> >> Yes. absolutely - this is why we deferred it until we knew what shape
> >> task isolation and other AIPs we depend on take on. Because it is clear
> >> that pretty much all the problem you explain above are going to be
> solved
> >> with task isolation. And it's not just my opinion. If you want to argue
> >> with it, you likely need to argue with yourself:
> >> https://github.com/apache/airflow/issues/51545#issuecomment-2980038478.
> >> Let me quote what you wrote there last week:
> >>
> >> Ash Berlin Taylor wrote:
> >>
> >> > A tight coupling between task-sdk and any "server side" component is
> >> the opposite to one of the goals of AIP-72 (I'm not sure we ever
> explicitly
> >> said this, but the first point of motivation for the AIP says
> "Dependency
> >> conflicts for administrators supporting data teams using different
> versions
> >> of providers, libraries, or python packages")
> >> > In short, my goal with TaskSDK, and the reason for introducing CalVer
> >> and Cadwyn with the execution API is to end up in a world where you can
> >> upgrade the Airflow Scheduler/API server interdependently of any worker
> >> nodes (with the exception that the server must be at least as new as the
> >> clients)
> >> > This ability to have version-skew is pretty much non-negotiable to me
> >> and is (other than other languages) one of primary benefits of AIP-72
> >>
> >> If you read that quote of yours, it basically means "it will be easy
> >> to upgrade airflow independently of workers". So I am a bit confused
> >> here. Yes, I agree it was difficult, but you yourself explain that once
> >> AIP-72 is complete (and it has been a prerequisite of AIP-67 ever since
> >> AIP-67 was accepted) it will be "easy". So I am not sure why you are
> >> bringing it up now. We assume AIP-72 will be completed and this problem
> >> will be gone. Let's not mention it any more please.
> >>
> >>> The true separation from TaskSDK will likely only land in about 3.2 time
> >>> frame. We are actively working on it, but it’s a slow process of
> untangling
> >>> lots of assumptions made in the code base over the years. Maybe once we
> >>> have that my view would be different, but right now I think this makes
> the
> >>> proposal a non-starter. Especially as you are saying that most teams
> will
> >>> have unique connections. If they’ve got those already, then having an
> asset
> >>> trigger use those conns to watch/poll for activity is a much easier
> >>> solution to operate and crucially, to scale and upgrade.
> >>>
> >>
> >> Yes. I perfectly understand that and I am fully aware of the potential
> >> 3.2 time-frame. And that's fine. Actually I heartily invite you to
> >> listen to the part of my talk from Berlin Buzzwords where I was asked
> >> about the timeline - https://youtu.be/EyhZOnbwc-4?t=2226 - this link
> >> leads to the exact point in my talk. My answer was basically "3.1" or
> >> "3.2", and I sincerely hope "3.1", but we might not be able to complete
> >> it because we have other things to do (the "other" is indeed the Task
> >> Isolation work that you are leading). And that's perfectly fine. And it
> >> absolutely does not prevent us from voting on the AIP now - just as we
> >> voted on the previous version of the AIP a few months ago, knowing that
> >> it has some prerequisites. Especially since we know that the feature we
> >> need from task isolation is "non-negotiable". I.e. it WILL happen. We
> >> don't hope for it, we know it will be there. Those are your own words.
> >>
> >>
> >>> >  I think we can’t compare AIP-82 to sharing virtual assets due to
> >>> complexity of it.
> >>>
> >>> Virtual Assets was a mistake, and not how users actually want to use
> >>> them. Mea culpa
> >>>
> >>
> >> This is the first time I hear this - certainly you never raised this
> >> concern on the devlist. So if you have some concerns about virtual
> >> assets I think you should raise them on the devlist, because I think
> >> everyone here is missing some conversation (or maybe it's just your
> >> private opinion that you never shared with anyone, but maybe it's worth
> >> sharing). I would be interested to hear how the most successful
> >> feature of Airflow 2 was a mistake. According to the 2024 survey
> >> https://airflow.apache.org/blog/airflow-survey-2024/ - 48% of Airflow
> >> users have been using it, even though it was added as one of the last
> >> big features of Airflow 2. It's the MOST used feature out of all the
> >> features out there. I would be really curious to see how it was a
> >> mistake (but please start a separate thread explaining why you think it
> >> was a mistake, what your data points are and what you think should be
> >> fixed). Just dropping "virtual assets were a mistake" in the middle of a
> >> multi-team conversation seems completely unjustified without knowing
> >> what you are talking about. So I think, until we know more, this
> >> argument has no basis.
> >>
> >>
> >>>
> >>> To restate my points:
> >>>
> >>> - Sharing a deployment between teams today/in 3.1 is operationally more
> >>> complex (both scaling, and upgrades) — this is a con, not a plus.
> >>>
> >>
> >> Surely. But it will be easier when AIP-72 is complete (which I am
> >> definitely looking forward to and which, as clearly explained in
> >> AIP-67, is a prerequisite of it). Nothing changed here.
> >>
> >>
> >>> - The main user benefit appears to be “allow teams’ DAGs to communicate
> >>> via Assets”, in which case we can do that today by putting more work
> in to
> >>> AIP-82’s Asset triggers
> >>>
> >>
> >> No. Lower operational complexity for multi-teams (providing that we
> >> deliver AIP-72) is another benefit. Virtual assets is another, and since
> >> there is no ground in "virtual assets is a mistake" statement (not until
> >> you explain what you mean by that in a separate discussion) - this is
> also
> >> still a very valid point.
> >>
> >>
> >>> Soon, we will then be asked about cross-team governance, policy
> >>> enforcement, and potentially unbounded edge cases (e.g., team-specific
> >>> secrets, roles, quotas). Again, you get this for free with truly
> >>> separate deployments already
> >>> allow different teams to use different executors (including multiple
> >>> executors per-team following AIP-61)
> >>>
> >>
> >> Not really. We very explicitly say in the AIP that this is not a goal
> >> and that we have no plans for it. And yes, using separate executors per
> >> team is actually back in AIP-67 in case you did not notice (and the
> >> code needed for it is even implemented and merged already in main by
> >> Vincent).
> >>
> >>
> >>> Provably not true right now, and until ~3.2 delivers the full Task
> >>> SDK/Core dependency separation this would be _more_ work to upgrade,
> not
> >>> less, and that work is not shared but still on a central team.
> >>>
> >>
> >> Absolutely - we will wait for AIP-72 completion. I do not want to say
> >> 3.1 or 3.2 directly - because there are - as you said - a lot of moving
> >> pieces. So my target for multi-team is "After AIP-72 is completed". Full
> >> stop. But there is nothing wrong with accepting the AIP now and doing
> >> preparatory work in parallel. Just as 9 women cannot have a baby in 1
> >> month, adding more effort to task-sdk isolation will not speed it up -
> >> we already have not only 3 people (you leading it, Kaxil and Amog) but
> >> also all the help from me and even 10s of different contributors (for
> >> example with the recent db_test cleanup that I took leadership on) -
> >> and there are people who wish to work on adding multi-team features.
> >> Since the design heavily limits the impact on the airflow codebase and
> >> interactions with the task-sdk implementation, there is nothing wrong
> >> with starting implementation in parallel either. The Amazon team is
> >> keen to move it forward - they even already implemented the SQS trigger
> >> for assets, and we are working together on FAB removal and the Keycloak
> >> authentication manager - and they seem to still have the capacity and
> >> drive to progress multi-team. So I am not sure we are trading off
> >> anything. There is no "if we work more on task sdk and drop multi-team,
> >> things will be faster". Generally in open source people work in the
> >> area where they feel they can provide the best value - such as you
> >> working on task-sdk and me on CI and the dev env - and they will
> >> deliver more value on multi-team.
> >>
> >>
> >>>
> >>> So please, as succinctly as possible, please tell me what the direct
> >>> benefit to users this proposal is over us putting this effort in to
> writing
> >>> better Asset triggers instead?
> >>>
> >>
> >>
> >> * less operational overhead for managing multi-team (once AIP-72 is
> >> complete) where separate execution environments are important
> >> * virtual assets sharing
> >> * ability of having "admin" and "team sharing" capability where dags
> from
> >> multiple teams can be seen in a single Airflow UI  (requires custom
> RBAC)
> >>
> >> None of this can be done via better asset triggers
> >>
> >>
> >>>
> >>> > On 23 Jun 2025, at 10:57, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>> >
> >>> > My counter-points:
> >>> >
> >>> >
> >>> >> 1. Managing a multi team deployment is not materially different from
> >>> >> managing a deployment per team
> >>> >>
> >>> >
> >>> > It's a bit easier - especially when it comes to upgrades (especially
> >>> > in the case we are targeting, which is not multi-tenant, but several
> >>> > relatively closely cooperating teams with different dependency
> >>> > requirements and isolation needs).
> >>> >
> >>> >> 2. The database changes were quite wide-reaching
> >>> >>
> >>> >
> >>> > Yes. that is addressed.
> >>> >
> >>> >
> >>> >> 3. I don’t believe the original AIP (again, I haven’t read the
> updated
> >>> >> proposal or recent messages on the thread. yet) will meet what many
> >>> users
> >>> >> want out of a multiteam solution
> >>> >>
> >>> >
> >>> > I think we will only see when we try. A lot of people think they
> >>> > would, even if they are warned. I know at least one user
> >>> > (Wealthsimple) who definitely wants to use it; they got a very
> >>> > detailed explanation of the idea and understand it well. So I am sure
> >>> > that **some** users would. But we do not know how many.
> >>> >
> >>> >
> >>> >> To expand on those points a bit more
> >>> >>
> >>> >> On 1. The only components that are shared are, I think, the
> scheduler
> >>> and
> >>> >> the API server, and it’s arguable if that is actually a good idea
> >>> given
> >>> >> those are likely to be the most performance sensitive components
> >>> anyway.
> >>> >>
> >>> >> Additionally the fact that the scheduler is a shared component makes
> >>> >> upgrading it almost a non starter as you would likely need buy-in,
> >>> >> changes, and testing from ALL teams using it. I’d argue that this is
> >>> >> a huge negative until we finish off the version independence work of
> >>> >> AIP-72.
> >>> >>
> >>> >
> >>> > Quite disagree here - especially since our target is that task-sdk is
> >>> > supposed to provide all the isolation that is needed. There should be
> >>> > 0 changes needed in the dags to upgrade scheduler, api_server,
> >>> > triggerer - precisely because we introduced backwards-compatible
> >>> > task-sdk.
> >>> >
> >>> >> On 3 my complaint is essentially that this doesn’t go nearly far
> >>> >> enough. It doesn’t allow read only views to other teams dags. I don’t
> >>> >> think it allows you to be in multiple teams at once. You can’t share
> >>> >> a connection between teams but only allow certain specified dags to
> >>> >> access it; it would have to either be globally usable, or
> >>> >> duplicated-and-kept-in-sync between teams. In short I think it falls
> >>> >> short of being useful.
> >>> >>
> >>> >
> >>> > Oh absolutely all that is possible (except sharing single connections
> >>> > between multiple teams - which is a very niche use case, and
> >>> > duplication here is perfectly ok as a first approximation - and if we
> >>> > need more we can add it later).
> >>> >
> >>> > Auth manager RBAC and access is abstracted away, and the KeyCloak
> >>> > Auth Manager implemented by Vincent allows managing completely
> >>> > independent and separate RBAC based on arguments and resources
> >>> > provided by Airflow. There is nothing to prevent the user who
> >>> > configures KeyCloak RBAC from defining it this way:
> >>> >
> >>> > if group a > allow to read a and write b
> >>> > if group b > allow to write b but not a
> >>> >
> >>> > and any other combinations. The KeyCloak implementation - pretty
> >>> > advanced already - (and the design of auth manager) completely
> >>> > delegates both authentication and authorization to KeyCloak, and
> >>> > KeyCloak has RBAC management built in. Also any of the users can write
> >>> > their own - even hard-coded - auth manager to do the same if
> >>> > they do not want to have configurable KeyCloak. Even SimpleAuthManager
> >>> > could be hard-coded to provide those features.
> >>> >
> >>> >
> >>> >>
> >>> >> So on the surface, I’m no more in favour of using dag bundle as a
> >>> >> replacement for team id as I think most of the above points still
> >>> stand.
> >>> >>
> >>> >
> >>> > We disagree here.
> >>> >
> >>> >>
> >>> >> My counter proposal: We do _nothing_ to core airflow. We work on
> >>> >> improving the event-based trigger of dags (write more triggers for
> >>> >> read/check remote Assets etc) so that teams can have 100% isolated
> >>> >> deployments but still trigger dags based on asset events from other
> >>> >> teams.
> >>> >>
> >>> >
> >>> > That does not solve any of the other design goals - only allows to
> >>> trigger
> >>> > assets a bit more easily (but also it's not entirely solved by AIP-82
> >>> > because it does not solve virtual assets - only ones that have
> defined
> >>> > triggerer and "something" to listen on - which is way more complex
> than
> >>> > just defining asset in a Dag and using it in another). I think we
> can't
> >>> > compare AIP-82 to sharing virtual assets due to complexity of it. I
> >>> > explained it in the doc.
> >>> >
> >>> >
> >>> >> I will now go and catch up with the long thread and updated proposal
> >>> >> and come back.
> >>> >>
> >>> >
> >>> > Please. I hope the above explanation will help in better
> >>> > understanding of the proposal, because I think you had some
> >>> > assumptions that do not hold any more with the new proposal.
> >>> >
> >>> > J.
> >>> >
> >>> >
> >>> >>
> >>> >>> On 23 Jun 2025, at 05:54, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>> >>>
> >>> >>> Just to clarify the relation - I updated the AIP now to refer to
> >>> AIP-82
> >>> >> and
> >>> >>> to explain relation between the "cross-team" and "cross-airflow"
> >>> asset
> >>> >>> triggering - this is what I added:
> >>> >>>
> >>> >>> Note that there is a relation between AIP-82 ("External Driven
> >>> >> Scheduling")
> >>> >>> and this part of the functionality. When you have multiple
> instances
> >>> of
> >>> >>> Airflow, you can use shared datasets - "Physical datasets" - that
> >>> several
> >>> >>> Airflow Instances can use - for example there could be an S3 object
> >>> that
> >>> >> is
> >>> >>> produced by one airflow instance, and consumed by another. That
> >>> requires
> >>> >>> deferred trigger to monitor for such datasets, and appropriate
> >>> >> permissions
> >>> >>> to the external dataset, and you could achieve a similar result to
> >>> >> cross-team
> >>> >>> dataset triggering (but cross airflow). However the feature of
> >>> sharing
> >>> >>> datasets between the teams also works for virtual assets, that do
> not
> >>> >> have
> >>> >>> physically shared "objects" and trigger that is monitoring for
> >>> changes in
> >>> >>> such asset.
> >>> >>>
> >>> >>> J.
> >>> >>>
> >>> >>>
> >>> >>> On Mon, Jun 23, 2025 at 6:38 AM Jarek Potiuk <ja...@potiuk.com>
> >>> wrote:
> >>> >>>
> >>> >>>>> From a quick glance, the updated AIP didn't seem to have any
> >>> reference
> >>> >> to
> >>> >>>>> AIP-82, which surprised me, but will take a more detailed read
> >>> through.
> >>> >>>>
> >>> >>>> Yep. It did not - because I did not think it was needed or even
> very
> >>> >>>> important after the simplifications. AIP-82 has a different scope,
> >>> >> really.
> >>> >>>> It only helps when the Assets are "real" data files which we have
> >>> >> physical
> >>> >>>> triggers for, it's slightly related - sharing datasets between
> teams
> >>> >>>> (including those that do not require physical files and triggers)
> is
> >>> >> still
> >>> >>>> possible in the design we have now, but it's not (and never was)
> the
> >>> >>>> **only** reason for having multi-team. There always was (and still
> >>> is)
> >>> >> the
> >>> >>>> possibility of having a common, distinct environments (i.e.
> >>> dependencies
> >>> >>>> and providers) per team, the possibility of having connections and
> >>> >>>> variables that are only accessible to one team and not the other,
> >>> and
> >>> >>>> isolating workload execution (all that while allowing to manage
> >>> multiple
> >>> >>>> team and schedule things with single deployment). That did not
> >>> change.
> >>> >> What
> >>> >>>> changed a lot is that it is now way simpler, something that we can
> >>> >>>> implement without heavy changes to the codebase - and give it to
> our
> >>> >> users,
> >>> >>>> so that they can assess if this is something they need without too
> >>> much
> >>> >>>> risk and effort.
> >>> >>>>
> >>> >>>> This was - I believe the main concern, that the value we get from
> >>> it is
> >>> >>>> not dramatic, but the required changes are huge. This "redesign"
> >>> changes
> >>> >>>> the equation - the value is still unchanged, but the cost of
> >>> >> implementing
> >>> >>>> it and impact on the Airflow codebase is much smaller. I still
> have
> >>> not
> >>> >>>> heard back from Ash if my proposal responds to his original
> concern
> >>> >> though,
> >>> >>>> so I am mostly guessing (also based on the positive feedback of
> >>> others)
> >>> >> that
> >>> >>>> yes it does. But to be honest I am not sure and I would love to
> hear
> >>> >> back.
> >>> >>>> I decided to update the AIP to reflect it - regardless, because I
> >>> think
> >>> >> the
> >>> >>>> simplification I proposed keeps the original goals, but is indeed
> >>> way
> >>> >>>> simpler.
> >>> >>>>
> >>> >>>>> This is a very difficult thread to catch up on.
> >>> >>>>
> >>> >>>> Valid point. Let me summarize the result:
> >>> >>>>
> >>> >>>> * I significantly simplified the implementation proposal compared
> >>> to
> >>> >> the
> >>> >>>> original version
> >>> >>>> * the main simplification is the very limited impact on the existing database
> -
> >>> >>>> without "ripple effect" that would require us to change a lot of
> >>> tables,
> >>> >>>> including their primary keys, and heavily impact the UI
> >>> >>>> * this is now more of an incremental change that can be
> implemented
> >>> way
> >>> >>>> faster and with far less risk
> >>> >>>> * updated idea is based on leveraging bundles (already part of our
> >>> data
> >>> >>>> model) to map them (many-to-one) to a team - which requires us to
> just
> >>> >> extend
> >>> >>>> the data model with bundle mapping and add team_id to connections
> >>> and
> >>> >>>> variables. Those are all the DB changes needed.
> >>> >>>>
> >>> >>>> The AIP is updated - in one single big change, so it should be
> >>> easy to
> >>> >>>> compare the changes:
> >>> >>>>
> >>> >>
> >>>
> https://cwiki.apache.org/confluence/pages/viewpreviousversions.action?pageId=294816378
> >>> >>>> -> I even named the version appropriately "Simplified multi-team
> >>> AIP" -
> >>> >> you
> >>> >>>> can select and compare v.65 with v.66 to see the exact
> differences I
> >>> >>>> proposed.
> >>> >>>>
> >>> >>>> I hope it will be helpful to catch up and for those who did not
> >>> follow,
> >>> >> to
> >>> >>>> be able to make up their minds about it.
> >>> >>>>
> >>> >>>> J.
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>> On Mon, Jun 23, 2025 at 4:35 AM Vikram Koka
> >>> >> <vik...@astronomer.io.invalid>
> >>> >>>> wrote:
> >>> >>>>
> >>> >>>>> This is a very difficult thread to catch up on.
> >>> >>>>> I will take a detailed look at the AIP update to try to figure
> out
> >>> the
> >>> >>>>> changes in the proposal.
> >>> >>>>>
> >>> >>>>> From a quick glance, the updated AIP didn't seem to have any
> >>> reference
> >>> >> to
> >>> >>>>> AIP-82, which surprised me, but will take a more detailed read
> >>> through.
> >>> >>>>>
> >>> >>>>> Vikram
> >>> >>>>>
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> On Sun, Jun 22, 2025 at 1:44 AM Pavankumar Gopidesu <
> >>> >>>>> gopidesupa...@gmail.com>
> >>> >>>>> wrote:
> >>> >>>>>
> >>> >>>>>> Thanks Jarek, that's a great update on this AIP, now it's much more
> >>> >>>>>> slimmed down.
> >>> >>>>>>
> >>> >>>>>> left a minor comment. :) Overall looking great.
> >>> >>>>>>
> >>> >>>>>> Pavan
> >>> >>>>>>
> >>> >>>>>> On Sat, Jun 21, 2025 at 3:10 PM Jens Scheffler
> >>> >>>>> <j_scheff...@gmx.de.invalid
> >>> >>>>>>>
> >>> >>>>>> wrote:
> >>> >>>>>>
> >>> >>>>>>> Thanks for the rework/update of the AIP-67!
> >>> >>>>>>>
> >>> >>>>>>> Just a few small comments but overall I like it as it is much
> >>> leaner
> >>> >>>>>>> than originally planned and is at a level of complexity that it
> >>> >> really
> >>> >>>>>>> seems to be a benefit to close the gap as described.
> >>> >>>>>>>
> >>> >>>>>>> On 21.06.25 14:52, Jarek Potiuk wrote:
> >>> >>>>>>>> I updated the AIP - including architecture images and reviewed
> >>> it
> >>> >>>>>> (again)
> >>> >>>>>>>> and corrected any ambiguities and places where it needed to be
> >>> >>>>> changed.
> >>> >>>>>>>>
> >>> >>>>>>>> I think the current state
> >>> >>>>>>>>
> >>> >>>>>>>
> >>> >>>>>>
> >>> >>>>>
> >>> >>
> >>>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> >>> >>>>>>>> - nicely describes the proposal.
> >>> >>>>>>>>
> >>> >>>>>>>> Compared to the previous one:
> >>> >>>>>>>>
> >>> >>>>>>>> 1. The DB changes are far less intrusive - no ripple effect on
> >>> >>>>> Airflow
> >>> >>>>>>>> 2. There is no need to merge configurations and provide
> >>> different
> >>> >>>>> set
> >>> >>>>>> of
> >>> >>>>>>>> configs per team - we can add it later but I do not see why we
> >>> need
> >>> >>>>> it
> >>> >>>>>> in
> >>> >>>>>>>> this simplified version
> >>> >>>>>>>> 3. We can still configure a different set of executors per
> team
> >>> -
> >>> >>>>> that
> >>> >>>>>> is
> >>> >>>>>>>> already implemented (we just need to wire it to the bundle ->
> >>> team
> >>> >>>>>>> mapping).
> >>> >>>>>>>>
> >>> >>>>>>>> I think it will be way simpler and faster to implement this
> way
> >>> and
> >>> >>>>> it
> >>> >>>>>>>> should serve as MVMT -> Minimum Viable Multi Team that we can
> >>> give
> >>> >>>>> our
> >>> >>>>>>>> users so that they can provide feedback.
> >>> >>>>>>>>
> >>> >>>>>>>> J.
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>> On Fri, Jun 20, 2025 at 8:33 AM Jarek Potiuk <
> ja...@potiuk.com>
> >>> >>>>> wrote:
> >>> >>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>>>> I like this iteration a bit more now for sure, thanks for
> >>> being
> >>> >>>>>>> receptive
> >>> >>>>>>>>>> to feedback! :)
> >>> >>>>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>>>> This now becomes quite close to what was proposed before,
> we
> >>> now
> >>> >>>>>> again
> >>> >>>>>>>>>> have a team ID (which I think is really needed here, glad to
> >>> see
> >>> >>>>> it
> >>> >>>>>>> back)
> >>> >>>>>>>>>> and it will be used for auth management, configuration
> >>> >>>>> specification,
> >>> >>>>>>> etc
> >>> >>>>>>>>>> but will be carried by Bundle instead of the dag model.
> Which
> >>> as
> >>> >>>>> you
> >>> >>>>>>> say
> >>> >>>>>>>>>> “For that we will need to make sure that both api-server,
> >>> >>>>> scheduler
> >>> >>>>>> and
> >>> >>>>>>>>>> triggerer have access to the "bundle definition" (to perform
> >>> the
> >>> >>>>>>> mapping)"
> >>> >>>>>>>>>> which honestly doesn’t feel too much different from the
> >>> original
> >>> >>>>>>> proposal
> >>> >>>>>>>>>> we had last week of adding it to Dag table and ensuring it’s
> >>> >>>>>> available
> >>> >>>>>>>>>> everywhere. But either way I’m happy to meet in the middle
> and
> >>> >>>>> keep
> >>> >>>>>> it
> >>> >>>>>>> on
> >>> >>>>>>>>>> Bundle if everyone else feels that’s a more suitable
> location.
> >>> >>>>>>>>>>
> >>> >>>>>>>>> I think the big difference is the "ripple effect" that was
> >>> >>>>> discussed
> >>> >>>>>> in
> >>> >>>>>>>>>
> >>> https://lists.apache.org/thread/78vndnybgpp705j6sm77l1t6xbrtnt5c
> >>> >>>>>> (and I
> >>> >>>>>>>>> believe - correct me if I am wrong Ash - important trigger
> for
> >>> the
> >>> >>>>>>>>> discussion). So far what we wanted was to extend the primary
> key
> >>> and
> >>> >>>>> it
> >>> >>>>>>> would
> >>> >>>>>>>>> ripple through all the pieces of Airflow -> models, API, UI
> >>> etc.
> >>> >>>>> ...
> >>> >>>>>>>>> However - we already have `bundle_name` and `bundle_version`
> >>> in the
> >>> >>>>>> Dag
> >>> >>>>>>>>> model. So I think when we add a separate table where we map
> the
> >>> >>>>> bundle
> >>> >>>>>>> to
> >>> >>>>>>>>> the team, the "ripple effect" will be almost 0. We do not
> want
> >>> to
> >>> >>>>>> change
> >>> >>>>>>>>> primary key, we do not want to change UI in any way (except
> >>> >>>>> filtering
> >>> >>>>>> of
> >>> >>>>>>>>> DAGs available based on your team - but that will be handled
> in
> >>> >>>>> Auth
> >>> >>>>>>>>> Manager and will not impact UI in any way). I think that's a
> >>> huge
> >>> >>>>>>>>> simplification of the implementation, and if we agree to it
> - I
> >>> >>>>> think
> >>> >>>>>> it
> >>> >>>>>>>>> should speed up the implementation significantly. There are
> >>> only a
> >>> >>>>>>> limited
> >>> >>>>>>>>> number of times where you need to look up the team_id - so
> >>> having
> >>> >>>>> the
> >>> >>>>>>>>> bundle -> team mapping in a separate table and having to look
> >>> them
> >>> >>>>> up
> >>> >>>>>>>>> should not be a problem. And it has much less complexity and
> >>> >>>>>>>>> "ripple-effect" through the codebase (for example I could
> >>> imagine
> >>> >>>>> 100s
> >>> >>>>>>> or
> >>> >>>>>>>>> thousands already written tests that would have to be adapted
> >>> if we
> >>> >>>>>>> changed
> >>> >>>>>>>>> the primary key - whereas there will be pretty much zero impact
> >>> on
> >>> >>>>>>> existing
> >>> >>>>>>>>> tests if we just add a bundle -> team lookup table).
> >>> >>>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>>>> One other thing I’d point out is that I think including
> >>> executors
> >>> >>>>> per
> >>> >>>>>>>>>> team is a very easy win and quite possible without much
> work.
> >>> I
> >>> >>>>>> already
> >>> >>>>>>>>>> have much of the code written. Executors are already aware
> of
> >>> >>>>> Teams
> >>> >>>>>>> that
> >>> >>>>>>>>>> own them (merged), I have a PR open to have configuration
> per
> >>> team
> >>> >>>>>>> (with a
> >>> >>>>>>>>>> quite simple and isolated approach, I believe you approved
> >>> Jarek).
> >>> >>>>>> The
> >>> >>>>>>> last
> >>> >>>>>>>>>> piece is updating the scheduling logic to route tasks from a
> >>> >>>>>> particular
> >>> >>>>>>>>>> Bundle to the correct executor, which shouldn’t be much work
> >>> >>>>> (though
> >>> >>>>>> it
> >>> >>>>>>>>>> would be easier if the Task models had a column for the team
> >>> they
> >>> >>>>>>> belong
> >>> >>>>>>>>>> to, rather than having to look up the Dag and Bundle to get
> >>> the
> >>> >>>>>> team) I
> >>> >>>>>>>>>> have a branch where I was experimenting with this logic
> >>> already.
> >>> >>>>>>>>>> Any who, long story short, I don’t think we necessarily need
> >>> to
> >>> >>>>>> remove
> >>> >>>>>>>>>> this piece from the project's scope if it is already partly
> >>> done
> >>> >>>>> and
> >>> >>>>>>> not
> >>> >>>>>>>>>> too difficult.
> >>> >>>>>>>>>>
> >>> >>>>>>>>> Yeah. I hear you here again. Certainly I would not want to
> just
> >>> >>>>>>>>> **remove** it from the code. And, yep I totally forgot we
> have
> >>> it
> >>> >>>>> in.
> >>> >>>>>>> And
> >>> >>>>>>>>> if we can make it in, easily (which it seems we can) - we can
> >>> also
> >>> >>>>>>> include
> >>> >>>>>>>>> it in the first iteration. What I wanted to avoid really
> (from
> >>> the
> >>> >>>>>>> original
> >>> >>>>>>>>> design) - again trying to simplify it, limit the changes, and
> >>> >>>>> speed up
> >>> >>>>>>>>> implementation. And there is one "complexity" that I wanted
> to
> >>> >>>>> avoid
> >>> >>>>>>>>> specifically - having to have separate, additional
> >>> configuration
> >>> >>>>> per
> >>> >>>>>>> team.
> >>> >>>>>>>>> Not only because it complicates already complex configuration
> >>> >>>>> handling
> >>> >>>>>>> (I
> >>> >>>>>>>>> know we have a PR for that) but mostly because if it is not
> >>> needed,
> >>> >>>>> we
> >>> >>>>>> can
> >>> >>>>>>>>> simplify documentation and explain to our users easier what
> >>> they
> >>> >>>>> need
> >>> >>>>>>> to do
> >>> >>>>>>>>> to have their own multi-team setup. And I am quite open to
> >>> keeping
> >>> >>>>>>>>> multiple-executors if we can avoid complicating
> configuration.
> >>> >>>>>>>>>
> >>> >>>>>>>>> But I think some details of that and whether we really need
> >>> >>>>> separate
> >>> >>>>>>>>> configuration might also come as a result of updating the AIP
> >>> - I
> >>> >>>>> am
> >>> >>>>>> not
> >>> >>>>>>>>> quite sure now if we need it, but we can discuss it when we
> >>> >>>>> iterate on
> >>> >>>>>>> the
> >>> >>>>>>>>> AIP.
> >>> >>>>>>>>>
> >>> >>>>>>>>> J.
> >>> >>>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>
> >>> >>>>>
> >>> >>>>
> >>> >>
> >>> >>
> >>> >>
> >>>
> >>>
>
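
For readers catching up on the thread: the "bundle -> team lookup table" idea quoted above can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the AIP's actual schema or Airflow code. The table names (dag, bundle_team, connection), the helper functions, the sample data, and the rule that a NULL team_id means "shared by all teams" are all hypothetical. The point it tries to show is that the dag table and its primary key stay untouched, and the team is resolved through one extra lookup.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; these are not
# Airflow's real tables or column definitions.
SCHEMA = """
CREATE TABLE dag (
    dag_id      TEXT PRIMARY KEY,   -- primary key is left unchanged
    bundle_name TEXT NOT NULL
);
CREATE TABLE bundle_team (          -- new lookup table: bundle -> team
    bundle_name TEXT PRIMARY KEY,   -- each bundle maps to at most one team
    team_id     TEXT NOT NULL       -- several bundles may share a team
);
CREATE TABLE connection (
    conn_id TEXT PRIMARY KEY,
    team_id TEXT                    -- assumption: NULL means visible to all teams
);
"""


def build_demo_db():
    """Create an in-memory database with made-up sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO bundle_team VALUES (?, ?)",
        [("sales_bundle", "team_a"), ("crm_bundle", "team_a"), ("ml_bundle", "team_b")],
    )
    db.executemany(
        "INSERT INTO dag VALUES (?, ?)",
        [
            ("sales_report", "sales_bundle"),
            ("crm_sync", "crm_bundle"),
            ("train_model", "ml_bundle"),
            ("orphan", "unmapped_bundle"),  # bundle with no team mapping
        ],
    )
    db.executemany(
        "INSERT INTO connection VALUES (?, ?)",
        [("shared_s3", None), ("team_a_db", "team_a"), ("team_b_gpu", "team_b")],
    )
    return db


def team_for_dag(db, dag_id):
    """Resolve a dag's team via its bundle; None if the bundle is unmapped."""
    row = db.execute(
        "SELECT bt.team_id FROM dag d "
        "JOIN bundle_team bt ON bt.bundle_name = d.bundle_name "
        "WHERE d.dag_id = ?",
        (dag_id,),
    ).fetchone()
    return row[0] if row else None


def connections_for_dag(db, dag_id):
    """List connection ids the dag may use: shared ones plus its team's own."""
    team = team_for_dag(db, dag_id)
    rows = db.execute(
        "SELECT conn_id FROM connection "
        "WHERE team_id IS NULL OR team_id = ? ORDER BY conn_id",
        (team,),
    ).fetchall()
    return [r[0] for r in rows]
```

In this sketch two bundles map to the same team (many-to-one), and a dag whose bundle has no mapping only sees the shared connection. Whether unmapped bundles and shared connections should behave this way is a design choice the sketch assumes rather than something stated in the thread.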
