Hi everyone,

One thing I’ve been struggling with while reading the other thread about multi-team DB changes[0] is working out what end-user problem we are trying to address with it.

The main impetus for opening this discussion is that a lot has changed in Airflow since this AIP was created in early 2024 and voted on in mid-2024, and I'm wondering if those changes are big enough to invalidate the design and assumptions made at the time.

Reading the DB changes thread, I see that the changes are far-reaching and necessarily have to touch most of the Airflow object models. This got me thinking about what value we actually get from the change, since, as stated in the AIP, some of the non-goals are[1] (slightly edited here for brevity with the “[…]”):

> • Sharing broker/backend for celery executors between teams. This MAY be
>   covered by future AIPs
> • Implementation of FAB-based multi-team Auth Manager. […]
> • Per-team concurrency and prioritization of tasks. […].
> • Resource allocation per-executor. In the current proposal, executors are
>   run as sub-processes of Scheduler and we have very little control over their
>   individual resource usage. […]
> • Turn-key multi-team Deployment of Airflow (for example via Helm chart).
>   This is unlikely to happen.[…]
> • team management tools (creation, removal, rename etc.). […]
> • Combining "global" execution with "team" execution. While it should be
>   possible in the proposed architecture to have a "team" execution and "global"
>   execution in a single instance of Airflow, this has it's own unique set of
>   challenges and assumption is that Airflow Deployment is either "global"
>   (today) or "multi-team" (After this AIP is implemented) - but it cannot be
>   combined (yet). This is possible to be implemented in the future.
> • Running multiple schedulers - one-per team. While it should be possible if
>   we add support to select DAGs "per team" per scheduler, this is not
>   implemented in this AIP and left for the future

And also the Design Non-goals from the AIP[2]:

> • It’s not a primary goal of this proposal to significantly decrease resource
>   consumption for Airflow installation compared to the current ways of
>   achieving “multi-tenant” setup. […]
> • It’s not a goal of the proposal to provide a one-stop installation
>   mechanism for “Multi-team” Airflow. […]
> • It’s not a goal to decrease the overall maintenance effort involved in
>   responding to needs of different teams, […]

The main pain point that we seem to be addressing with this AIP is this[3]:

> The main reason for having multi-team deployment of Airflow is achieving
> security and isolation between the teams, coupled with ability of the
> isolated teams to collaborate via shared Datasets.

So what’s changed since we collectively (myself included) voted on and accepted this AIP? Well, we now have AIP-82 (External event driven dags). That could be used to achieve this goal right now in 3.0 with no changes to Airflow itself, and it is perhaps a more robust mechanism for doing it too.

So my main question: given the wide-reaching code changes needed for AIP-67, and the (IMO) imperfect/limited scope of team completion, would using AIP-82 not be a better solution to the problem?

1. It’s much simpler at the code level, as nothing needs to change.
2. It’s not _that_ much more complex from an operational point of view (you have to run an extra scheduler and web server, but those would likely need scaling up anyway).
3. We won’t disappoint people by not implementing the parts of multi-team that they want (someone being part of multiple teams, sharing connections/vars between teams).

And using this mechanism (of external dataset/asset polling) also negates one of the biggest cons of AIP-67, that of the tight coupling of Airflow versions between the teams.
In larger companies this is a _huge_ problem already, and this would only make it worse.

So my idea (and at this stage it is only an idea for discussion) is that we re-evaluate AIP-67 in light of what exists in Airflow 3.0 now and decide whether it’s still worth the added complexity of DB, code and operational overhead, and whether we still want it.

Please, please, please point out if there are other benefits that I have missed. I'm not trying to be selective and get my way; I'm trying to make sure Airflow continues to meet the needs of users and can also continue to evolve (where I worry that complexity of code/datamodel materially hurts that final point).

Thoughts?

[0]: https://lists.apache.org/thread/78vndnybgpp705j6sm77l1t6xbrtnt5c
[1]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=294816378#AIP67MultiteamdeploymentofAirflowcomponents-Whatisexcludedfromthescope?
[2]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=294816378#AIP67MultiteamdeploymentofAirflowcomponents-DesignNonGoals
[3]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=294816378#AIP67MultiteamdeploymentofAirflowcomponents-Whyisitneeded?