On Fri, Jun 13, 2025 at 7:09 PM Vincent Beck <vincb...@apache.org> wrote:
> Thanks, Jarek, for this proposal. Overall, I really like it - it significantly simplifies multi-team support and removes the need to deploy additional components per team, without compromising on the core needs of users (unless I'm missing something).

Yep. I think with this "design" iteration I put "simplicity" and "maintainability" as the primary goals. Separate configuration per team goes out the window, the ripple effect on the DB goes out the window, and what's left is basically the same Airflow we already have, with a few modifications.

> > And if we do it and implement packaging and execution environments (say, the ability to choose a predefined venv to parse and execute DAGs coming from a specific bundle_id) - expectation 2) above can be handled well.
>
> Could you elaborate on this part? I'm not entirely clear on how it would work in practice. For instance, how would it behave with two teams or bundles? Real-world examples would help clarify this, unless it's more implementation details that we can flesh out once there's agreement on the general approach.

Currently, with the bundle definition we **just** define where the DAGs are coming from. But we could (and that was even part of the original design) add an extra "execution environment" configuration. For example, when we have bundle_a and bundle_b, each of them could have a separate "environment" specified (say env_a, env_b), and we could map each environment to a specific image (image_a, image_b) or to a virtualenv in the same image (/venv/a, /venv/b) that would be predefined in the processor/worker images (or in VMs, if images are not used). The environments might have different sets of dependencies (providers and others) installed, and both DAG processor parsing and the "worker" (a Celery worker or a K8s Pod) would run using that environment. See PS1 at the end of this mail for a rough sketch of what such a configuration could look like.

Initially, AIP-67 also discussed defining dependencies in the bundle and installing them dynamically (the way PythonVirtualenvOperator does) - but personally I think that having a predefined set of environments (more like ExternalPythonOperator) rather than creating them dynamically has much better maintainability, stability and security properties.

> Also, what about the triggerer? Since the triggerer runs user code, the original AIP-67 proposal required at least one triggerer per team. How would that be handled under this new architecture?

That is an excellent question :). There are a few options - depending on how much of point 4) "isolating workload" we want to implement. Paradoxically - to be honest - for me the triggerer always had the potential of being less of a problem when it comes to isolation. Yes, all triggers currently run not only in the same interpreter but also in the same event loop (which means that isolation goes out of the window), but it is relatively easy to introduce isolation, and we have discussed options for it in the past as well. I see quite a few.

Option 1) Simplest operationally - we could add a mode in Airflow that would resemble timetables. All triggers would have to be exposed via the plugin interface (we could easily expose all triggers from all our providers this way, in bulk). This means that the deployment manager would control what runs in the triggerer - effectively limiting it similarly to scheduler code today. That would prevent some of the other cases we discussed recently (such as sending serialized "notification" methods to the triggerer to execute), but those are mostly optimizations, and they could be sent as worker tasks instead in this case.
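To make that a bit more concrete, a purely hypothetical sketch. The "triggers" plugin attribute below does not exist today - plugins expose "timetables" exactly like this, and adding an equivalent registration point for triggers is precisely the new bit this option would introduce:

    # HYPOTHETICAL: "triggers" is not a real plugin attribute today -- it is
    # the new registration mechanism Option 1) would add, mirroring how
    # timetables are exposed via plugins.
    from airflow.plugins_manager import AirflowPlugin
    from airflow.providers.standard.triggers.temporal import TimeDeltaTrigger

    class AllowedTriggersPlugin(AirflowPlugin):
        name = "allowed_triggers"
        # Only trigger classes registered here could be deserialized and
        # run by the triggerer -- the deployment manager stays in control,
        # similarly to scheduler code today.
        triggers = [TimeDeltaTrigger]

Providers could register all their triggers in bulk the same way, so for most deployments this would be invisible.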
Option 2) Semi-isolation - for a number of our users, just separating processes might be "enough" (especially if we add cgroups to isolate the processes - we had that in the past). Not "perfect", and it does not have all the security properties, but for a number of our users it might be "good enough", because they trust their teams enough not to worry about potential "malicious actions". In this case a single triggerer could run several event loops - one per bundle, each of them in a separate, isolated process - and the only change we would have to make is to route the triggers to the right loop based on bundle id (see PS2 at the end for a rough sketch). Almost no increase in operational complexity, but greatly improved isolation. Again, following the bundle -> environment mapping, each of those processes could run using a specific "per-bundle" environment where all the necessary dependencies are installed. And here the limit on arbitrary code execution coming from DAGs can be lifted.

Option 3) Full isolation - simply run one triggerer per bundle. That is a bit more like the original proposal, because we will then have an extra triggerer for each bundle/team (or group of bundles - it does not have to be a 1-to-1 mapping, it could be many-to-1). But it should provide the full "security" properties, with isolation and separation of workload; each triggerer could run entirely in the environment defined for its bundle. It increases operational complexity - but just a bit. Rainbows and unicorns - we have it all.

Also, one more thing. We usually discuss technical aspects here on the devlist and rarely talk about "business". But I think in some cases this is wrong - including for multi-team, which has the potential of either supporting or undermining some of the business our stakeholders do with Airflow. I would like to - really - make a collaborative effort to come up with a multi-team approach together with all the stakeholders here - Amazon, Google and Astronomer especially should all be on board with it. We know our users need it (the survey and the number of talks about multi-team/tenancy submitted for the Summit this year speak for themselves - we had ~10 sessions submitted about it, and 30% of survey respondents want it - though of course, as Ash correctly pointed out, many of those people have different expectations). Again, multi-team has the potential of either killing or supporting some of the business models our stakeholders might implement in their offerings. And while here we do not "care" too much about those models, we should care about our stakeholders' sustainability - they are the ones who are fueling Airflow in many ways - so it would be stupid not to consider their expectations, their needs and - yes - the sustainability of their businesses. Here in the community we mostly add features that can be used by everyone - whether in an "as a service" or "on-prem" environment. And we cannot "know" what business is being planned, or is possible, or is good for our stakeholders. But we can collaboratively design a feature that is usable on-prem - and one that we know is good for everyone, so that they can continue doing business (or, even better, provide better offerings to their users by building on top of it).

Let's do it. If there are things we can improve/make better here, I want to hear it - from everyone - Ash, Vikram, Raj, Vincent, Rafał, Michał - if there is any idea how to improve it and make it better also for you, I think it's a good time to discuss it.

J.
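PS1. A rough sketch of what the bundle -> environment mapping could look like - purely illustrative. Today's bundle configuration has no "environment" key; that key (and everything in it) is made up for this discussion, and the classpaths/kwargs are only indicative:

    # Illustrative only: the "environment" key does NOT exist in today's
    # bundle configuration -- it is the new piece discussed in this thread.
    BUNDLE_CONFIG = [
        {
            "name": "bundle_a",
            "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
            "kwargs": {"git_conn_id": "team_a_repo", "tracking_ref": "main"},
            # parse + run DAGs from this bundle in a predefined venv
            # baked into the processor/worker image
            "environment": {"python": "/venv/a/bin/python"},
        },
        {
            "name": "bundle_b",
            "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
            "kwargs": {"git_conn_id": "team_b_repo", "tracking_ref": "main"},
            # ... or map the bundle to a dedicated, predefined image
            "environment": {"image": "registry.example.com/airflow-team-b:latest"},
        },
    ]

The DAG processor would pick the interpreter (or the pod/worker image) based on the bundle it is parsing, and tasks coming from that bundle would be executed with the same environment.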
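PS2. And a rough, self-contained Python sketch of the Option 2) routing - not the actual triggerer internals, just the shape of "one process and one event loop per bundle, route triggers by bundle id" (payloads and names are made up):

    # Illustrative only: one subprocess (with one event loop) per bundle;
    # the parent routes trigger payloads by bundle id.
    import asyncio
    import multiprocessing as mp

    def bundle_loop(bundle_id, queue):
        # Runs in its own (cgroup-isolatable) process, potentially started
        # with the per-bundle interpreter from the environment mapping.
        async def run_trigger(payload):
            # the real thing would deserialize and run the trigger class here
            print(f"[{bundle_id}] running trigger {payload['classpath']}")
            await asyncio.sleep(0)

        async def main():
            loop = asyncio.get_running_loop()
            while True:
                payload = await loop.run_in_executor(None, queue.get)
                if payload is None:  # shutdown sentinel
                    return
                # the real loop would schedule many of these concurrently
                await run_trigger(payload)

        asyncio.run(main())

    if __name__ == "__main__":
        queues = {b: mp.Queue() for b in ("bundle_a", "bundle_b")}
        procs = [mp.Process(target=bundle_loop, args=(b, q))
                 for b, q in queues.items()]
        for p in procs:
            p.start()
        # the only new logic vs. today's single loop: pick queue by bundle id
        queues["bundle_a"].put({"classpath": "team_a.triggers.MyTrigger"})
        queues["bundle_b"].put({"classpath": "team_b.triggers.OtherTrigger"})
        for q in queues.values():
            q.put(None)
        for p in procs:
            p.join()

Cgroup limits could then be applied per process, and each loop could run with the per-bundle environment described above.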