Given all these:

> * `slot_pool`: Modify the unique constraint to be on (`pool`, `team_id`) 
> instead of `pool` alone.
> * `connection`: Modify the unique constraint to be on (`conn_id`, `team_id`) 
> instead of `conn_id` alone.
> * `variable`: Modify the unique constraint to be on (`key`, `team_id`) 
> instead of `key` alone.

Are per team, what really is the benefit of this approach? If I’m understanding 
this idea right then each team would have different workers, different 
schedulers, different API servers? So the only thing that is actually shared is 
the DB? Is anything else shared?

Have I understood this correctly?

-ash

> On 9 Jun 2025, at 15:20, Beck, Vincent <vincb...@amazon.com.INVALID> wrote:
> 
> Hi everyone,
> 
> As part of the multi-team AIP effort ([AIP-67][1]), I’m planning to begin 
> work on updating the database schema to support multiple teams. Since this is 
> a significant and potentially disruptive change, I wanted to first gather 
> feedback on the proposed approach.
> 
> ## Proposed plan
> 
> 1. Introduce a UUID primary key on the `dag` table
> 
> Replace the current `dag_id` primary key with a new `id` column containing a 
> generated UUID. This is similar to the change proposed in #43161, but applied 
> to the `dag` table.
> 
> 1. Update all foreign keys referencing `dag.dag_id`
> 
> Update foreign keys across all related tables to reference `dag.id` instead 
> of `dag.dag_id`. Impacted tables include:
> 
> - dag_schedule_asset_alias_reference
> - task_outlet_asset_reference
> - dag_schedule_asset_reference
> - asset_dag_run_queue
> - dag_version
> - dag_schedule_asset_uri_reference
> - dag_tag
> - dag_owner_attributes
> - dag_warning
> - dag_schedule_asset_name_reference
> - deadline
> - dag_code
> - serialized_dag
> - task_instance
> - dag_run
> - backfill
> - rendered_task_instance_fields
> - task_map
> - xcom
> - job
> - log
> 
> 1. Add `team_id` column to tables
> 
> * `dag`: Add `team_id` column and enforce a unique constraint on (`dag_id`, 
> `team_id`).
> * `slot_pool`: Modify the unique constraint to be on (`pool`, `team_id`) 
> instead of `pool` alone.
> * `connection`: Modify the unique constraint to be on (`conn_id`, `team_id`) 
> instead of `conn_id` alone.
> * `variable`: Modify the unique constraint to be on (`key`, `team_id`) 
> instead of `key` alone.
> 
> I was also thinking adding the `team_id` column to the table `task_instance` 
> for optimization/simplification purposes, to make queries simpler/more 
> optimized. The scheduler makes a lot of queries on the task instance level 
> and having the `team_id` in this table would simplify them. We can always 
> decide when working on the implementation to add the column `team_id` to 
> other tables if we find out this would simplify things.
> 
> Note: Some have suggested allowing variables and connections to be shared 
> across teams. Personally, I believe introducing the concept of shared/global 
> resources would add unnecessary complexity and potentially confuse users. 
> That said, this can be revisited later. If we decide to support global/shared 
> resources, we can introduce new tables to support that model.
> 
> ## Alternative Approach
> 
> Instead of using UUIDs as primary keys, another option would be:
> 
> * Change the primary key of `dag` to a composite key (`dag_id`, `team_id`)
> * Update all foreign keys accordingly
> 
> I’m personally not in favor of this approach, for the following reasons:
> 
> * It adds complexity to nearly all queries involving the `dag` table
> * It may negatively affect database performance (though I’m not a DB expert)
> * It requires specifying both `dag_id` and `team_id` to access a DAG
> * We previously went down this path with `task_instance`, and eventually 
> moved to UUIDs to simplify things—this feels like a good opportunity to learn 
> from that experience
> 
> That said, I’m happy to discuss this further if others feel differently.
> 
> You can find more context and details on this topic in the multi-team airflow 
> project plan Google doc [2].
> 
> Thanks,
> 
> Vincent
> 
> [1] 
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> 
> [2] 
> https://docs.google.com/document/d/11rKo5D2QpT5NvMtDR1RZDjaih5jT5H-dt0aepkfmXSE/edit?tab=t.0#heading=h.4c16fc5qa1w8


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org

Reply via email to