[DISCUSS] DB changes to support multi-team

Beck, Vincent Mon, 09 Jun 2025 07:29:08 -0700

Hi everyone,

As part of the multi-team AIP effort ([AIP-67][1]), I’m planning to begin work 
on updating the database schema to support multiple teams. Since this is a 
significant and potentially disruptive change, I wanted to first gather 
feedback on the proposed approach.

## Proposed plan

1. Introduce a UUID primary key on the `dag` table

Replace the current `dag_id` primary key with a new `id` column containing a
generated UUID. This is similar to the change proposed in #43161, but applied
to the `dag` table.

1. Update all foreign keys referencing `dag.dag_id`

Update foreign keys across all related tables to reference `dag.id` instead of
`dag.dag_id`. Impacted tables include:

- dag_schedule_asset_alias_reference
- task_outlet_asset_reference
- dag_schedule_asset_reference
- asset_dag_run_queue
- dag_version
- dag_schedule_asset_uri_reference
- dag_tag
- dag_owner_attributes
- dag_warning
- dag_schedule_asset_name_reference
- deadline
- dag_code
- serialized_dag
- task_instance
- dag_run
- backfill
- rendered_task_instance_fields
- task_map
- xcom
- job
- log

1. Add `team_id` column to tables

* `dag`: Add `team_id` column and enforce a unique constraint on (`dag_id`,
`team_id`).
* `slot_pool`: Modify the unique constraint to be on (`pool`, `team_id`)
instead of `pool` alone.
* `connection`: Modify the unique constraint to be on (`conn_id`, `team_id`)
instead of `conn_id` alone.
* `variable`: Modify the unique constraint to be on (`key`, `team_id`) instead
of `key` alone.

I was also thinking adding the `team_id` column to the table `task_instance`
for optimization/simplification purposes, to make queries simpler/more
optimized. The scheduler makes a lot of queries on the task instance level and
having the `team_id` in this table would simplify them. We can always decide
when working on the implementation to add the column `team_id` to other tables
if we find out this would simplify things.

Note: Some have suggested allowing variables and connections to be shared
across teams. Personally, I believe introducing the concept of shared/global
resources would add unnecessary complexity and potentially confuse users. That
said, this can be revisited later. If we decide to support global/shared
resources, we can introduce new tables to support that model.

## Alternative Approach

Instead of using UUIDs as primary keys, another option would be:

* Change the primary key of `dag` to a composite key (`dag_id`, `team_id`)
* Update all foreign keys accordingly

I’m personally not in favor of this approach, for the following reasons:

* It adds complexity to nearly all queries involving the `dag` table
* It may negatively affect database performance (though I’m not a DB expert)
* It requires specifying both `dag_id` and `team_id` to access a DAG
* We previously went down this path with `task_instance`, and eventually moved
to UUIDs to simplify things—this feels like a good opportunity to learn from
that experience

That said, I’m happy to discuss this further if others feel differently.

You can find more context and details on this topic in the multi-team airflow
project plan Google doc [2].

Thanks,

Vincent

[1]
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components

[2]
https://docs.google.com/document/d/11rKo5D2QpT5NvMtDR1RZDjaih5jT5H-dt0aepkfmXSE/edit?tab=t.0#heading=h.4c16fc5qa1w8

[DISCUSS] DB changes to support multi-team

Reply via email to