Re: [DISCUSS] DB changes to support multi-team

Ash Berlin-Taylor Thu, 12 Jun 2025 08:21:12 -0700

I can think of two approaches we could take here.

One would be to use plain UUIDs in the DB (which I think has some storage/query 
advtanges), but to convert it on read/write etc.


I.e. something like this:

```
In [46]: _35
Out[46]: UUID('39fba026-e956-4b69-aad7-8e03a16a2964')

In [47]: base64.urlsafe_b64encode(_35.bytes).rstrip(b"=")
Out[47]: b’OfugJulWS2mq144DoWopZA’

In [48]: f"dag_{_47.decode()}"
Out[48]: 'dag_OfugJulWS2mq144DoWopZA'
```

This (maybe? I’m really not sure) has the advantage that we can make use of 
native UUID column support in databases, and in the case of UUIDv7, we can even 
make use of the time-sortable nature of those UUIDs. This might also might make 
the DB FKs more performant (16bytes vs 26+ unicode characters, but this 
difference is likely fractional.)

The other option is to just store it as a plain string in the db (of the 
prefixed `dag_{something}` etc. form)

Both have pros and cons.

Anyone care to form an opinion on this?

I think I’m leaning slightly towards storing native UUIDs for the sorting 
property, but it is very fractional.

-ash

> On 12 Jun 2025, at 13:54, Vincent Beck <vincb...@apache.org> wrote:
> 
>> The tl;dr of it is Strip prefix all of their IDs with the type, so that they 
>> are easy to know about what type they are just by looking at the ID without 
>> needing a DB query:
> 
> I agree, this is a very nice idea. We start having many different 
> entities/resources using UUID as PK, having this prefix would help us to 
> identity which resource we are talking about.
> 
> On 2025/06/12 09:33:17 Jarek Potiuk wrote:
>>> Are per team, what really is the benefit of this approach? If I’m
>> understanding this idea right then each team would have different workers,
>> different schedulers, different API servers? So the only thing that is
>> actually shared is the DB? Is anything else shared?
>> 
>> Actually the scheduler and API server are supposed to be shared - the idea
>> is that only components that can potentially execute code coming from DAGs
>> of the team are supposed to be "per-team". See the images in
>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components,
>> They need to be slightly updated with after-airflow-3 changes, but the
>> "gray box" - which is shared, in "Proposed target architecture with
>> multi-team setup" encompasses both Task SDK and Webserver (which currently
>> is api_server).
>> 
>>> The tl;dr of it is Strip prefix all of their IDs with the type, so that
>> they are easy to know about what type they are just by looking at the ID
>> without needing a DB query:
>> 
>> Nice idea. I think we should follow it.
>> 
>> J.
>> 
>> On Thu, Jun 12, 2025 at 11:20 AM Ash Berlin-Taylor <a...@apache.org> wrote:
>> 
>>> Slightly off topic/slightly related, but I came across this recently
>>> https://dev.to/stripe/designing-apis-for-humans-object-ids-3o5a
>>> 
>>> The tl;dr of it is Strip prefix all of their IDs with the type, so that
>>> they are easy to know about what type they are just by looking at the ID
>>> without needing a DB query:
>>> 
>>>> The above snippet is trying to retrieve a PaymentIntent from a connected
>>> account, however without even looking at the code you can immediately spot
>>> the error: a Customer ID (cus_) is being used instead of an Account ID
>>> (acct_). Without prefixes this would be much harder to debug; if Stripe
>>> used UUIDs instead then we’d have to look up the ID (probably in the Stripe
>>> Dashboard) to find out what kind of object it is and if it’s even valid.
>>> 
>>> What do we think about using something similar?
>>> 
>>>> On 10 Jun 2025, at 14:44, Vincent Beck <vincb...@apache.org> wrote:
>>>> 
>>>> Oh I think you meant as an alternative solution.
>>>> 
>>>> I still do not like this solution because I feel like on the long run it
>>> will bite us. Yes on the short term we would avoid massive migration on
>>> most of the tables and we would not need to update a lot of requests but on
>>> the long run this will cause issues like:
>>>> - Performances. To get the list of DAGs from teamX, you would need to do
>>> something like `WHERE dag_id like "teamX__%s"`
>>>> - Edge cases. Old DAGs with "__" in their name would cause issues. e.g.
>>> "my__dag" would be interpreted as DAG "dag" within team "my". This is just
>>> an example but I can feel we would need to handle many different other edge
>>> cases
>>>> - Just a natural feeling with no real datapoint that this is something
>>> that will cause us headaches later and ends up more complicated and less
>>> maintainable than updating the DB schema.
>>>> 
>>>> But this is only my personal opinion, maybe others think otherwise :)
>>>> 
>>>> Vincent
>>>> 
>>>> 
>>>> On 2025/06/10 13:32:49 Vincent Beck wrote:
>>>>> For backward compatibility purposes I also think the default team is a
>>> good idea. On the API side, if a team is not provided, then the default
>>> team is assigned.
>>>>> 
>>>>>> For newer versions, we should probably start adding `team_id__dag_id`
>>> in
>>>>>> the dag_id column.
>>>>>> Fallback to "default" if not specified.
>>>>>> 
>>>>>> For API:
>>>>>> 
>>>>>> Internally resolve dag_id = team_id + "__" + original_dag_id.
>>>>>> For old DAGs, just treat dag_id = original_dag_id with team "default".
>>>>>> Replace all dagbag / related operations to split and use the
>>>>>> original_dag_id.
>>>>> 
>>>>> Unless I misunderstood you are proposing to update all `dag_id` columns
>>> to include the team_id in it? If so, I really dont think including the
>>> `team_id` in the `dag_id` is a good idea, and more importantly, it is not
>>> needed. `dag_id` would no longer be PK and a unique constraint will be
>>> created on the two columns (`dag_id`, `team_id`). Why do you want to have
>>> the `team_id` as part of the `dag_id`?
>>>>> 
>>>>> On 2025/06/10 05:45:48 Amogh Desai wrote:
>>>>>> Hi All,
>>>>>> 
>>>>>> From the perspective of migrating the task instance table queries to
>>> use
>>>>>> the `ti.id` in
>>>>>> 
>>>>>>> One concern I have is that if team ID is introduced and naturally we
>>>>>> want to have a dag_id uniqueness only enforced within a team (I assume
>>>>>> this is a natural consequence?) then we have a very strong break in
>>> API?
>>>>>> Because all Dag related API calls use dag_id as identifier. I would
>>>>>> dis-like to force to switch all user access to UUID as well as to force
>>>>>> to pre-fix all calls with team_id. This would be rather a v3 of the
>>> API.
>>>>>> Do we have a plan how we make the API non breaking? (Also such path's
>>>>>> are used in UI but there I'd see it not too critical if team_id is
>>> added
>>>>>> as prefix in a path)
>>>>>> 
>>>>>> I concur with what Jens has to say here. It might be a very valid use
>>> case
>>>>>> to have
>>>>>> dag_id be unique per team. But that construct should be achievable with
>>>>>> unique on the
>>>>>> (dag_id, team_id).
>>>>>> 
>>>>>> Just an idea I want to throw around:
>>>>>> I guess to avoid major breakage, at least for the time being, we should
>>>>>> introduce a concept
>>>>>> of "default" team. A team that belongs at the deployment level or the
>>>>>> "starting point" when AF
>>>>>> is installed.
>>>>>> 
>>>>>> For newer versions, we should probably start adding `team_id__dag_id`
>>> in
>>>>>> the dag_id column.
>>>>>> Fallback to "default" if not specified.
>>>>>> 
>>>>>> For API:
>>>>>> 
>>>>>> Internally resolve dag_id = team_id + "__" + original_dag_id.
>>>>>> For old DAGs, just treat dag_id = original_dag_id with team "default".
>>>>>> Replace all dagbag / related operations to split and use the
>>>>>> original_dag_id.
>>>>>> 
>>>>>> This will allow:
>>>>>> 
>>>>>> 
>>>>>>  -
>>>>>> 
>>>>>>  Old DAGs continue to work with their unprefixed dag_id.
>>>>>>  -
>>>>>> 
>>>>>>  New DAGs can safely use the same dag_ids but in different teams.
>>>>>>  -
>>>>>> 
>>>>>>  API stays stable: still /dags/{dag_id}.
>>>>>> 
>>>>>> 
>>>>>> Thanks & Regards,
>>>>>> Amogh Desai
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jun 10, 2025 at 3:52 AM Daniel Standish
>>>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>>>> 
>>>>>>> re
>>>>>>>> 
>>>>>>>> From the point of dag_id and Dag display name (same for tasks) I am
>>>>>>>> rather requiring to keep them. Task ID and Dag ID is used in
>>> technical
>>>>>>>> terms and the display names are for humans and allow special
>>> characters.
>>>>>>> 
>>>>>>> 
>>>>>>> I don't really understand what the point of having a separate display
>>>>>>> name.  I thought the reason we needed display name (instead of just
>>>>>>> allowing unicode in dag id) was something to do with the fact that
>>> dag id
>>>>>>> was a PK.  If it's no longer PK, then that would be non-issue.  Yes
>>> we'd
>>>>>>> need to figure out a path for users to migrate / deprecate.  But it
>>> seems
>>>>>>> sorta pointless to have two fields when one would do.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Jun 9, 2025 at 3:19 PM Daniel Standish <
>>>>>>> daniel.stand...@astronomer.io> wrote:
>>>>>>> 
>>>>>>>> An idea re backcompat.
>>>>>>>> 
>>>>>>>> Can there be a default team?  Then, existing API routes can stay the
>>>>>>> same,
>>>>>>>> (though maybe deprecate).  But then you add new ones that take team
>>> id.
>>>>>>> Or
>>>>>>>> possibly add as a parameter, and if omitted, you get the default.
>>>>>>>> 
>>>>>>>> On Mon, Jun 9, 2025 at 11:53 AM Jens Scheffler
>>>>>>> <j_scheff...@gmx.de.invalid>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> As we have not made the migration to AF3 in our environment I can
>>> not
>>>>>>>>> speak about performance impact of UUID in TI Table, but I assume
>>> even if
>>>>>>>>> then the complexity is still lower than having a compound primary
>>> key.
>>>>>>>>> 
>>>>>>>>> So from DB perspective I see a very large DB migration coming as
>>> almost
>>>>>>>>> the whole DB needs to be re-written. Which is okay but need to be
>>> taken
>>>>>>>>> with care as migration will take a long downtime for large
>>> instances.
>>>>>>>>> 
>>>>>>>>> From the point of dag_id and Dag display name (same for tasks) I am
>>>>>>>>> rather requiring to keep them. Task ID and Dag ID is used in
>>> technical
>>>>>>>>> terms and the display names are for humans and allow special
>>> characters.
>>>>>>>>> 
>>>>>>>>> One concern I have is that if team ID is introduced and naturally we
>>>>>>>>> want to have a dag_id uniqueness only enforced within a team (I
>>> assume
>>>>>>>>> this is a natural consequence?) then we have a very strong break in
>>> API?
>>>>>>>>> Because all Dag related API calls use dag_id as identifier. I would
>>>>>>>>> dis-like to force to switch all user access to UUID as well as to
>>> force
>>>>>>>>> to pre-fix all calls with team_id. This would be rather a v3 of the
>>> API.
>>>>>>>>> Do we have a plan how we make the API non breaking? (Also such
>>> path's
>>>>>>>>> are used in UI but there I'd see it not too critical if team_id is
>>> added
>>>>>>>>> as prefix in a path)
>>>>>>>>> 
>>>>>>>>> Jens
>>>>>>>>> 
>>>>>>>>> On 09.06.25 18:37, Jarek Potiuk wrote:
>>>>>>>>>> I think it would be great to hear if there were any issues observed
>>>>>>>>> (with
>>>>>>>>>> either migration or performance) after we migrated task instance in
>>>>>>>>> #43161
>>>>>>>>>> and learning from that we could decide whether to use UUID as well
>>> for
>>>>>>>>> the
>>>>>>>>>> dag table.
>>>>>>>>>> But that would be my preference to use UUID7 - similarly as we did
>>> in
>>>>>>>>> TI.
>>>>>>>>>> 
>>>>>>>>>>> If we are adding a surrogate key for dag, is there any longer a
>>>>>>> reason
>>>>>>>>> to
>>>>>>>>>> have both dag_id and dag display name?
>>>>>>>>>> 
>>>>>>>>>> I think the main reason is that we would have to implement merging
>>>>>>>>> dag_id
>>>>>>>>>> and display name (or rather replacing dag_id with display name) and
>>>>>>> that
>>>>>>>>>> would  also require adding UUID for the task table (and replacing
>>>>>>>>>> task_display_name) for consistency.
>>>>>>>>>> 
>>>>>>>>>> Also it means migration of existing dags to move "dag_display_name"
>>>>>>> and
>>>>>>>>>> "task_display_name" to be dag_id, task_id. Also if we merge these
>>> two,
>>>>>>>>> it
>>>>>>>>>> means that users will have to change their API calls to use
>>> different
>>>>>>>>> ids
>>>>>>>>>> to query their dags after rename.
>>>>>>>>>> 
>>>>>>>>>> The original proposal from Vincent is transparent for DAG authors
>>> and
>>>>>>>>> API
>>>>>>>>>> as I understand it.
>>>>>>>>>> 
>>>>>>>>>> I think personally, even if we would like to get rid of
>>> display_names
>>>>>>>>>> (which I am not sure of), that should be a separate migration -
>>>>>>>>> precisely
>>>>>>>>>> because of increased complexity of the migration process and
>>> impact on
>>>>>>>>> DAG
>>>>>>>>>> authors / APIs. Not impossible, but simply adds a different group
>>> of
>>>>>>>>> people
>>>>>>>>>> that should be involved in the migration and external systems that
>>> use
>>>>>>>>>> Airflow APIs - which makes the migration less likely/more risky
>>> for a
>>>>>>>>>> number of users.
>>>>>>>>>> 
>>>>>>>>>> J.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Mon, Jun 9, 2025 at 6:08 PM Daniel Standish
>>>>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>>>>>>>> 
>>>>>>>>>>> re
>>>>>>>>>>> 
>>>>>>>>>>> * `dag`: Add `team_id` column and enforce a unique constraint on
>>>>>>>>> (`dag_id`,
>>>>>>>>>>>> `team_id`).
>>>>>>>>>>> 
>>>>>>>>>>> If we are adding a surrogate key for dag, is there any longer a
>>>>>>> reason
>>>>>>>>> to
>>>>>>>>>>> have both dag_id and dag display name?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Jun 9, 2025 at 7:20 AM Beck, Vincent
>>>>>>>>> <vincb...@amazon.com.invalid>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>> 
>>>>>>>>>>>> As part of the multi-team AIP effort ([AIP-67][1]), I’m planning
>>> to
>>>>>>>>> begin
>>>>>>>>>>>> work on updating the database schema to support multiple teams.
>>>>>>> Since
>>>>>>>>>>> this
>>>>>>>>>>>> is a significant and potentially disruptive change, I wanted to
>>>>>>> first
>>>>>>>>>>>> gather feedback on the proposed approach.
>>>>>>>>>>>> 
>>>>>>>>>>>> ## Proposed plan
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. Introduce a UUID primary key on the `dag` table
>>>>>>>>>>>> 
>>>>>>>>>>>> Replace the current `dag_id` primary key with a new `id` column
>>>>>>>>>>> containing
>>>>>>>>>>>> a generated UUID. This is similar to the change proposed in
>>> #43161,
>>>>>>>>> but
>>>>>>>>>>>> applied to the `dag` table.
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. Update all foreign keys referencing `dag.dag_id`
>>>>>>>>>>>> 
>>>>>>>>>>>> Update foreign keys across all related tables to reference `
>>> dag.id`
>>>>>>>>>>>> instead of `dag.dag_id`. Impacted tables include:
>>>>>>>>>>>> 
>>>>>>>>>>>> - dag_schedule_asset_alias_reference
>>>>>>>>>>>> - task_outlet_asset_reference
>>>>>>>>>>>> - dag_schedule_asset_reference
>>>>>>>>>>>> - asset_dag_run_queue
>>>>>>>>>>>> - dag_version
>>>>>>>>>>>> - dag_schedule_asset_uri_reference
>>>>>>>>>>>> - dag_tag
>>>>>>>>>>>> - dag_owner_attributes
>>>>>>>>>>>> - dag_warning
>>>>>>>>>>>> - dag_schedule_asset_name_reference
>>>>>>>>>>>> - deadline
>>>>>>>>>>>> - dag_code
>>>>>>>>>>>> - serialized_dag
>>>>>>>>>>>> - task_instance
>>>>>>>>>>>> - dag_run
>>>>>>>>>>>> - backfill
>>>>>>>>>>>> - rendered_task_instance_fields
>>>>>>>>>>>> - task_map
>>>>>>>>>>>> - xcom
>>>>>>>>>>>> - job
>>>>>>>>>>>> - log
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. Add `team_id` column to tables
>>>>>>>>>>>> 
>>>>>>>>>>>> * `dag`: Add `team_id` column and enforce a unique constraint on
>>>>>>>>>>>> (`dag_id`, `team_id`).
>>>>>>>>>>>> * `slot_pool`: Modify the unique constraint to be on (`pool`,
>>>>>>>>> `team_id`)
>>>>>>>>>>>> instead of `pool` alone.
>>>>>>>>>>>> * `connection`: Modify the unique constraint to be on (`conn_id`,
>>>>>>>>>>>> `team_id`) instead of `conn_id` alone.
>>>>>>>>>>>> * `variable`: Modify the unique constraint to be on (`key`,
>>>>>>> `team_id`)
>>>>>>>>>>>> instead of `key` alone.
>>>>>>>>>>>> 
>>>>>>>>>>>> I was also thinking adding the `team_id` column to the table
>>>>>>>>>>>> `task_instance` for optimization/simplification purposes, to make
>>>>>>>>> queries
>>>>>>>>>>>> simpler/more optimized. The scheduler makes a lot of queries on
>>> the
>>>>>>>>> task
>>>>>>>>>>>> instance level and having the `team_id` in this table would
>>> simplify
>>>>>>>>>>> them.
>>>>>>>>>>>> We can always decide when working on the implementation to add
>>> the
>>>>>>>>> column
>>>>>>>>>>>> `team_id` to other tables if we find out this would simplify
>>> things.
>>>>>>>>>>>> 
>>>>>>>>>>>> Note: Some have suggested allowing variables and connections to
>>> be
>>>>>>>>> shared
>>>>>>>>>>>> across teams. Personally, I believe introducing the concept of
>>>>>>>>>>>> shared/global resources would add unnecessary complexity and
>>>>>>>>> potentially
>>>>>>>>>>>> confuse users. That said, this can be revisited later. If we
>>> decide
>>>>>>> to
>>>>>>>>>>>> support global/shared resources, we can introduce new tables to
>>>>>>>>> support
>>>>>>>>>>>> that model.
>>>>>>>>>>>> 
>>>>>>>>>>>> ## Alternative Approach
>>>>>>>>>>>> 
>>>>>>>>>>>> Instead of using UUIDs as primary keys, another option would be:
>>>>>>>>>>>> 
>>>>>>>>>>>> * Change the primary key of `dag` to a composite key (`dag_id`,
>>>>>>>>>>> `team_id`)
>>>>>>>>>>>> * Update all foreign keys accordingly
>>>>>>>>>>>> 
>>>>>>>>>>>> I’m personally not in favor of this approach, for the following
>>>>>>>>> reasons:
>>>>>>>>>>>> 
>>>>>>>>>>>> * It adds complexity to nearly all queries involving the `dag`
>>> table
>>>>>>>>>>>> * It may negatively affect database performance (though I’m not
>>> a DB
>>>>>>>>>>>> expert)
>>>>>>>>>>>> * It requires specifying both `dag_id` and `team_id` to access a
>>> DAG
>>>>>>>>>>>> * We previously went down this path with `task_instance`, and
>>>>>>>>> eventually
>>>>>>>>>>>> moved to UUIDs to simplify things—this feels like a good
>>> opportunity
>>>>>>>>> to
>>>>>>>>>>>> learn from that experience
>>>>>>>>>>>> 
>>>>>>>>>>>> That said, I’m happy to discuss this further if others feel
>>>>>>>>> differently.
>>>>>>>>>>>> 
>>>>>>>>>>>> You can find more context and details on this topic in the
>>>>>>> multi-team
>>>>>>>>>>>> airflow project plan Google doc [2].
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> 
>>>>>>>>>>>> Vincent
>>>>>>>>>>>> 
>>>>>>>>>>>> [1]
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
>>>>>>>>>>>> [2]
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>> https://docs.google.com/document/d/11rKo5D2QpT5NvMtDR1RZDjaih5jT5H-dt0aepkfmXSE/edit?tab=t.0#heading=h.4c16fc5qa1w8
>>>>>>>>> 
>>>>>>>>> 
>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>> 
>>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>> 
>>> 
>>> 
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org

Re: [DISCUSS] DB changes to support multi-team

Reply via email to