For backward compatibility, I also think a default team is a good idea. On
the API side, if a team is not provided, the default team is assigned.
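
To make that concrete, here is a minimal sketch of the fallback I have in
mind. The names `DEFAULT_TEAM`, `resolve_team` and `dag_lookup_key` are made
up for illustration and are not existing Airflow code; the only point is that
`dag_id` stays untouched and the team is a separate, optional value.

```python
from typing import Optional, Tuple

# Hypothetical name for the deployment-level team; not an existing Airflow constant.
DEFAULT_TEAM = "default"


def resolve_team(team_id: Optional[str] = None) -> str:
    """Return the team for an API call, falling back to the default team."""
    return team_id if team_id else DEFAULT_TEAM


def dag_lookup_key(dag_id: str, team_id: Optional[str] = None) -> Tuple[str, str]:
    """Build the (dag_id, team_id) pair used to look up a DAG.

    The dag_id is passed through unchanged; no team prefix is added to it.
    """
    return (dag_id, resolve_team(team_id))
```

With this shape, an existing call that only passes `dag_id` keeps working and
simply resolves to the default team.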

> For newer versions, we should probably start adding `team_id__dag_id` in
> the dag_id column.
> Fallback to "default" if not specified.
> 
> For API:
> 
> Internally resolve dag_id = team_id + "__" + original_dag_id.
> For old DAGs, just treat dag_id = original_dag_id with team "default".
> Replace all dagbag / related operations to split and use the
> original_dag_id.

Unless I misunderstood, you are proposing to update all `dag_id` columns to
include the team_id? If so, I really don't think including the `team_id` in
the `dag_id` is a good idea and, more importantly, it is not needed. Under the
proposal, `dag_id` would no longer be the PK, and a unique constraint would be
created on the two columns (`dag_id`, `team_id`). Why do you want to have the
`team_id` as part of the `dag_id`?
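
As a rough illustration of why the prefix is not needed, here is a small
sketch using an in-memory SQLite database. The table is heavily simplified and
is an assumption for the example, not the real Airflow schema: a surrogate
`id` as PK, an unprefixed `dag_id`, and a unique constraint on
(`dag_id`, `team_id`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dag (
        id TEXT PRIMARY KEY,                      -- surrogate key (a UUID in the proposal)
        dag_id TEXT NOT NULL,                     -- unchanged, no team prefix
        team_id TEXT NOT NULL DEFAULT 'default',  -- assumed default team name
        UNIQUE (dag_id, team_id)
    )
    """
)

# An existing DAG with no team specified lands in the default team.
conn.execute("INSERT INTO dag (id, dag_id) VALUES ('1', 'etl')")
# Another team can reuse the same dag_id.
conn.execute("INSERT INTO dag (id, dag_id, team_id) VALUES ('2', 'etl', 'team_a')")


def insert_is_rejected(row):
    """Return True if the unique constraint rejects the given (id, dag_id, team_id) row."""
    try:
        conn.execute("INSERT INTO dag (id, dag_id, team_id) VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False


# The same dag_id twice within one team violates the constraint.
duplicate_rejected = insert_is_rejected(("3", "etl", "team_a"))
rows = conn.execute("SELECT dag_id, team_id FROM dag ORDER BY id").fetchall()
```

The same `dag_id` coexists across teams, duplicates within a team are
rejected, and nothing has to split or parse the `dag_id` value.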

On 2025/06/10 05:45:48 Amogh Desai wrote:
> Hi All,
> 
> From the perspective of migrating the task instance table queries to use
> the `ti.id` in
> 
> > One concern I have is that if team ID is introduced and naturally we
> > want to have a dag_id uniqueness only enforced within a team (I assume
> > this is a natural consequence?) then we have a very strong break in API?
> > Because all Dag related API calls use dag_id as identifier. I would
> > dis-like to force to switch all user access to UUID as well as to force
> > to pre-fix all calls with team_id. This would be rather a v3 of the API.
> > Do we have a plan how we make the API non breaking? (Also such path's
> > are used in UI but there I'd see it not too critical if team_id is added
> > as prefix in a path)
> 
> I concur with what Jens has to say here. It might be a very valid use case
> to have dag_id be unique per team. But that construct should be achievable
> with a unique constraint on (dag_id, team_id).
> 
> Just an idea I want to throw around:
> I guess to avoid major breakage, at least for the time being, we should
> introduce a concept of "default" team. A team that belongs at the
> deployment level or the "starting point" when AF is installed.
> 
> For newer versions, we should probably start adding `team_id__dag_id` in
> the dag_id column.
> Fallback to "default" if not specified.
> 
> For API:
> 
> Internally resolve dag_id = team_id + "__" + original_dag_id.
> For old DAGs, just treat dag_id = original_dag_id with team "default".
> Replace all dagbag / related operations to split and use the
> original_dag_id.
> 
> This will allow:
> 
>    - Old DAGs continue to work with their unprefixed dag_id.
>    - New DAGs can safely use the same dag_ids but in different teams.
>    - API stays stable: still /dags/{dag_id}.
> 
> 
> Thanks & Regards,
> Amogh Desai
> 
> 
> On Tue, Jun 10, 2025 at 3:52 AM Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
> 
> > re
> > >
> > >  From the point of dag_id and Dag display name (same for tasks) I am
> > > rather requiring to keep them. Task ID and Dag ID is used in technical
> > > terms and the display names are for humans and allow special characters.
> >
> >
> > I don't really understand what the point of having a separate display
> > name.  I thought the reason we needed display name (instead of just
> > allowing unicode in dag id) was something to do with the fact that dag id
> > was a PK.  If it's no longer PK, then that would be non-issue.  Yes we'd
> > need to figure out a path for users to migrate / deprecate.  But it seems
> > sorta pointless to have two fields when one would do.
> >
> >
> >
> >
> >
> > On Mon, Jun 9, 2025 at 3:19 PM Daniel Standish <
> > daniel.stand...@astronomer.io> wrote:
> >
> > > An idea re backcompat.
> > >
> > > > Can there be a default team?  Then, existing API routes can stay the
> > > > same, (though maybe deprecate).  But then you add new ones that take
> > > > team id. Or possibly add as a parameter, and if omitted, you get the
> > > > default.
> > >
> > > On Mon, Jun 9, 2025 at 11:53 AM Jens Scheffler
> > <j_scheff...@gmx.de.invalid>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> As we have not made the migration to AF3 in our environment I can not
> > >> speak about performance impact of UUID in TI Table, but I assume even if
> > >> then the complexity is still lower than having a compound primary key.
> > >>
> > >> So from DB perspective I see a very large DB migration coming as almost
> > >> the whole DB needs to be re-written. Which is okay but need to be taken
> > >> with care as migration will take a long downtime for large instances.
> > >>
> > >>  From the point of dag_id and Dag display name (same for tasks) I am
> > >> rather requiring to keep them. Task ID and Dag ID is used in technical
> > >> terms and the display names are for humans and allow special characters.
> > >>
> > >> One concern I have is that if team ID is introduced and naturally we
> > >> want to have a dag_id uniqueness only enforced within a team (I assume
> > >> this is a natural consequence?) then we have a very strong break in API?
> > >> Because all Dag related API calls use dag_id as identifier. I would
> > >> dis-like to force to switch all user access to UUID as well as to force
> > >> to pre-fix all calls with team_id. This would be rather a v3 of the API.
> > >> Do we have a plan how we make the API non breaking? (Also such path's
> > >> are used in UI but there I'd see it not too critical if team_id is added
> > >> as prefix in a path)
> > >>
> > >> Jens
> > >>
> > >> On 09.06.25 18:37, Jarek Potiuk wrote:
> > >> > I think it would be great to hear if there were any issues observed
> > >> (with
> > >> > either migration or performance) after we migrated task instance in
> > >> #43161
> > >> > and learning from that we could decide whether to use UUID as well for
> > >> the
> > >> > dag table.
> > >> > But that would be my preference to use UUID7 - similarly as we did in
> > >> TI.
> > >> >
> > >> >> If we are adding a surrogate key for dag, is there any longer a
> > reason
> > >> to
> > >> > have both dag_id and dag display name?
> > >> >
> > >> > I think the main reason is that we would have to implement merging
> > >> dag_id
> > >> > and display name (or rather replacing dag_id with display name) and
> > that
> > >> > would  also require adding UUID for the task table (and replacing
> > >> > task_display_name) for consistency.
> > >> >
> > >> > Also it means migration of existing dags to move "dag_display_name"
> > and
> > >> > "task_display_name" to be dag_id, task_id. Also if we merge these two,
> > >> it
> > >> > means that users will have to change their API calls to use different
> > >> ids
> > >> > to query their dags after rename.
> > >> >
> > >> > The original proposal from Vincent is transparent for DAG authors and
> > >> API
> > >> > as I understand it.
> > >> >
> > >> > I think personally, even if we would like to get rid of display_names
> > >> > (which I am not sure of), that should be a separate migration -
> > >> precisely
> > >> > because of increased complexity of the migration process and impact on
> > >> DAG
> > >> > authors / APIs. Not impossible, but simply adds a different group of
> > >> people
> > >> > that should be involved in the migration and external systems that use
> > >> > Airflow APIs - which makes the migration less likely/more risky for a
> > >> > number of users.
> > >> >
> > >> > J.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Mon, Jun 9, 2025 at 6:08 PM Daniel Standish
> > >> > <daniel.stand...@astronomer.io.invalid> wrote:
> > >> >
> > >> >> re
> > >> >>
> > >> >>> * `dag`: Add `team_id` column and enforce a unique constraint on
> > >> >>> (`dag_id`, `team_id`).
> > >> >>
> > >> >> If we are adding a surrogate key for dag, is there any longer a
> > >> >> reason to have both dag_id and dag display name?
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Mon, Jun 9, 2025 at 7:20 AM Beck, Vincent
> > >> <vincb...@amazon.com.invalid>
> > >> >> wrote:
> > >> >>
> > >> >>> Hi everyone,
> > >> >>>
> > >> >>> As part of the multi-team AIP effort ([AIP-67][1]), I’m planning to
> > >> begin
> > >> >>> work on updating the database schema to support multiple teams.
> > Since
> > >> >> this
> > >> >>> is a significant and potentially disruptive change, I wanted to
> > first
> > >> >>> gather feedback on the proposed approach.
> > >> >>>
> > >> >>> ## Proposed plan
> > >> >>>
> > >> >>> 1. Introduce a UUID primary key on the `dag` table
> > >> >>>
> > >> >>> Replace the current `dag_id` primary key with a new `id` column
> > >> >> containing
> > >> >>> a generated UUID. This is similar to the change proposed in #43161,
> > >> but
> > >> >>> applied to the `dag` table.
> > >> >>>
> > >> >>> 1. Update all foreign keys referencing `dag.dag_id`
> > >> >>>
> > >> >>> Update foreign keys across all related tables to reference `dag.id`
> > >> >>> instead of `dag.dag_id`. Impacted tables include:
> > >> >>>
> > >> >>> - dag_schedule_asset_alias_reference
> > >> >>> - task_outlet_asset_reference
> > >> >>> - dag_schedule_asset_reference
> > >> >>> - asset_dag_run_queue
> > >> >>> - dag_version
> > >> >>> - dag_schedule_asset_uri_reference
> > >> >>> - dag_tag
> > >> >>> - dag_owner_attributes
> > >> >>> - dag_warning
> > >> >>> - dag_schedule_asset_name_reference
> > >> >>> - deadline
> > >> >>> - dag_code
> > >> >>> - serialized_dag
> > >> >>> - task_instance
> > >> >>> - dag_run
> > >> >>> - backfill
> > >> >>> - rendered_task_instance_fields
> > >> >>> - task_map
> > >> >>> - xcom
> > >> >>> - job
> > >> >>> - log
> > >> >>>
> > >> >>> 1. Add `team_id` column to tables
> > >> >>>
> > >> >>> * `dag`: Add `team_id` column and enforce a unique constraint on
> > >> >>> (`dag_id`, `team_id`).
> > >> >>> * `slot_pool`: Modify the unique constraint to be on (`pool`,
> > >> >>> `team_id`) instead of `pool` alone.
> > >> >>> * `connection`: Modify the unique constraint to be on (`conn_id`,
> > >> >>> `team_id`) instead of `conn_id` alone.
> > >> >>> * `variable`: Modify the unique constraint to be on (`key`,
> > >> >>> `team_id`) instead of `key` alone.
> > >> >>>
> > >> >>> I was also thinking adding the `team_id` column to the table
> > >> >>> `task_instance` for optimization/simplification purposes, to make
> > >> queries
> > >> >>> simpler/more optimized. The scheduler makes a lot of queries on the
> > >> task
> > >> >>> instance level and having the `team_id` in this table would simplify
> > >> >> them.
> > >> >>> We can always decide when working on the implementation to add the
> > >> column
> > >> >>> `team_id` to other tables if we find out this would simplify things.
> > >> >>>
> > >> >>> Note: Some have suggested allowing variables and connections to be
> > >> shared
> > >> >>> across teams. Personally, I believe introducing the concept of
> > >> >>> shared/global resources would add unnecessary complexity and
> > >> potentially
> > >> >>> confuse users. That said, this can be revisited later. If we decide
> > to
> > >> >>> support global/shared resources, we can introduce new tables to
> > >> support
> > >> >>> that model.
> > >> >>>
> > >> >>> ## Alternative Approach
> > >> >>>
> > >> >>> Instead of using UUIDs as primary keys, another option would be:
> > >> >>>
> > >> >>> * Change the primary key of `dag` to a composite key (`dag_id`,
> > >> >>> `team_id`)
> > >> >>> * Update all foreign keys accordingly
> > >> >>>
> > >> >>> I’m personally not in favor of this approach, for the following
> > >> reasons:
> > >> >>>
> > >> >>> * It adds complexity to nearly all queries involving the `dag` table
> > >> >>> * It may negatively affect database performance (though I’m not a DB
> > >> >>> expert)
> > >> >>> * It requires specifying both `dag_id` and `team_id` to access a DAG
> > >> >>> * We previously went down this path with `task_instance`, and
> > >> eventually
> > >> >>> moved to UUIDs to simplify things—this feels like a good opportunity
> > >> to
> > >> >>> learn from that experience
> > >> >>>
> > >> >>> That said, I’m happy to discuss this further if others feel
> > >> differently.
> > >> >>>
> > >> >>> You can find more context and details on this topic in the
> > multi-team
> > >> >>> airflow project plan Google doc [2].
> > >> >>>
> > >> >>> Thanks,
> > >> >>>
> > >> >>> Vincent
> > >> >>>
> > >> >>> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> > >> >>> [2] https://docs.google.com/document/d/11rKo5D2QpT5NvMtDR1RZDjaih5jT5H-dt0aepkfmXSE/edit?tab=t.0#heading=h.4c16fc5qa1w8
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > >> For additional commands, e-mail: dev-h...@airflow.apache.org
> > >>
> > >>
> >
> 
