Oh, I think you meant it as an alternative solution.

I still do not like this solution because I feel that in the long run it will 
bite us. Yes, in the short term we would avoid a massive migration on most of the 
tables, and we would not need to update a lot of queries, but in the long run 
this will cause issues like:
- Performance. To get the list of DAGs from teamX, you would need to do 
something like `WHERE dag_id LIKE 'teamX__%'`
- Edge cases. Old DAGs with "__" in their name would cause issues. For example, 
"my__dag" would be interpreted as DAG "dag" within team "my". This is just one 
example, but I suspect we would need to handle many other edge cases.
- Just a gut feeling, with no real data point, that this will cause us 
headaches later and end up more complicated and less maintainable than 
updating the DB schema.
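To make the performance point concrete, here is a minimal SQLite sketch contrasting the two layouts. The table names `dag_prefixed` and `dag_team` and the sample IDs are hypothetical, and SQLite is only used because it ships with Python; how well a `LIKE` prefix filter can use an index varies by database. The sketch also surfaces a related edge case: in SQL `LIKE`, `_` is a single-character wildcard, so an unescaped `'teamX__%'` pattern also matches DAGs of a team named `teamXY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Prefix scheme: the team is encoded inside dag_id (hypothetical table).
    CREATE TABLE dag_prefixed (dag_id TEXT PRIMARY KEY);
    -- Column scheme: the team is its own indexed column (hypothetical table).
    CREATE TABLE dag_team (
        dag_id TEXT NOT NULL,
        team_id TEXT NOT NULL,
        UNIQUE (dag_id, team_id)
    );
    CREATE INDEX dag_team_team_id ON dag_team (team_id);
    """
)
conn.executemany(
    "INSERT INTO dag_prefixed VALUES (?)",
    [("teamX__etl",), ("teamX__report",), ("teamXY__backup",)],
)
conn.executemany(
    "INSERT INTO dag_team VALUES (?, ?)",
    [("etl", "teamX"), ("report", "teamX"), ("backup", "teamXY")],
)


def like_escape(text: str) -> str:
    """Escape LIKE wildcards so they match literally (used with ESCAPE '\\')."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


# Naive prefix filter: "_" is a wildcard, so "teamXY__backup" also matches.
naive = [r[0] for r in conn.execute(
    "SELECT dag_id FROM dag_prefixed WHERE dag_id LIKE ? ORDER BY dag_id",
    ("teamX__%",),
)]

# Escaped prefix filter: only IDs that literally start with "teamX__".
escaped = [r[0] for r in conn.execute(
    "SELECT dag_id FROM dag_prefixed WHERE dag_id LIKE ? ESCAPE '\\' ORDER BY dag_id",
    (like_escape("teamX__") + "%",),
)]

# Column scheme: a plain equality filter on an indexed column.
by_column = [r[0] for r in conn.execute(
    "SELECT dag_id FROM dag_team WHERE team_id = ? ORDER BY dag_id",
    ("teamX",),
)]

print(naive)
print(escaped)    # ['teamX__etl', 'teamX__report']
print(by_column)  # ['etl', 'report']
```

With a dedicated column the team filter is a simple equality, and nobody has to remember to escape wildcards.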

But this is only my personal opinion; maybe others think otherwise :)

Vincent


On 2025/06/10 13:32:49 Vincent Beck wrote:
> For backward compatibility purposes, I also think the default team is a good 
> idea. On the API side, if a team is not provided, the default team is 
> assigned.
> 
> > For newer versions, we should probably start adding `team_id__dag_id` in
> > the dag_id column.
> > Fall back to "default" if not specified.
> > 
> > For API:
> > 
> > Internally resolve dag_id = team_id + "__" + original_dag_id.
> > For old DAGs, just treat dag_id = original_dag_id with team "default".
> > Replace all dagbag / related operations to split and use the
> > original_dag_id.
> 
> Unless I misunderstood, you are proposing to update all `dag_id` columns to 
> include the team_id? If so, I really don't think including the `team_id` 
> in the `dag_id` is a good idea, and more importantly, it is not needed. 
> `dag_id` would no longer be the PK, and a unique constraint would be created 
> on the two columns (`dag_id`, `team_id`). Why do you want to have the 
> `team_id` as part of the `dag_id`?
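As a rough illustration of the schema described in the message above, the SQLite sketch below uses a surrogate `id` primary key plus a `UNIQUE (dag_id, team_id)` constraint. The team names are made up, the UUID is stored as text, and `uuid4` is used only because it is available in every Python version; this is not the actual Airflow model.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dag (
        id TEXT PRIMARY KEY,       -- surrogate key, no longer dag_id
        dag_id TEXT NOT NULL,
        team_id TEXT NOT NULL,
        UNIQUE (dag_id, team_id)   -- dag_id only has to be unique within a team
    )
    """
)


def add_dag(dag_id: str, team_id: str) -> str:
    """Insert a DAG row and return its generated surrogate id."""
    new_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO dag (id, dag_id, team_id) VALUES (?, ?, ?)",
        (new_id, dag_id, team_id),
    )
    return new_id


id_a = add_dag("etl", "team_a")
id_b = add_dag("etl", "team_b")  # same dag_id in another team: allowed

try:
    add_dag("etl", "team_a")  # same dag_id in the same team: rejected
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(id_a != id_b, duplicate_rejected)  # True True
```

The `dag_id` column itself stays exactly as DAG authors wrote it; only the constraint changes.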
> 
> On 2025/06/10 05:45:48 Amogh Desai wrote:
> > Hi All,
> > 
> > From the perspective of migrating the task instance table queries to use
> > the `ti.id` in
> > 
> > > One concern I have: if team ID is introduced and we naturally want
> > > dag_id uniqueness to be enforced only within a team (I assume this is a
> > > natural consequence?), then don't we have a very strong break in the API?
> > > All DAG-related API calls use dag_id as the identifier. I would dislike
> > > forcing all user access to switch to UUIDs, as well as forcing all calls
> > > to be prefixed with team_id. That would rather be a v3 of the API. Do we
> > > have a plan for keeping the API non-breaking? (Such paths are also used
> > > in the UI, but there I would not see it as too critical if team_id is
> > > added as a prefix in a path.)
> > 
> > I concur with what Jens has to say here. It might be a very valid use case
> > to have dag_id be unique per team. But that construct should be achievable
> > with a unique constraint on (dag_id, team_id).
> > 
> > Just an idea I want to throw around:
> > To avoid major breakage, at least for the time being, I guess we should
> > introduce the concept of a "default" team: a team that belongs at the
> > deployment level, or the "starting point" when AF is installed.
> > 
> > For newer versions, we should probably start adding `team_id__dag_id` in
> > the dag_id column. Fall back to "default" if not specified.
> > 
> > For API:
> > 
> > Internally resolve dag_id = team_id + "__" + original_dag_id.
> > For old DAGs, just treat dag_id = original_dag_id with team "default".
> > Replace all dagbag / related operations to split and use the
> > original_dag_id.
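Below is a minimal sketch of how the resolution described in the message above could work, under my reading that an unprefixed dag_id means the "default" team. The function names are hypothetical. Note that the parser cannot tell a legacy ID that happens to contain "__" apart from a team-prefixed one.

```python
from typing import Optional, Tuple

DEFAULT_TEAM = "default"
SEPARATOR = "__"


def to_stored_dag_id(original_dag_id: str, team_id: Optional[str] = None) -> str:
    """Build the value stored in the dag_id column under the prefix scheme."""
    team = team_id or DEFAULT_TEAM
    if team == DEFAULT_TEAM:
        return original_dag_id  # old DAGs keep their unprefixed dag_id
    return f"{team}{SEPARATOR}{original_dag_id}"


def from_stored_dag_id(stored_dag_id: str) -> Tuple[str, str]:
    """Split a stored dag_id back into (team_id, original_dag_id)."""
    team, sep, rest = stored_dag_id.partition(SEPARATOR)
    if not sep:
        return DEFAULT_TEAM, stored_dag_id  # no prefix: default team
    return team, rest


print(to_stored_dag_id("etl", "team_a"))  # team_a__etl
print(from_stored_dag_id("team_a__etl"))  # ('team_a', 'etl')
print(from_stored_dag_id("etl"))          # ('default', 'etl')
# A legacy default-team DAG literally named "my__dag" is misread as team "my":
print(from_stored_dag_id(to_stored_dag_id("my__dag")))  # ('my', 'dag')
```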
> > 
> > This will allow:
> > 
> > - Old DAGs continue to work with their unprefixed dag_id.
> > - New DAGs can safely use the same dag_ids but in different teams.
> > - API stays stable: still /dags/{dag_id}.
> > 
> > Thanks & Regards,
> > Amogh Desai
> > 
> > 
> > On Tue, Jun 10, 2025 at 3:52 AM Daniel Standish
> > <daniel.stand...@astronomer.io.invalid> wrote:
> > 
> > > re
> > > >
> > > > As for dag_id and the DAG display name (same for tasks), I would
> > > > rather keep them both. Task ID and DAG ID are used in technical terms,
> > > > and the display names are for humans and allow special characters.
> > >
> > >
> > > I don't really understand the point of having a separate display
> > > name. I thought the reason we needed display name (instead of just
> > > allowing unicode in dag id) had something to do with the fact that dag id
> > > was a PK. If it's no longer the PK, then that would be a non-issue. Yes,
> > > we'd need to figure out a path for users to migrate / deprecate. But it
> > > seems sorta pointless to have two fields when one would do.
> > >
> > >
> > > On Mon, Jun 9, 2025 at 3:19 PM Daniel Standish <
> > > daniel.stand...@astronomer.io> wrote:
> > >
> > > > An idea re backcompat.
> > > >
> > > > Can there be a default team? Then existing API routes can stay the
> > > > same (though perhaps deprecated). But then you add new ones that take
> > > > team id. Or possibly add it as a parameter, and if omitted, you get the
> > > > default.
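A minimal sketch of the optional-parameter variant described in the message above, assuming a hypothetical `get_dag` lookup over an in-memory stand-in for the dag table; in a real API the same fallback would sit in the route handler. Existing callers that omit the team keep working and resolve to the default team.

```python
from typing import Dict, Optional, Tuple

DEFAULT_TEAM = "default"  # hypothetical identifier of the default team

# Stand-in for the dag table, keyed by (team_id, dag_id).
DAGS: Dict[Tuple[str, str], dict] = {
    ("default", "etl"): {"dag_id": "etl", "team_id": "default"},
    ("team_a", "etl"): {"dag_id": "etl", "team_id": "team_a"},
}


def get_dag(dag_id: str, team_id: Optional[str] = None) -> dict:
    """Look up a DAG; an omitted team_id falls back to the default team."""
    team = DEFAULT_TEAM if team_id is None else team_id
    try:
        return DAGS[(team, dag_id)]
    except KeyError:
        raise LookupError(f"DAG {dag_id!r} not found in team {team!r}") from None


print(get_dag("etl")["team_id"])            # default  (old-style call)
print(get_dag("etl", "team_a")["team_id"])  # team_a   (new-style call)
```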
> > > >
> > > > On Mon, Jun 9, 2025 at 11:53 AM Jens Scheffler <j_scheff...@gmx.de.invalid>
> > > > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> As we have not yet migrated to AF3 in our environment, I cannot
> > > >> speak about the performance impact of the UUID in the TI table, but I
> > > >> assume that even so, the complexity is still lower than with a
> > > >> compound primary key.
> > > >>
> > > >> So from a DB perspective, I see a very large DB migration coming, as
> > > >> almost the whole DB needs to be rewritten. That is okay, but it needs
> > > >> to be handled with care, as the migration will require long downtime
> > > >> for large instances.
> > > >>
> > > >> As for dag_id and the DAG display name (same for tasks), I would
> > > >> rather keep them both. Task ID and DAG ID are used in technical terms,
> > > >> and the display names are for humans and allow special characters.
> > > >>
> > > >> One concern I have: if team ID is introduced and we naturally want
> > > >> dag_id uniqueness to be enforced only within a team (I assume this is
> > > >> a natural consequence?), then don't we have a very strong break in the
> > > >> API? All DAG-related API calls use dag_id as the identifier. I would
> > > >> dislike forcing all user access to switch to UUIDs, as well as forcing
> > > >> all calls to be prefixed with team_id. That would rather be a v3 of the
> > > >> API. Do we have a plan for keeping the API non-breaking? (Such paths
> > > >> are also used in the UI, but there I would not see it as too critical
> > > >> if team_id is added as a prefix in a path.)
> > > >>
> > > >> Jens
> > > >>
> > > >> On 09.06.25 18:37, Jarek Potiuk wrote:
> > > >> > I think it would be great to hear whether any issues were observed
> > > >> > (with either migration or performance) after we migrated task
> > > >> > instance in #43161, and, learning from that, we could decide whether
> > > >> > to use UUID for the dag table as well.
> > > >> > But my preference would be to use UUID7, as we did in TI.
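For context on why UUID7 is attractive for a primary key: a version-7 UUID (RFC 9562) puts a 48-bit Unix timestamp in milliseconds in its most significant bits, so keys generated later generally sort later, which tends to keep B-tree index inserts localized compared with fully random version-4 UUIDs. The hand-rolled generator below is only a sketch of that bit layout, not the implementation Airflow uses for TI; recent Python releases may also offer this directly in the `uuid` module.

```python
import os
import time
import uuid


def uuid7() -> uuid.UUID:
    """Sketch of a version-7 UUID following the RFC 9562 field layout."""
    unix_ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68               # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)   # low 62 random bits

    value = (unix_ts_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64: rand_a
    value |= 0b10 << 62                           # bits 63..62: RFC variant
    value |= rand_b                               # bits 61..0: rand_b
    return uuid.UUID(int=value)


first = uuid7()
time.sleep(0.002)  # move to a later millisecond
second = uuid7()
print(first.version, first < second)  # 7 True
```

Within the same millisecond this sketch gives no ordering guarantee; RFC 9562 describes optional counter-based methods for monotonicity.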
> > > >> >
> > > >> >> If we are adding a surrogate key for dag, is there any longer a reason
> > > >> >> to have both dag_id and dag display name?
> > > >> >
> > > >> > I think the main reason is that we would have to implement merging
> > > >> > dag_id and display name (or rather replacing dag_id with display name),
> > > >> > and that would also require adding a UUID for the task table (and
> > > >> > replacing task_display_name) for consistency.
> > > >> >
> > > >> > It also means migrating existing DAGs to move "dag_display_name" and
> > > >> > "task_display_name" into dag_id and task_id. And if we merge these
> > > >> > two, users will have to change their API calls to use different IDs
> > > >> > to query their DAGs after the rename.
> > > >> >
> > > >> > As I understand it, the original proposal from Vincent is transparent
> > > >> > to DAG authors and the API.
> > > >> >
> > > >> > Personally, I think that even if we would like to get rid of
> > > >> > display_names (which I am not sure of), that should be a separate
> > > >> > migration, precisely because of the increased complexity of the
> > > >> > migration process and the impact on DAG authors / APIs. Not
> > > >> > impossible, but it simply adds a different group of people that would
> > > >> > need to be involved in the migration, as well as external systems that
> > > >> > use Airflow APIs, which makes the migration less likely/more risky for
> > > >> > a number of users.
> > > >> >
> > > >> > J.
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Mon, Jun 9, 2025 at 6:08 PM Daniel Standish
> > > >> > <daniel.stand...@astronomer.io.invalid> wrote:
> > > >> >
> > > >> >> re
> > > >> >>
> > > >> >>> * `dag`: Add `team_id` column and enforce a unique constraint on
> > > >> >>> (`dag_id`, `team_id`).
> > > >> >>
> > > >> >> If we are adding a surrogate key for dag, is there any longer a reason
> > > >> >> to have both dag_id and dag display name?
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> On Mon, Jun 9, 2025 at 7:20 AM Beck, Vincent <vincb...@amazon.com.invalid>
> > > >> >> wrote:
> > > >> >>
> > > >> >>> Hi everyone,
> > > >> >>>
> > > >> >>> As part of the multi-team AIP effort ([AIP-67][1]), I’m planning to
> > > >> >>> begin work on updating the database schema to support multiple teams.
> > > >> >>> Since this is a significant and potentially disruptive change, I wanted
> > > >> >>> to first gather feedback on the proposed approach.
> > > >> >>>
> > > >> >>> ## Proposed plan
> > > >> >>>
> > > >> >>> 1. Introduce a UUID primary key on the `dag` table
> > > >> >>>
> > > >> >>> Replace the current `dag_id` primary key with a new `id` column
> > > >> >>> containing a generated UUID. This is similar to the change proposed in
> > > >> >>> #43161, but applied to the `dag` table.
> > > >> >>>
> > > >> >>> 2. Update all foreign keys referencing `dag.dag_id`
> > > >> >>>
> > > >> >>> Update foreign keys across all related tables to reference `dag.id`
> > > >> >>> instead of `dag.dag_id`. Impacted tables include:
> > > >> >>>
> > > >> >>> - dag_schedule_asset_alias_reference
> > > >> >>> - task_outlet_asset_reference
> > > >> >>> - dag_schedule_asset_reference
> > > >> >>> - asset_dag_run_queue
> > > >> >>> - dag_version
> > > >> >>> - dag_schedule_asset_uri_reference
> > > >> >>> - dag_tag
> > > >> >>> - dag_owner_attributes
> > > >> >>> - dag_warning
> > > >> >>> - dag_schedule_asset_name_reference
> > > >> >>> - deadline
> > > >> >>> - dag_code
> > > >> >>> - serialized_dag
> > > >> >>> - task_instance
> > > >> >>> - dag_run
> > > >> >>> - backfill
> > > >> >>> - rendered_task_instance_fields
> > > >> >>> - task_map
> > > >> >>> - xcom
> > > >> >>> - job
> > > >> >>> - log
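To illustrate what step 2 means for one child table, here is a rough SQLite sketch of re-pointing `dag_run` from `dag.dag_id` to `dag.id`: add a new column, backfill it by matching on the old natural key, then join on the surrogate key. The column name `dag_uuid` is hypothetical, and the sketch assumes every existing row belongs to a "default" team. A real Airflow migration would be written with Alembic and would likely also need to drop the old constraints and process large tables in batches.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag (
        id TEXT PRIMARY KEY,
        dag_id TEXT NOT NULL,
        team_id TEXT NOT NULL,
        UNIQUE (dag_id, team_id)
    );
    -- Legacy child table that still references dag by its natural key.
    CREATE TABLE dag_run (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL,
        run_id TEXT NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO dag (id, dag_id, team_id) VALUES (?, ?, 'default')",
    [(str(uuid.uuid4()), "etl"), (str(uuid.uuid4()), "report")],
)
conn.executemany(
    "INSERT INTO dag_run (dag_id, run_id) VALUES (?, ?)",
    [("etl", "run_1"), ("report", "run_1"), ("etl", "run_2")],
)

# Step A: add the new foreign key column (hypothetical name dag_uuid).
conn.execute("ALTER TABLE dag_run ADD COLUMN dag_uuid TEXT REFERENCES dag (id)")

# Step B: backfill it from the old natural key; existing rows map to "default".
conn.execute(
    """
    UPDATE dag_run
    SET dag_uuid = (
        SELECT dag.id FROM dag
        WHERE dag.dag_id = dag_run.dag_id AND dag.team_id = 'default'
    )
    """
)

# Step C: queries now join on the surrogate key only.
rows = conn.execute(
    """
    SELECT dag.dag_id, dag_run.run_id
    FROM dag_run JOIN dag ON dag.id = dag_run.dag_uuid
    ORDER BY dag_run.id
    """
).fetchall()
missing = conn.execute(
    "SELECT COUNT(*) FROM dag_run WHERE dag_uuid IS NULL"
).fetchone()[0]
print(rows, missing)
```

Repeating this for each listed table is what makes the migration large, since every row of every table gets rewritten.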
> > > >> >>>
> > > >> >>> 3. Add a `team_id` column to tables
> > > >> >>>
> > > >> >>> * `dag`: Add `team_id` column and enforce a unique constraint on
> > > >> >>> (`dag_id`, `team_id`).
> > > >> >>> * `slot_pool`: Modify the unique constraint to be on (`pool`,
> > > >> >>> `team_id`) instead of `pool` alone.
> > > >> >>> * `connection`: Modify the unique constraint to be on (`conn_id`,
> > > >> >>> `team_id`) instead of `conn_id` alone.
> > > >> >>> * `variable`: Modify the unique constraint to be on (`key`,
> > > >> >>> `team_id`) instead of `key` alone.
> > > >> >>>
> > > >> >>> I was also thinking of adding the `team_id` column to the
> > > >> >>> `task_instance` table for optimization/simplification purposes. The
> > > >> >>> scheduler makes a lot of queries at the task instance level, and
> > > >> >>> having the `team_id` in this table would make them simpler and more
> > > >> >>> efficient. When working on the implementation, we can always decide
> > > >> >>> to add the `team_id` column to other tables if we find that this
> > > >> >>> would simplify things.
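A small SQLite sketch of the trade-off described in the message above, with hypothetical table contents and column names (`dag_uuid`, the team names, and the states are made up): with `team_id` copied onto `task_instance`, a per-team scheduler query becomes a single-table filter that an index on (`team_id`, `state`) can serve, instead of a join through `dag`. Both queries return the same rows here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag (id TEXT PRIMARY KEY, dag_id TEXT NOT NULL, team_id TEXT NOT NULL);
    CREATE TABLE task_instance (
        id TEXT PRIMARY KEY,
        dag_uuid TEXT NOT NULL REFERENCES dag (id),
        task_id TEXT NOT NULL,
        state TEXT,
        team_id TEXT NOT NULL   -- denormalized copy of dag.team_id
    );
    CREATE INDEX ti_team_state ON task_instance (team_id, state);
    INSERT INTO dag VALUES ('d1', 'etl', 'team_a'), ('d2', 'etl', 'team_b');
    INSERT INTO task_instance VALUES
        ('ti1', 'd1', 'extract', 'scheduled', 'team_a'),
        ('ti2', 'd1', 'load', 'queued', 'team_a'),
        ('ti3', 'd2', 'extract', 'scheduled', 'team_b');
    """
)

# Without the column: every per-team query has to join through dag.
via_join = [r[0] for r in conn.execute(
    """
    SELECT ti.id FROM task_instance AS ti
    JOIN dag ON dag.id = ti.dag_uuid
    WHERE dag.team_id = ? AND ti.state = ?
    ORDER BY ti.id
    """,
    ("team_a", "scheduled"),
)]

# With the column: a single-table filter on (team_id, state).
direct = [r[0] for r in conn.execute(
    "SELECT id FROM task_instance WHERE team_id = ? AND state = ? ORDER BY id",
    ("team_a", "scheduled"),
)]

print(via_join, direct)  # ['ti1'] ['ti1']
```

The cost is that the copy must be kept consistent with `dag.team_id`, for example if a DAG is ever moved to another team.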
> > > >> >>>
> > > >> >>> Note: Some have suggested allowing variables and connections to be
> > > >> >>> shared across teams. Personally, I believe introducing the concept of
> > > >> >>> shared/global resources would add unnecessary complexity and
> > > >> >>> potentially confuse users. That said, this can be revisited later. If
> > > >> >>> we decide to support global/shared resources, we can introduce new
> > > >> >>> tables to support that model.
> > > >> >>>
> > > >> >>> ## Alternative Approach
> > > >> >>>
> > > >> >>> Instead of using UUIDs as primary keys, another option would be:
> > > >> >>>
> > > >> >>> * Change the primary key of `dag` to a composite key (`dag_id`,
> > > >> >>> `team_id`)
> > > >> >>> * Update all foreign keys accordingly
> > > >> >>>
> > > >> >>> I’m personally not in favor of this approach, for the following
> > > >> >>> reasons:
> > > >> >>>
> > > >> >>> * It adds complexity to nearly all queries involving the `dag` table
> > > >> >>> * It may negatively affect database performance (though I’m not a DB
> > > >> >>> expert)
> > > >> >>> * It requires specifying both `dag_id` and `team_id` to access a DAG
> > > >> >>> * We previously went down this path with `task_instance`, and
> > > >> >>> eventually moved to UUIDs to simplify things; this feels like a good
> > > >> >>> opportunity to learn from that experience
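For comparison, a minimal SQLite sketch of the composite-key alternative (table contents are made up): every child table has to carry both `dag_id` and `team_id`, declare a two-column foreign key, and repeat both columns in every join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE dag (
        dag_id TEXT NOT NULL,
        team_id TEXT NOT NULL,
        PRIMARY KEY (dag_id, team_id)
    );
    CREATE TABLE dag_run (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL,
        team_id TEXT NOT NULL,    -- child tables must carry both key columns
        run_id TEXT NOT NULL,
        FOREIGN KEY (dag_id, team_id) REFERENCES dag (dag_id, team_id)
    );
    INSERT INTO dag VALUES ('etl', 'team_a'), ('etl', 'team_b');
    INSERT INTO dag_run (dag_id, team_id, run_id) VALUES ('etl', 'team_a', 'run_1');
    """
)

# Every join has to match on both columns.
rows = conn.execute(
    """
    SELECT dag.dag_id, dag.team_id, dag_run.run_id
    FROM dag_run
    JOIN dag ON dag.dag_id = dag_run.dag_id AND dag.team_id = dag_run.team_id
    """
).fetchall()

# A run pointing at a (dag_id, team_id) pair that does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO dag_run (dag_id, team_id, run_id) VALUES ('etl', 'team_c', 'run_2')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows, orphan_rejected)  # [('etl', 'team_a', 'run_1')] True
```

With a surrogate `id`, each child table would instead carry a single column and join on it alone.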
> > > >> >>>
> > > >> >>> That said, I’m happy to discuss this further if others feel
> > > >> >>> differently.
> > > >> >>>
> > > >> >>> You can find more context and details on this topic in the multi-team
> > > >> >>> Airflow project plan Google doc [2].
> > > >> >>>
> > > >> >>> Thanks,
> > > >> >>>
> > > >> >>> Vincent
> > > >> >>>
> > > >> >>> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> > > >> >>> [2] https://docs.google.com/document/d/11rKo5D2QpT5NvMtDR1RZDjaih5jT5H-dt0aepkfmXSE/edit?tab=t.0#heading=h.4c16fc5qa1w8
> > > >>
> > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > >> For additional commands, e-mail: dev-h...@airflow.apache.org
> > > >>
> > > >>
> > >
> > 
> 
> 
> 
