For backward compatibility purposes I also think the default team is a good idea. On the API side, if a team is not provided, then the default team is assigned.
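To make that fallback concrete, here is a minimal sketch using SQLite from the Python standard library. It is not Airflow code: the `dag` table is cut down to three columns, `DEFAULT_TEAM`, `add_dag` and `get_dag` are hypothetical names, and `uuid.uuid4()` only stands in for whatever UUID flavor is eventually chosen. It assumes the surrogate `id` key and the unique constraint on (`dag_id`, `team_id`) from the original proposal.

```python
import sqlite3
import uuid

DEFAULT_TEAM = "default"  # hypothetical identifier for the deployment-level team


def create_schema(conn):
    # Surrogate primary key; dag_id keeps its original value and is only
    # required to be unique within a team.
    conn.execute(
        """
        CREATE TABLE dag (
            id TEXT PRIMARY KEY,
            dag_id TEXT NOT NULL,
            team_id TEXT NOT NULL,
            UNIQUE (dag_id, team_id)
        )
        """
    )


def add_dag(conn, dag_id, team_id=None):
    # A DAG registered without a team lands in the default team.
    row_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO dag (id, dag_id, team_id) VALUES (?, ?, ?)",
        (row_id, dag_id, team_id or DEFAULT_TEAM),
    )
    return row_id


def get_dag(conn, dag_id, team_id=None):
    # An API call that omits the team is resolved against the default team,
    # so existing callers that only pass dag_id keep working.
    return conn.execute(
        "SELECT id, dag_id, team_id FROM dag WHERE dag_id = ? AND team_id = ?",
        (dag_id, team_id or DEFAULT_TEAM),
    ).fetchone()
```

With this shape, `get_dag(conn, "etl")` returns the default team's row, while `get_dag(conn, "etl", "team_a")` returns another team's DAG that happens to share the same `dag_id`. The `dag_id` value itself never needs a team prefix.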
> For newer versions, we should probably start adding `team_id__dag_id` in the dag_id column.
> Fallback to "default" if not specified.
>
> For API:
>
> Internally resolve dag_id = team_id + "__" + original_dag_id.
> For old DAGs, just treat dag_id = original_dag_id with team "default".
> Replace all dagbag / related operations to split and use the original_dag_id.

Unless I misunderstood, you are proposing to update all `dag_id` columns to include the team_id in it? If so, I really don't think including the `team_id` in the `dag_id` is a good idea, and more importantly, it is not needed. `dag_id` would no longer be the PK, and a unique constraint will be created on the two columns (`dag_id`, `team_id`). Why do you want to have the `team_id` as part of the `dag_id`?

On 2025/06/10 05:45:48 Amogh Desai wrote:
> Hi All,
>
> From the perspective of migrating the task instance table queries to use the `ti.id` in
>
> > One concern I have is that if team ID is introduced and naturally we want to have a dag_id uniqueness only enforced within a team (I assume this is a natural consequence?) then we have a very strong break in API? Because all Dag related API calls use dag_id as identifier. I would dis-like to force to switch all user access to UUID as well as to force to pre-fix all calls with team_id. This would be rather a v3 of the API. Do we have a plan how we make the API non breaking? (Also such path's are used in UI but there I'd see it not too critical if team_id is added as prefix in a path)
>
> I concur with what Jens has to say here. It might be a very valid use case to have dag_id be unique per team. But that construct should be achievable with unique on the (dag_id, team_id).
>
> Just an idea I want to throw around:
> I guess to avoid major breakage, at least for the time being, we should introduce a concept of "default" team. A team that belongs at the deployment level or the "starting point" when AF is installed.
>
> For newer versions, we should probably start adding `team_id__dag_id` in the dag_id column.
> Fallback to "default" if not specified.
>
> For API:
>
> Internally resolve dag_id = team_id + "__" + original_dag_id.
> For old DAGs, just treat dag_id = original_dag_id with team "default".
> Replace all dagbag / related operations to split and use the original_dag_id.
>
> This will allow:
>
> - Old DAGs continue to work with their unprefixed dag_id.
> - New DAGs can safely use the same dag_ids but in different teams.
> - API stays stable: still /dags/{dag_id}.
>
> Thanks & Regards,
> Amogh Desai
>
> On Tue, Jun 10, 2025 at 3:52 AM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote:
>
> > re
> >
> > > From the point of dag_id and Dag display name (same for tasks) I am rather requiring to keep them. Task ID and Dag ID is used in technical terms and the display names are for humans and allow special characters.
> >
> > I don't really understand what the point of having a separate display name. I thought the reason we needed display name (instead of just allowing unicode in dag id) was something to do with the fact that dag id was a PK. If it's no longer PK, then that would be non-issue. Yes we'd need to figure out a path for users to migrate / deprecate. But it seems sorta pointless to have two fields when one would do.
> >
> > On Mon, Jun 9, 2025 at 3:19 PM Daniel Standish <daniel.stand...@astronomer.io> wrote:
> >
> > > An idea re backcompat.
> > >
> > > Can there be a default team? Then, existing API routes can stay the same, (though maybe deprecate). But then you add new ones that take team id. Or possibly add as a parameter, and if omitted, you get the default.
> > >
> > > On Mon, Jun 9, 2025 at 11:53 AM Jens Scheffler <j_scheff...@gmx.de.invalid> wrote:
> > >
> > >> Hi,
> > >>
> > >> As we have not made the migration to AF3 in our environment I can not speak about performance impact of UUID in TI Table, but I assume even if then the complexity is still lower than having a compound primary key.
> > >>
> > >> So from DB perspective I see a very large DB migration coming as almost the whole DB needs to be re-written. Which is okay but need to be taken with care as migration will take a long downtime for large instances.
> > >>
> > >> From the point of dag_id and Dag display name (same for tasks) I am rather requiring to keep them. Task ID and Dag ID is used in technical terms and the display names are for humans and allow special characters.
> > >>
> > >> One concern I have is that if team ID is introduced and naturally we want to have a dag_id uniqueness only enforced within a team (I assume this is a natural consequence?) then we have a very strong break in API? Because all Dag related API calls use dag_id as identifier. I would dis-like to force to switch all user access to UUID as well as to force to pre-fix all calls with team_id. This would be rather a v3 of the API. Do we have a plan how we make the API non breaking? (Also such path's are used in UI but there I'd see it not too critical if team_id is added as prefix in a path)
> > >>
> > >> Jens
> > >>
> > >> On 09.06.25 18:37, Jarek Potiuk wrote:
> > >> > I think it would be great to hear if there were any issues observed (with either migration or performance) after we migrated task instance in #43161 and learning from that we could decide whether to use UUID as well for the dag table.
> > >> > But that would be my preference to use UUID7 - similarly as we did in TI.
> > >> >
> > >> >> If we are adding a surrogate key for dag, is there any longer a reason to have both dag_id and dag display name?
> > >> >
> > >> > I think the main reason is that we would have to implement merging dag_id and display name (or rather replacing dag_id with display name) and that would also require adding UUID for the task table (and replacing task_display_name) for consistency.
> > >> >
> > >> > Also it means migration of existing dags to move "dag_display_name" and "task_display_name" to be dag_id, task_id. Also if we merge these two, it means that users will have to change their API calls to use different ids to query their dags after rename.
> > >> >
> > >> > The original proposal from Vincent is transparent for DAG authors and API as I understand it.
> > >> >
> > >> > I think personally, even if we would like to get rid of display_names (which I am not sure of), that should be a separate migration - precisely because of increased complexity of the migration process and impact on DAG authors / APIs. Not impossible, but simply adds a different group of people that should be involved in the migration and external systems that use Airflow APIs - which makes the migration less likely/more risky for a number of users.
> > >> >
> > >> > J.
> > >> >
> > >> > On Mon, Jun 9, 2025 at 6:08 PM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote:
> > >> >
> > >> >> re
> > >> >>
> > >> >>> * `dag`: Add `team_id` column and enforce a unique constraint on (`dag_id`, `team_id`).
> > >> >>
> > >> >> If we are adding a surrogate key for dag, is there any longer a reason to have both dag_id and dag display name?
> > >> >>
> > >> >> On Mon, Jun 9, 2025 at 7:20 AM Beck, Vincent <vincb...@amazon.com.invalid> wrote:
> > >> >>
> > >> >>> Hi everyone,
> > >> >>>
> > >> >>> As part of the multi-team AIP effort ([AIP-67][1]), I’m planning to begin work on updating the database schema to support multiple teams. Since this is a significant and potentially disruptive change, I wanted to first gather feedback on the proposed approach.
> > >> >>>
> > >> >>> ## Proposed plan
> > >> >>>
> > >> >>> 1. Introduce a UUID primary key on the `dag` table
> > >> >>>
> > >> >>> Replace the current `dag_id` primary key with a new `id` column containing a generated UUID. This is similar to the change proposed in #43161, but applied to the `dag` table.
> > >> >>>
> > >> >>> 2. Update all foreign keys referencing `dag.dag_id`
> > >> >>>
> > >> >>> Update foreign keys across all related tables to reference `dag.id` instead of `dag.dag_id`. Impacted tables include:
> > >> >>>
> > >> >>> - dag_schedule_asset_alias_reference
> > >> >>> - task_outlet_asset_reference
> > >> >>> - dag_schedule_asset_reference
> > >> >>> - asset_dag_run_queue
> > >> >>> - dag_version
> > >> >>> - dag_schedule_asset_uri_reference
> > >> >>> - dag_tag
> > >> >>> - dag_owner_attributes
> > >> >>> - dag_warning
> > >> >>> - dag_schedule_asset_name_reference
> > >> >>> - deadline
> > >> >>> - dag_code
> > >> >>> - serialized_dag
> > >> >>> - task_instance
> > >> >>> - dag_run
> > >> >>> - backfill
> > >> >>> - rendered_task_instance_fields
> > >> >>> - task_map
> > >> >>> - xcom
> > >> >>> - job
> > >> >>> - log
> > >> >>>
> > >> >>> 3. Add `team_id` column to tables
> > >> >>>
> > >> >>> * `dag`: Add `team_id` column and enforce a unique constraint on (`dag_id`, `team_id`).
> > >> >>> * `slot_pool`: Modify the unique constraint to be on (`pool`, `team_id`) instead of `pool` alone.
> > >> >>> * `connection`: Modify the unique constraint to be on (`conn_id`, `team_id`) instead of `conn_id` alone.
> > >> >>> * `variable`: Modify the unique constraint to be on (`key`, `team_id`) instead of `key` alone.
> > >> >>>
> > >> >>> I was also thinking adding the `team_id` column to the table `task_instance` for optimization/simplification purposes, to make queries simpler/more optimized. The scheduler makes a lot of queries on the task instance level and having the `team_id` in this table would simplify them. We can always decide when working on the implementation to add the column `team_id` to other tables if we find out this would simplify things.
> > >> >>>
> > >> >>> Note: Some have suggested allowing variables and connections to be shared across teams. Personally, I believe introducing the concept of shared/global resources would add unnecessary complexity and potentially confuse users. That said, this can be revisited later. If we decide to support global/shared resources, we can introduce new tables to support that model.
> > >> >>>
> > >> >>> ## Alternative Approach
> > >> >>>
> > >> >>> Instead of using UUIDs as primary keys, another option would be:
> > >> >>>
> > >> >>> * Change the primary key of `dag` to a composite key (`dag_id`, `team_id`)
> > >> >>> * Update all foreign keys accordingly
> > >> >>>
> > >> >>> I’m personally not in favor of this approach, for the following reasons:
> > >> >>>
> > >> >>> * It adds complexity to nearly all queries involving the `dag` table
> > >> >>> * It may negatively affect database performance (though I’m not a DB expert)
> > >> >>> * It requires specifying both `dag_id` and `team_id` to access a DAG
> > >> >>> * We previously went down this path with `task_instance`, and eventually moved to UUIDs to simplify things - this feels like a good opportunity to learn from that experience
> > >> >>>
> > >> >>> That said, I’m happy to discuss this further if others feel differently.
> > >> >>>
> > >> >>> You can find more context and details on this topic in the multi-team airflow project plan Google doc [2].
> > >> >>>
> > >> >>> Thanks,
> > >> >>>
> > >> >>> Vincent
> > >> >>>
> > >> >>> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> > >> >>> [2] https://docs.google.com/document/d/11rKo5D2QpT5NvMtDR1RZDjaih5jT5H-dt0aepkfmXSE/edit?tab=t.0#heading=h.4c16fc5qa1w8
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > >> For additional commands, e-mail: dev-h...@airflow.apache.org