> The tl;dr of it is that Stripe prefixes all of their IDs with the type, so that it is easy to tell what type they are just by looking at the ID, without needing a DB query:
I agree, this is a very nice idea. We are starting to have many different entities/resources that use a UUID as PK, and having this prefix would help us identify which resource we are talking about.

On 2025/06/12 09:33:17 Jarek Potiuk wrote:
> > Are per team, what really is the benefit of this approach? If I’m understanding this idea right then each team would have different workers, different schedulers, different API servers? So the only thing that is actually shared is the DB? Is anything else shared?
>
> Actually the scheduler and API server are supposed to be shared - the idea is that only components that can potentially execute code coming from DAGs of the team are supposed to be "per-team". See the images in https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components. They need to be slightly updated with after-airflow-3 changes, but the "gray box" - which is shared, in "Proposed target architecture with multi-team setup" - encompasses both Task SDK and Webserver (which currently is api_server).
>
> > The tl;dr of it is that Stripe prefixes all of their IDs with the type, so that it is easy to tell what type they are just by looking at the ID, without needing a DB query:
>
> Nice idea. I think we should follow it.
>
> J.
>
> On Thu, Jun 12, 2025 at 11:20 AM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > Slightly off topic/slightly related, but I came across this recently:
> > https://dev.to/stripe/designing-apis-for-humans-object-ids-3o5a
> >
> > The tl;dr of it is that Stripe prefixes all of their IDs with the type, so that it is easy to tell what type they are just by looking at the ID, without needing a DB query:
> >
> > > The above snippet is trying to retrieve a PaymentIntent from a connected account, however without even looking at the code you can immediately spot the error: a Customer ID (cus_) is being used instead of an Account ID (acct_).
> > > Without prefixes this would be much harder to debug; if Stripe used UUIDs instead then we’d have to look up the ID (probably in the Stripe Dashboard) to find out what kind of object it is and if it’s even valid.
> >
> > What do we think about using something similar?
> >
> > > On 10 Jun 2025, at 14:44, Vincent Beck <vincb...@apache.org> wrote:
> > >
> > > Oh I think you meant as an alternative solution.
> > >
> > > I still do not like this solution because I feel like in the long run it will bite us. Yes, in the short term we would avoid a massive migration on most of the tables and we would not need to update a lot of requests, but in the long run this will cause issues like:
> > > - Performance. To get the list of DAGs from teamX, you would need to do something like `WHERE dag_id like "teamX__%s"`
> > > - Edge cases. Old DAGs with "__" in their name would cause issues, e.g. "my__dag" would be interpreted as DAG "dag" within team "my". This is just an example but I can feel we would need to handle many other edge cases
> > > - Just a natural feeling, with no real datapoint, that this is something that will cause us headaches later and end up more complicated and less maintainable than updating the DB schema.
> > >
> > > But this is only my personal opinion, maybe others think otherwise :)
> > >
> > > Vincent
> > >
> > >
> > > On 2025/06/10 13:32:49 Vincent Beck wrote:
> > >> For backward compatibility purposes I also think the default team is a good idea. On the API side, if a team is not provided, then the default team is assigned.
> > >>
> > >>> For newer versions, we should probably start adding `team_id__dag_id` in the dag_id column.
> > >>> Fallback to "default" if not specified.
> > >>>
> > >>> For API:
> > >>>
> > >>> Internally resolve dag_id = team_id + "__" + original_dag_id.
> > >>> For old DAGs, just treat dag_id = original_dag_id with team "default".
> > >>> Replace all dagbag / related operations to split and use the original_dag_id.
> > >>
> > >> Unless I misunderstood, you are proposing to update all `dag_id` columns to include the team_id in it? If so, I really don't think including the `team_id` in the `dag_id` is a good idea, and more importantly, it is not needed. `dag_id` would no longer be PK and a unique constraint will be created on the two columns (`dag_id`, `team_id`). Why do you want to have the `team_id` as part of the `dag_id`?
> > >>
> > >> On 2025/06/10 05:45:48 Amogh Desai wrote:
> > >>> Hi All,
> > >>>
> > >>> From the perspective of migrating the task instance table queries to use the `ti.id` in
> > >>>
> > >>>> One concern I have is that if team ID is introduced and naturally we want to have a dag_id uniqueness only enforced within a team (I assume this is a natural consequence?) then we have a very strong break in API? Because all Dag related API calls use dag_id as identifier. I would dislike to force to switch all user access to UUID as well as to force to pre-fix all calls with team_id. This would be rather a v3 of the API. Do we have a plan how we make the API non breaking? (Also such paths are used in UI but there I'd see it not too critical if team_id is added as prefix in a path)
> > >>>
> > >>> I concur with what Jens has to say here. It might be a very valid use case to have dag_id be unique per team. But that construct should be achievable with unique on the (dag_id, team_id).
> > >>>
> > >>> Just an idea I want to throw around:
> > >>> I guess to avoid major breakage, at least for the time being, we should introduce a concept of "default" team. A team that belongs at the deployment level or the "starting point" when AF is installed.
> > >>>
> > >>> For newer versions, we should probably start adding `team_id__dag_id` in the dag_id column.
> > >>> Fallback to "default" if not specified.
> > >>>
> > >>> For API:
> > >>>
> > >>> Internally resolve dag_id = team_id + "__" + original_dag_id.
> > >>> For old DAGs, just treat dag_id = original_dag_id with team "default".
> > >>> Replace all dagbag / related operations to split and use the original_dag_id.
> > >>>
> > >>> This will allow:
> > >>>
> > >>> - Old DAGs continue to work with their unprefixed dag_id.
> > >>> - New DAGs can safely use the same dag_ids but in different teams.
> > >>> - API stays stable: still /dags/{dag_id}.
> > >>>
> > >>> Thanks & Regards,
> > >>> Amogh Desai
> > >>>
> > >>>
> > >>> On Tue, Jun 10, 2025 at 3:52 AM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote:
> > >>>
> > >>>> re
> > >>>>>
> > >>>>> From the point of dag_id and Dag display name (same for tasks) I am rather requiring to keep them. Task ID and Dag ID is used in technical terms and the display names are for humans and allow special characters.
> > >>>>
> > >>>> I don't really understand the point of having a separate display name. I thought the reason we needed display name (instead of just allowing unicode in dag id) was something to do with the fact that dag id was a PK. If it's no longer PK, then that would be a non-issue. Yes, we'd need to figure out a path for users to migrate / deprecate. But it seems sorta pointless to have two fields when one would do.
> > >>>>
> > >>>>
> > >>>> On Mon, Jun 9, 2025 at 3:19 PM Daniel Standish <daniel.stand...@astronomer.io> wrote:
> > >>>>
> > >>>>> An idea re backcompat.
> > >>>>>
> > >>>>> Can there be a default team?
> > >>>>> Then, existing API routes can stay the same (though maybe deprecate). But then you add new ones that take team id. Or possibly add it as a parameter, and if omitted, you get the default.
> > >>>>>
> > >>>>> On Mon, Jun 9, 2025 at 11:53 AM Jens Scheffler <j_scheff...@gmx.de.invalid> wrote:
> > >>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> As we have not made the migration to AF3 in our environment I cannot speak about the performance impact of UUID in the TI table, but I assume that even so the complexity is still lower than having a compound primary key.
> > >>>>>>
> > >>>>>> So from DB perspective I see a very large DB migration coming as almost the whole DB needs to be re-written. Which is okay but needs to be taken with care as migration will take a long downtime for large instances.
> > >>>>>>
> > >>>>>> From the point of dag_id and Dag display name (same for tasks) I am rather requiring to keep them. Task ID and Dag ID is used in technical terms and the display names are for humans and allow special characters.
> > >>>>>>
> > >>>>>> One concern I have is that if team ID is introduced and naturally we want to have a dag_id uniqueness only enforced within a team (I assume this is a natural consequence?) then we have a very strong break in API? Because all Dag related API calls use dag_id as identifier. I would dislike to force to switch all user access to UUID as well as to force to pre-fix all calls with team_id. This would be rather a v3 of the API. Do we have a plan how we make the API non breaking?
> > >>>>>> (Also such paths are used in UI but there I'd see it not too critical if team_id is added as prefix in a path)
> > >>>>>>
> > >>>>>> Jens
> > >>>>>>
> > >>>>>> On 09.06.25 18:37, Jarek Potiuk wrote:
> > >>>>>>> I think it would be great to hear if there were any issues observed (with either migration or performance) after we migrated task instance in #43161, and learning from that we could decide whether to use UUID as well for the dag table.
> > >>>>>>> But that would be my preference to use UUID7 - similarly as we did in TI.
> > >>>>>>>
> > >>>>>>>> If we are adding a surrogate key for dag, is there any longer a reason to have both dag_id and dag display name?
> > >>>>>>>
> > >>>>>>> I think the main reason is that we would have to implement merging dag_id and display name (or rather replacing dag_id with display name) and that would also require adding UUID for the task table (and replacing task_display_name) for consistency.
> > >>>>>>>
> > >>>>>>> Also it means migration of existing dags to move "dag_display_name" and "task_display_name" to be dag_id, task_id. Also if we merge these two, it means that users will have to change their API calls to use different ids to query their dags after rename.
> > >>>>>>>
> > >>>>>>> The original proposal from Vincent is transparent for DAG authors and API as I understand it.
> > >>>>>>>
> > >>>>>>> I think personally, even if we would like to get rid of display_names (which I am not sure of), that should be a separate migration - precisely because of increased complexity of the migration process and impact on DAG authors / APIs.
> > >>>>>>> Not impossible, but simply adds a different group of people that should be involved in the migration and external systems that use Airflow APIs - which makes the migration less likely/more risky for a number of users.
> > >>>>>>>
> > >>>>>>> J.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Mon, Jun 9, 2025 at 6:08 PM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote:
> > >>>>>>>
> > >>>>>>>> re
> > >>>>>>>>
> > >>>>>>>>> * `dag`: Add `team_id` column and enforce a unique constraint on (`dag_id`, `team_id`).
> > >>>>>>>>
> > >>>>>>>> If we are adding a surrogate key for dag, is there any longer a reason to have both dag_id and dag display name?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Mon, Jun 9, 2025 at 7:20 AM Beck, Vincent <vincb...@amazon.com.invalid> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi everyone,
> > >>>>>>>>>
> > >>>>>>>>> As part of the multi-team AIP effort ([AIP-67][1]), I’m planning to begin work on updating the database schema to support multiple teams. Since this is a significant and potentially disruptive change, I wanted to first gather feedback on the proposed approach.
> > >>>>>>>>>
> > >>>>>>>>> ## Proposed plan
> > >>>>>>>>>
> > >>>>>>>>> 1. Introduce a UUID primary key on the `dag` table
> > >>>>>>>>>
> > >>>>>>>>> Replace the current `dag_id` primary key with a new `id` column containing a generated UUID. This is similar to the change proposed in #43161, but applied to the `dag` table.
> > >>>>>>>>>
> > >>>>>>>>> 1. Update all foreign keys referencing `dag.dag_id`
> > >>>>>>>>>
> > >>>>>>>>> Update foreign keys across all related tables to reference `dag.id` instead of `dag.dag_id`.
> > >>>>>>>>> Impacted tables include:
> > >>>>>>>>>
> > >>>>>>>>> - dag_schedule_asset_alias_reference
> > >>>>>>>>> - task_outlet_asset_reference
> > >>>>>>>>> - dag_schedule_asset_reference
> > >>>>>>>>> - asset_dag_run_queue
> > >>>>>>>>> - dag_version
> > >>>>>>>>> - dag_schedule_asset_uri_reference
> > >>>>>>>>> - dag_tag
> > >>>>>>>>> - dag_owner_attributes
> > >>>>>>>>> - dag_warning
> > >>>>>>>>> - dag_schedule_asset_name_reference
> > >>>>>>>>> - deadline
> > >>>>>>>>> - dag_code
> > >>>>>>>>> - serialized_dag
> > >>>>>>>>> - task_instance
> > >>>>>>>>> - dag_run
> > >>>>>>>>> - backfill
> > >>>>>>>>> - rendered_task_instance_fields
> > >>>>>>>>> - task_map
> > >>>>>>>>> - xcom
> > >>>>>>>>> - job
> > >>>>>>>>> - log
> > >>>>>>>>>
> > >>>>>>>>> 1. Add `team_id` column to tables
> > >>>>>>>>>
> > >>>>>>>>> * `dag`: Add `team_id` column and enforce a unique constraint on (`dag_id`, `team_id`).
> > >>>>>>>>> * `slot_pool`: Modify the unique constraint to be on (`pool`, `team_id`) instead of `pool` alone.
> > >>>>>>>>> * `connection`: Modify the unique constraint to be on (`conn_id`, `team_id`) instead of `conn_id` alone.
> > >>>>>>>>> * `variable`: Modify the unique constraint to be on (`key`, `team_id`) instead of `key` alone.
> > >>>>>>>>>
> > >>>>>>>>> I was also thinking about adding the `team_id` column to the `task_instance` table, to make queries simpler and more efficient. The scheduler makes a lot of queries on the task instance level and having the `team_id` in this table would simplify them. We can always decide when working on the implementation to add the column `team_id` to other tables if we find out this would simplify things.
> > >>>>>>>>>
> > >>>>>>>>> Note: Some have suggested allowing variables and connections to be shared across teams. Personally, I believe introducing the concept of shared/global resources would add unnecessary complexity and potentially confuse users. That said, this can be revisited later. If we decide to support global/shared resources, we can introduce new tables to support that model.
> > >>>>>>>>>
> > >>>>>>>>> ## Alternative Approach
> > >>>>>>>>>
> > >>>>>>>>> Instead of using UUIDs as primary keys, another option would be:
> > >>>>>>>>>
> > >>>>>>>>> * Change the primary key of `dag` to a composite key (`dag_id`, `team_id`)
> > >>>>>>>>> * Update all foreign keys accordingly
> > >>>>>>>>>
> > >>>>>>>>> I’m personally not in favor of this approach, for the following reasons:
> > >>>>>>>>>
> > >>>>>>>>> * It adds complexity to nearly all queries involving the `dag` table
> > >>>>>>>>> * It may negatively affect database performance (though I’m not a DB expert)
> > >>>>>>>>> * It requires specifying both `dag_id` and `team_id` to access a DAG
> > >>>>>>>>> * We previously went down this path with `task_instance`, and eventually moved to UUIDs to simplify things - this feels like a good opportunity to learn from that experience
> > >>>>>>>>>
> > >>>>>>>>> That said, I’m happy to discuss this further if others feel differently.
> > >>>>>>>>>
> > >>>>>>>>> You can find more context and details on this topic in the multi-team airflow project plan Google doc [2].
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>>
> > >>>>>>>>> Vincent
> > >>>>>>>>>
> > >>>>>>>>> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> > >>>>>>>>> [2] https://docs.google.com/document/d/11rKo5D2QpT5NvMtDR1RZDjaih5jT5H-dt0aepkfmXSE/edit?tab=t.0#heading=h.4c16fc5qa1w8
> > >>>>>>
> > >>>>>> ---------------------------------------------------------------------
> > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > >>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
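
For illustration only, here is a minimal sketch of how the two ideas in this thread could fit together: a type-prefixed surrogate key as the primary key of `dag`, with uniqueness enforced on (`dag_id`, `team_id`) and a fallback to a "default" team. Everything concrete in it is an assumption rather than something decided in the thread: SQLite stands in for the metadata database, `uuid4` stands in for UUID7, and the `dag_` prefix, the three-column table, and the helpers `new_id`, `split_id`, `add_dag` and `dags_for_team` are hypothetical names.

```python
import sqlite3
import uuid

DEFAULT_TEAM = "default"  # assumed name for the fallback team


def new_id(prefix: str) -> str:
    """Build a type-prefixed ID, e.g. 'dag_<32 hex chars>' (uuid4 stands in for UUID7)."""
    return f"{prefix}_{uuid.uuid4().hex}"


def split_id(prefixed_id: str) -> tuple[str, uuid.UUID]:
    """Recover the type prefix and the UUID from the ID alone, without a DB query."""
    prefix, _, hex_part = prefixed_id.rpartition("_")
    return prefix, uuid.UUID(hex=hex_part)


# Simplified, hypothetical version of the `dag` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dag (
        id      TEXT PRIMARY KEY,   -- surrogate key instead of dag_id
        dag_id  TEXT NOT NULL,      -- unchanged, author-chosen identifier
        team_id TEXT NOT NULL,
        UNIQUE (dag_id, team_id)    -- dag_id only has to be unique within a team
    )
    """
)


def add_dag(dag_id: str, team_id: str = DEFAULT_TEAM) -> str:
    """Insert a DAG row; the default team is used when no team is given."""
    row_id = new_id("dag")
    conn.execute(
        "INSERT INTO dag (id, dag_id, team_id) VALUES (?, ?, ?)",
        (row_id, dag_id, team_id),
    )
    return row_id


def dags_for_team(team_id: str) -> list[str]:
    """List the dag_ids of one team with an equality filter rather than LIKE."""
    rows = conn.execute(
        "SELECT dag_id FROM dag WHERE team_id = ? ORDER BY dag_id", (team_id,)
    )
    return [row[0] for row in rows]


id_a = add_dag("etl", "team_a")
id_b = add_dag("etl", "team_b")  # same dag_id in another team is allowed
legacy = add_dag("my__dag")      # an old DAG containing "__" stays unambiguous
```

In this sketch, inserting ("etl", "team_a") a second time raises `sqlite3.IntegrityError`, and because `team_id` lives in its own column, a legacy `dag_id` such as "my__dag" never has to be split on "__".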