Slightly off topic/slightly related, but I came across this recently https://dev.to/stripe/designing-apis-for-humans-object-ids-3o5a
The tl;dr is that Stripe prefixes all of its IDs with the object type, so you can tell what kind of object an ID refers to just by looking at it, without needing a DB query:

> The above snippet is trying to retrieve a PaymentIntent from a connected
> account, however without even looking at the code you can immediately spot
> the error: a Customer ID (cus_) is being used instead of an Account ID
> (acct_). Without prefixes this would be much harder to debug; if Stripe used
> UUIDs instead then we’d have to look up the ID (probably in the Stripe
> Dashboard) to find out what kind of object it is and if it’s even valid.

What do we think about using something similar?

> On 10 Jun 2025, at 14:44, Vincent Beck <vincb...@apache.org> wrote:
>
> Oh I think you meant as an alternative solution.
>
> I still do not like this solution because I feel like on the long run it will
> bite us. Yes on the short term we would avoid massive migration on most of
> the tables and we would not need to update a lot of requests but on the long
> run this will cause issues like:
> - Performances. To get the list of DAGs from teamX, you would need to do
> something like `WHERE dag_id like "teamX__%s"`
> - Edge cases. Old DAGs with "__" in their name would cause issues. e.g.
> "my__dag" would be interpreted as DAG "dag" within team "my". This is just an
> example but I can feel we would need to handle many different other edge cases
> - Just a natural feeling with no real datapoint that this is something that
> will cause us headaches later and ends up more complicated and less
> maintainable than updating the DB schema.
>
> But this is only my personal opinion, maybe others think otherwise :)
>
> Vincent
>
>
> On 2025/06/10 13:32:49 Vincent Beck wrote:
>> For backward compatibility purposes I also think the default team is a good
>> idea. On the API side, if a team is not provided, then the default team is
>> assigned.
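To make the "__" edge case mentioned above concrete, here is a minimal sketch. The helper `split_team_dag_id` and the constants are hypothetical (nothing like this exists in Airflow today); it assumes the combined ID is split on the first "__" and that IDs without a separator fall back to a "default" team.

```python
# Hypothetical helper, not Airflow code: split a combined "team__dag" ID.
SEPARATOR = "__"
DEFAULT_TEAM = "default"


def split_team_dag_id(combined: str) -> tuple[str, str]:
    """Return (team_id, dag_id), falling back to the default team."""
    team_id, sep, dag_id = combined.partition(SEPARATOR)
    if not sep:
        # No separator found: treat it as an old-style DAG in the default team.
        return DEFAULT_TEAM, combined
    return team_id, dag_id


# A new-style ID resolves as intended.
print(split_team_dag_id("teamX__etl"))  # ('teamX', 'etl')
# A plain legacy ID falls back to the default team.
print(split_team_dag_id("etl"))         # ('default', 'etl')
# A legacy DAG that already had "__" in its name is misread as team "my".
print(split_team_dag_id("my__dag"))     # ('my', 'dag')
```

Any splitting rule of this kind has to guess for the last case, since the string alone does not say whether "my" is a team or part of the original name.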
>>
>>> For newer versions, we should probably start adding `team_id__dag_id` in
>>> the dag_id column.
>>> Fallback to "default" if not specified.
>>>
>>> For API:
>>>
>>> Internally resolve dag_id = team_id + "__" + original_dag_id.
>>> For old DAGs, just treat dag_id = original_dag_id with team "default".
>>> Replace all dagbag / related operations to split and use the
>>> original_dag_id.
>>
>> Unless I misunderstood you are proposing to update all `dag_id` columns to
>> include the team_id in it? If so, I really dont think including the
>> `team_id` in the `dag_id` is a good idea, and more importantly, it is not
>> needed. `dag_id` would no longer be PK and a unique constraint will be
>> created on the two columns (`dag_id`, `team_id`). Why do you want to have
>> the `team_id` as part of the `dag_id`?
>>
>> On 2025/06/10 05:45:48 Amogh Desai wrote:
>>> Hi All,
>>>
>>> From the perspective of migrating the task instance table queries to use
>>> the `ti.id` in
>>>
>>>> One concern I have is that if team ID is introduced and naturally we
>>>> want to have a dag_id uniqueness only enforced within a team (I assume
>>>> this is a natural consequence?) then we have a very strong break in API?
>>>> Because all Dag related API calls use dag_id as identifier. I would
>>>> dis-like to force to switch all user access to UUID as well as to force
>>>> to pre-fix all calls with team_id. This would be rather a v3 of the API.
>>>> Do we have a plan how we make the API non breaking? (Also such path's
>>>> are used in UI but there I'd see it not too critical if team_id is added
>>>> as prefix in a path)
>>>
>>> I concur with what Jens has to say here. It might be a very valid use case
>>> to have dag_id be unique per team. But that construct should be achievable
>>> with unique on the (dag_id, team_id).
>>>
>>> Just an idea I want to throw around:
>>> I guess to avoid major breakage, at least for the time being, we should
>>> introduce a concept of "default" team.
>>> A team that belongs at the deployment level or the "starting point" when AF
>>> is installed.
>>>
>>> For newer versions, we should probably start adding `team_id__dag_id` in
>>> the dag_id column.
>>> Fallback to "default" if not specified.
>>>
>>> For API:
>>>
>>> Internally resolve dag_id = team_id + "__" + original_dag_id.
>>> For old DAGs, just treat dag_id = original_dag_id with team "default".
>>> Replace all dagbag / related operations to split and use the
>>> original_dag_id.
>>>
>>> This will allow:
>>>
>>> - Old DAGs continue to work with their unprefixed dag_id.
>>> - New DAGs can safely use the same dag_ids but in different teams.
>>> - API stays stable: still /dags/{dag_id}.
>>>
>>> Thanks & Regards,
>>> Amogh Desai
>>>
>>>
>>> On Tue, Jun 10, 2025 at 3:52 AM Daniel Standish
>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>
>>>> re
>>>>>
>>>>> From the point of dag_id and Dag display name (same for tasks) I am
>>>>> rather requiring to keep them. Task ID and Dag ID is used in technical
>>>>> terms and the display names are for humans and allow special characters.
>>>>
>>>>
>>>> I don't really understand what the point of having a separate display
>>>> name. I thought the reason we needed display name (instead of just
>>>> allowing unicode in dag id) was something to do with the fact that dag id
>>>> was a PK. If it's no longer PK, then that would be non-issue. Yes we'd
>>>> need to figure out a path for users to migrate / deprecate. But it seems
>>>> sorta pointless to have two fields when one would do.
>>>>
>>>>
>>>> On Mon, Jun 9, 2025 at 3:19 PM Daniel Standish
>>>> <daniel.stand...@astronomer.io> wrote:
>>>>
>>>>> An idea re backcompat.
>>>>>
>>>>> Can there be a default team? Then, existing API routes can stay the same,
>>>>> (though maybe deprecate). But then you add new ones that take team id. Or
>>>>> possibly add as a parameter, and if omitted, you get the default.
>>>>>
>>>>> On Mon, Jun 9, 2025 at 11:53 AM Jens Scheffler
>>>>> <j_scheff...@gmx.de.invalid> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> As we have not made the migration to AF3 in our environment I can not
>>>>>> speak about performance impact of UUID in TI Table, but I assume even if
>>>>>> then the complexity is still lower than having a compound primary key.
>>>>>>
>>>>>> So from DB perspective I see a very large DB migration coming as almost
>>>>>> the whole DB needs to be re-written. Which is okay but need to be taken
>>>>>> with care as migration will take a long downtime for large instances.
>>>>>>
>>>>>> From the point of dag_id and Dag display name (same for tasks) I am
>>>>>> rather requiring to keep them. Task ID and Dag ID is used in technical
>>>>>> terms and the display names are for humans and allow special characters.
>>>>>>
>>>>>> One concern I have is that if team ID is introduced and naturally we
>>>>>> want to have a dag_id uniqueness only enforced within a team (I assume
>>>>>> this is a natural consequence?) then we have a very strong break in API?
>>>>>> Because all Dag related API calls use dag_id as identifier. I would
>>>>>> dis-like to force to switch all user access to UUID as well as to force
>>>>>> to pre-fix all calls with team_id. This would be rather a v3 of the API.
>>>>>> Do we have a plan how we make the API non breaking? (Also such path's
>>>>>> are used in UI but there I'd see it not too critical if team_id is added
>>>>>> as prefix in a path)
>>>>>>
>>>>>> Jens
>>>>>>
>>>>>> On 09.06.25 18:37, Jarek Potiuk wrote:
>>>>>>> I think it would be great to hear if there were any issues observed
>>>>>>> (with either migration or performance) after we migrated task instance
>>>>>>> in #43161 and learning from that we could decide whether to use UUID as
>>>>>>> well for the dag table.
>>>>>>> But that would be my preference to use UUID7 - similarly as we did in
>>>>>>> TI.
>>>>>>>
>>>>>>>> If we are adding a surrogate key for dag, is there any longer a reason
>>>>>>>> to have both dag_id and dag display name?
>>>>>>>
>>>>>>> I think the main reason is that we would have to implement merging
>>>>>>> dag_id and display name (or rather replacing dag_id with display name)
>>>>>>> and that would also require adding UUID for the task table (and
>>>>>>> replacing task_display_name) for consistency.
>>>>>>>
>>>>>>> Also it means migration of existing dags to move "dag_display_name" and
>>>>>>> "task_display_name" to be dag_id, task_id. Also if we merge these two,
>>>>>>> it means that users will have to change their API calls to use
>>>>>>> different ids to query their dags after rename.
>>>>>>>
>>>>>>> The original proposal from Vincent is transparent for DAG authors and
>>>>>>> API as I understand it.
>>>>>>>
>>>>>>> I think personally, even if we would like to get rid of display_names
>>>>>>> (which I am not sure of), that should be a separate migration -
>>>>>>> precisely because of increased complexity of the migration process and
>>>>>>> impact on DAG authors / APIs. Not impossible, but simply adds a
>>>>>>> different group of people that should be involved in the migration and
>>>>>>> external systems that use Airflow APIs - which makes the migration less
>>>>>>> likely/more risky for a number of users.
>>>>>>>
>>>>>>> J.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jun 9, 2025 at 6:08 PM Daniel Standish
>>>>>>> <daniel.stand...@astronomer.io.invalid> wrote:
>>>>>>>
>>>>>>>> re
>>>>>>>>
>>>>>>>>> * `dag`: Add `team_id` column and enforce a unique constraint on
>>>>>>>>> (`dag_id`, `team_id`).
>>>>>>>>
>>>>>>>> If we are adding a surrogate key for dag, is there any longer a reason
>>>>>>>> to have both dag_id and dag display name?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 9, 2025 at 7:20 AM Beck, Vincent
>>>>>>>> <vincb...@amazon.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> As part of the multi-team AIP effort ([AIP-67][1]), I’m planning to
>>>>>>>>> begin work on updating the database schema to support multiple teams.
>>>>>>>>> Since this is a significant and potentially disruptive change, I
>>>>>>>>> wanted to first gather feedback on the proposed approach.
>>>>>>>>>
>>>>>>>>> ## Proposed plan
>>>>>>>>>
>>>>>>>>> 1. Introduce a UUID primary key on the `dag` table
>>>>>>>>>
>>>>>>>>> Replace the current `dag_id` primary key with a new `id` column
>>>>>>>>> containing a generated UUID. This is similar to the change proposed
>>>>>>>>> in #43161, but applied to the `dag` table.
>>>>>>>>>
>>>>>>>>> 2. Update all foreign keys referencing `dag.dag_id`
>>>>>>>>>
>>>>>>>>> Update foreign keys across all related tables to reference `dag.id`
>>>>>>>>> instead of `dag.dag_id`. Impacted tables include:
>>>>>>>>>
>>>>>>>>> - dag_schedule_asset_alias_reference
>>>>>>>>> - task_outlet_asset_reference
>>>>>>>>> - dag_schedule_asset_reference
>>>>>>>>> - asset_dag_run_queue
>>>>>>>>> - dag_version
>>>>>>>>> - dag_schedule_asset_uri_reference
>>>>>>>>> - dag_tag
>>>>>>>>> - dag_owner_attributes
>>>>>>>>> - dag_warning
>>>>>>>>> - dag_schedule_asset_name_reference
>>>>>>>>> - deadline
>>>>>>>>> - dag_code
>>>>>>>>> - serialized_dag
>>>>>>>>> - task_instance
>>>>>>>>> - dag_run
>>>>>>>>> - backfill
>>>>>>>>> - rendered_task_instance_fields
>>>>>>>>> - task_map
>>>>>>>>> - xcom
>>>>>>>>> - job
>>>>>>>>> - log
>>>>>>>>>
>>>>>>>>> 3. Add `team_id` column to tables
>>>>>>>>>
>>>>>>>>> * `dag`: Add `team_id` column and enforce a unique constraint on
>>>>>>>>> (`dag_id`, `team_id`).
>>>>>>>>> * `slot_pool`: Modify the unique constraint to be on (`pool`,
>>>>>>>>> `team_id`) instead of `pool` alone.
>>>>>>>>> * `connection`: Modify the unique constraint to be on (`conn_id`,
>>>>>>>>> `team_id`) instead of `conn_id` alone.
>>>>>>>>> * `variable`: Modify the unique constraint to be on (`key`, `team_id`)
>>>>>>>>> instead of `key` alone.
>>>>>>>>>
>>>>>>>>> I was also thinking adding the `team_id` column to the table
>>>>>>>>> `task_instance` for optimization/simplification purposes, to make
>>>>>>>>> queries simpler/more optimized. The scheduler makes a lot of queries
>>>>>>>>> on the task instance level and having the `team_id` in this table
>>>>>>>>> would simplify them. We can always decide when working on the
>>>>>>>>> implementation to add the column `team_id` to other tables if we find
>>>>>>>>> out this would simplify things.
>>>>>>>>>
>>>>>>>>> Note: Some have suggested allowing variables and connections to be
>>>>>>>>> shared across teams. Personally, I believe introducing the concept of
>>>>>>>>> shared/global resources would add unnecessary complexity and
>>>>>>>>> potentially confuse users. That said, this can be revisited later. If
>>>>>>>>> we decide to support global/shared resources, we can introduce new
>>>>>>>>> tables to support that model.
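The proposed `dag` table shape (a surrogate `id` key plus a unique constraint on (`dag_id`, `team_id`)) can be tried out with a small sketch. This is only an illustration under assumptions, not the actual Airflow model or migration: it uses an in-memory SQLite database, stores the UUID as text, uses `uuid4` rather than the UUID7 suggested in the thread, and the `add_dag` helper and the 'default' team value are made up for the example.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dag (
        id TEXT PRIMARY KEY,                      -- surrogate key (UUID as text)
        dag_id TEXT NOT NULL,
        team_id TEXT NOT NULL DEFAULT 'default',  -- assumed default team name
        UNIQUE (dag_id, team_id)
    )
    """
)


def add_dag(dag_id: str, team_id: str = "default") -> str:
    """Insert a row and return its generated surrogate key (hypothetical helper)."""
    new_id = str(uuid.uuid4())  # uuid4 only for the sketch
    conn.execute(
        "INSERT INTO dag (id, dag_id, team_id) VALUES (?, ?, ?)",
        (new_id, dag_id, team_id),
    )
    return new_id


add_dag("etl", "team_a")
add_dag("etl", "team_b")  # same dag_id in a different team: allowed

try:
    add_dag("etl", "team_a")  # same dag_id within the same team: rejected
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM dag").fetchone()[0]
print(duplicate_rejected, row_count)
```

Other tables would then reference `dag.id` alone, so they need neither `dag_id` nor `team_id` to point at a specific DAG.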
>>>>>>>>>
>>>>>>>>> ## Alternative Approach
>>>>>>>>>
>>>>>>>>> Instead of using UUIDs as primary keys, another option would be:
>>>>>>>>>
>>>>>>>>> * Change the primary key of `dag` to a composite key (`dag_id`,
>>>>>>>>> `team_id`)
>>>>>>>>> * Update all foreign keys accordingly
>>>>>>>>>
>>>>>>>>> I’m personally not in favor of this approach, for the following
>>>>>>>>> reasons:
>>>>>>>>>
>>>>>>>>> * It adds complexity to nearly all queries involving the `dag` table
>>>>>>>>> * It may negatively affect database performance (though I’m not a DB
>>>>>>>>> expert)
>>>>>>>>> * It requires specifying both `dag_id` and `team_id` to access a DAG
>>>>>>>>> * We previously went down this path with `task_instance`, and
>>>>>>>>> eventually moved to UUIDs to simplify things - this feels like a good
>>>>>>>>> opportunity to learn from that experience
>>>>>>>>>
>>>>>>>>> That said, I’m happy to discuss this further if others feel
>>>>>>>>> differently.
>>>>>>>>>
>>>>>>>>> You can find more context and details on this topic in the multi-team
>>>>>>>>> airflow project plan Google doc [2].
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Vincent
>>>>>>>>>
>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
>>>>>>>>> [2] https://docs.google.com/document/d/11rKo5D2QpT5NvMtDR1RZDjaih5jT5H-dt0aepkfmXSE/edit?tab=t.0#heading=h.4c16fc5qa1w8
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org