Thanks Andrei. Coordinating the two efforts before either hardens is the right call, and I want to do that.
To your direct question: I'd want Tag to stay a separate first-class REST concept (TagDefinition with namespace/name, allowed values, inheritability, CRUD lifecycle), not built on the labels track as a spec-level dependency. The point of standardizing Tag is shared classification semantics across systems, and that needs more than allowed_values: the proposal will be explicit about normative interpretation (visibility, atomicity, attachment value type) so a tag's meaning doesn't drift across catalogs.

Whether a catalog persists or indexes tag assignments using the same machinery as labels (reserved-namespace pattern, dedicated reverse-lookup index, or something else) reads as a catalog implementation choice to me. I'd rather the spec leave that open than commit to one shape, and the REST contract should expose tag assignments through tag-specific semantics, not through reserved-namespace labels.

Governed/Standard maps to your catalog-managed vs client-writable axis. I'll align terminology where the surfaces overlap.

One scope point I want to keep clean: this proposal is the classification input side. Policy enforcement stays separate, on the read-restrictions track (#13879).

A sync before I write the full design doc would help. Paired proposals with a coordinated boundary is the shape I'd target. Slack works, or any morning in your timezone; send a few options.

-ej

On Tue, May 12, 2026 at 1:38 AM Andrei Tserakhau via dev <[email protected]> wrote:

> Hi EJ,
>
> Thanks for starting this thread. I think there is overlap with my labels
> work in flight, and it would be good to converge the two efforts rather
> than end up with two parallel attachment mechanisms.
>
> Current labels work:
>
> - Proposal (#15521):
>   https://docs.google.com/document/d/1aj-6JlfBiMYEEVtNuh5WLMOrRQiMCcyYUGbouPM4hXI/edit
> - OpenAPI PR (#15750): generic Labels primitive (flat k/v on tables, views,
>   namespaces, and columns by field-id).
> - CRUD follow-up:
>   https://docs.google.com/document/d/1aj-6JlfBiMYEEVtNuh5WLMOrRQiMCcyYUGbouPM4hXI/edit?tab=t.emq0gkbmc7bx#heading=h.ijaa62gyvv30
>   -- UpdateLabels, ETags, atomicity, catalog-managed vs client-writable keys
>   and SQL DDL surface.
>
> My read is that labels cover most of the attachment side of your proposal:
> tables, columns, views, namespaces, and the use cases you listed. Where
> your proposal adds something new is the management model: first-class Tag
> definitions with namespace-scoped identity, allowed values, inheritance
> rules, and reverse lookup.
>
> That part is worth designing.
>
> The cleanest framing I see is:
>
> 1. Labels = attachment mechanism. Generic k/v on tables, columns, views,
>    and namespaces.
> 2. Tags = governance model on top. Named Tag definitions with allowed
>    values and management endpoints. One possible implementation is to store
>    tag attachments as labels under a reserved namespace, with reverse lookup
>    as a dedicated endpoint indexed over labels.
>
> Two things worth aligning early: your Governed/Standard split looks close to
> the same axis as catalog-managed vs client-writable in the CRUD follow-up,
> so we should probably reconcile the terminology. Allowed values on Tag
> definitions also seem like the structural answer to the interop concern
> Yufei raised on both threads.
>
> Would you be open to building the Tag entity work on top of the labels track?
> That could be a section in the existing docs, a paired proposal, or whatever
> shape works best.
>
> Happy to chat on Slack or set up a quick sync before you invest in the full
> design doc.
>
> Thanks,
> Andrei
>
> On Tue, May 12, 2026 at 1:27 AM Yufei Gu <[email protected]> wrote:
>
>> Hi EJ,
>>
>> Thanks for sharing this.
>>
>> Tagging is useful as a lightweight way to categorize objects. It can help
>> with common cases like basic classification, ownership or cost attribution,
>> and simple discovery or filtering.
>> That said, even with this lightweight framing, I’m still a bit concerned
>> about how different catalog implementations and engines will interpret tags
>> and whether we can make them truly interoperable. In practice, small
>> differences in semantics or expectations could lead to fragmentation across
>> catalogs.
>>
>> I would also be cautious about layering in governance or policy related
>> semantics too early, as that may further increase the risk of inconsistent
>> interpretations.
>>
>> Yufei
>>
>>
>> On Mon, May 4, 2026 at 3:55 PM EJ Wang <[email protected]>
>> wrote:
>>
>>> Hi folks,
>>>
>>> I'm new to the Iceberg community, currently contributing to Polaris OSS
>>> on the tagging design. Before going deeper into a design doc, I want to
>>> surface the direction on this list and invite early input from people with
>>> more context on how IRC-level concepts get shaped here.
>>>
>>> Polaris users are asking for a classification primitive that covers
>>> compliance (PII, sensitivity, data domain), ownership and cost attribution,
>>> and AI or semantic hints on columns. My read is that we will build this
>>> regardless, but designing it inside Polaris alone reduces its value.
>>> Governance tools would need per-catalog adapters. If the shape is
>>> standardized at the IRC level, the ecosystem benefits far more broadly.
>>>
>>> Across catalogs and governance platforms, the tag concept has
>>> independently converged on a similar shape: a first-class Tag entity with
>>> identity (name + namespace), optional schema (allowed values,
>>> inheritability), and attachments to objects carrying a value. Snowflake
>>> tags, Unity Catalog governed tags, Google Cloud Dataplex tag templates,
>>> Apache Atlas classifications, Apache Gravitino tags, and DataHub tags all
>>> expose this pattern, across ownership, FinOps, AI reasoning, and governance
>>> use cases.
>>> When independent products converge, my read is that the shape is
>>> the natural decomposition rather than a vendor-specific artifact.
>>>
>>> Two adjacent efforts are already in flight. The read-restrictions
>>> proposal (apache/iceberg#13879
>>> <https://github.com/apache/iceberg/issues/13879>) delivers enforcement
>>> to engines. A Tag proposal would complement it as the classification input
>>> side, so catalogs can resolve tag-driven enforcement internally and deliver
>>> the outcome via read-restrictions. The labels proposal
>>> (apache/iceberg#15521 <https://github.com/apache/iceberg/issues/15521>)
>>> serves generic catalog-managed metadata. My read is that a first-class Tag
>>> with identity and lifecycle is distinct from labels; they solve different
>>> problems and can coexist.
>>>
>>> At a high level, I think the minimum valuable scope in the IRC spec is:
>>> a Tag entity with CRUD at the namespace level, tag attachments with target
>>> and value applied to tables, columns via field-id, views, and namespaces, a
>>> reverse lookup endpoint for "find objects with tag X", tag attachment
>>> retrieval via a dedicated endpoint, and a small set of normative clauses on
>>> privilege enforcement, visibility filtering, and rename atomicity.
>>> Resolved tags do not need to live in LoadTableResult.
>>>
>>> Things I'd like to keep out of the core spec as layered extensions, not
>>> first pass: typed multi-field per-attachment values (Atlas, Dataplex;
>>> addable non-breaking later), a Governed-vs-Standard type distinction (Unity
>>> Catalog's pattern can be expressed through configuration), and
>>> tag-to-policy binding (belongs in a separate Policy authoring phase).
>>>
>>> What I'm asking: early feedback on whether this direction fits the IRC
>>> roadmap, pointers to prior discussions I may have missed, and interest in
>>> co-championing from contributors outside Polaris. I'll follow up with a
>>> full design doc in the coming week.
>>> An issue placeholder is at
>>> apache/iceberg#16165 <https://github.com/apache/iceberg/issues/16165> for
>>> tracking.
>>>
>>> -ej
>>>
>>
