Hi EJ,

Thanks for starting this thread. I think this overlaps with my in-flight
labels work, and it would be good to converge the two efforts rather
than end up with two parallel attachment mechanisms.

Current labels work:

- Proposal (#15521):
  https://docs.google.com/document/d/1aj-6JlfBiMYEEVtNuh5WLMOrRQiMCcyYUGbouPM4hXI/edit
- OpenAPI PR (#15750): generic Labels primitive — flat k/v on tables, views,
  namespaces, and columns by field-id.
- CRUD follow-up:
  https://docs.google.com/document/d/1aj-6JlfBiMYEEVtNuh5WLMOrRQiMCcyYUGbouPM4hXI/edit?tab=t.emq0gkbmc7bx#heading=h.ijaa62gyvv30
  UpdateLabels, ETags, atomicity, catalog-managed vs client-writable keys,
  and SQL DDL surface.

My read is that labels cover most of the attachment side of your proposal:
tables, columns, views, namespaces, and the use cases you listed. Where
your proposal adds something new is the management model: first-class Tag
definitions with namespace-scoped identity, allowed values, inheritance
rules, and reverse lookup.

That part is worth designing.

The cleanest framing I see is:

1. Labels = attachment mechanism. Generic k/v on tables, columns, views,
and namespaces.
2. Tags = governance model on top. Named Tag definitions with allowed
values and management endpoints. One possible implementation is to store
tag attachments as labels under a reserved namespace, with reverse lookup
as a dedicated endpoint indexed over labels.
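
To make that layering concrete, here is a rough Python sketch of one way it
could work. Everything in it is an assumption for illustration: the names
TAG_PREFIX, TagDefinition, Catalog, attach_tag, and find_by_tag are
hypothetical and come from neither proposal nor the IRC spec, and a linear
scan stands in for whatever index a real catalog would keep.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional

TAG_PREFIX = "tags."  # hypothetical reserved label namespace


@dataclass(frozen=True)
class TagDefinition:
    """Governance layer: a named tag with optional allowed values."""
    namespace: str
    name: str
    allowed_values: Optional[FrozenSet[str]] = None  # None = any value

    @property
    def label_key(self) -> str:
        # The attachment is stored as an ordinary label under this key.
        return f"{TAG_PREFIX}{self.namespace}.{self.name}"


@dataclass
class Catalog:
    # Attachment layer: flat k/v labels per target (table, view, ...).
    labels: Dict[str, Dict[str, str]] = field(default_factory=dict)
    tags: Dict[str, TagDefinition] = field(default_factory=dict)

    def define_tag(self, tag: TagDefinition) -> None:
        self.tags[tag.label_key] = tag

    def attach_tag(self, target: str, tag: TagDefinition, value: str) -> None:
        if tag.label_key not in self.tags:
            raise KeyError(f"unknown tag: {tag.label_key}")
        if tag.allowed_values is not None and value not in tag.allowed_values:
            raise ValueError(f"{value!r} not allowed for {tag.label_key}")
        self.labels.setdefault(target, {})[tag.label_key] = value

    def find_by_tag(self, tag: TagDefinition,
                    value: Optional[str] = None) -> List[str]:
        # Reverse lookup: a scan here, an index in a real implementation.
        return sorted(
            target for target, kv in self.labels.items()
            if tag.label_key in kv
            and (value is None or kv[tag.label_key] == value)
        )
```

The point of the sketch is only that allowed-value validation and reverse
lookup can sit above a generic labels store without a second attachment
mechanism underneath.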

Two things worth aligning early: your Governed/Standard split looks close to
the same axis as catalog-managed vs client-writable in the CRUD follow-up,
so we should probably reconcile the terminology. Allowed values on Tag
definitions also seem like the structural answer to the interop concern
Yufei raised on both threads.

Would you be open to building the Tag entity work on top of the labels
track? That could be a section in the existing docs, a paired proposal, or
whatever shape works best.

Happy to chat on Slack or set up a quick sync before you invest in the full
design doc.

Thanks,
Andrei

On Tue, May 12, 2026 at 1:27 AM Yufei Gu <[email protected]> wrote:

> Hi EJ,
>
> Thanks for sharing this.
>
> Tagging is useful as a lightweight way to categorize objects. It can help
> with common cases like basic classification, ownership or cost attribution,
> and simple discovery or filtering. That said, even with this lightweight
> framing, I’m still a bit concerned about how different catalog
> implementations and engines will interpret tags and whether we can make
> them truly interoperable. In practice, small differences in semantics or
> expectations could lead to fragmentation across catalogs.
>
> I would also be cautious about layering in governance or policy related
> semantics too early, as that may further increase the risk of inconsistent
> interpretations.
>
> Yufei
>
>
> On Mon, May 4, 2026 at 3:55 PM EJ Wang <[email protected]>
> wrote:
>
>> Hi folks,
>>
>> I'm new to the Iceberg community, currently contributing to Polaris OSS
>> on the tagging design. Before going deeper into a design doc, I want to
>> surface the direction on this list and invite early input from people with
>> more context on how IRC-level concepts get shaped here.
>>
>> Polaris users are asking for a classification primitive that covers
>> compliance (PII, sensitivity, data domain), ownership and cost attribution,
>> and AI or semantic hints on columns. My read is that we will build this
>> regardless, but designing it inside Polaris alone reduces its value.
>> Governance tools would need per-catalog adapters. If the shape is
>> standardized at the IRC level, the ecosystem benefits far more broadly.
>>
>> Across catalogs and governance platforms, the tag concept has
>> independently converged on a similar shape: a first-class Tag entity with
>> identity (name + namespace), optional schema (allowed values,
>> inheritability), and attachments to objects carrying a value. Snowflake
>> tags, Unity Catalog governed tags, Google Cloud Dataplex tag templates,
>> Apache Atlas classifications, Apache Gravitino tags, and DataHub tags all
>> expose this pattern, across ownership, FinOps, AI reasoning, and governance
>> use cases. When independent products converge, my read is that the shape is
>> the natural decomposition rather than a vendor-specific artifact.
>>
>> Two adjacent efforts are already in flight. The read-restrictions
>> proposal (apache/iceberg#13879
>> <https://github.com/apache/iceberg/issues/13879>) delivers enforcement
>> to engines. A Tag proposal would complement it as the classification input
>> side, so catalogs can resolve tag-driven enforcement internally and deliver
>> the outcome via read-restrictions. The labels proposal
>> (apache/iceberg#15521 <https://github.com/apache/iceberg/issues/15521>)
>> serves generic catalog-managed metadata. My read is that a first-class Tag
>> with identity and lifecycle is distinct from labels; they solve different
>> problems and can coexist.
>>
>> At a high level, I think the minimum valuable scope in the IRC spec is: a
>> Tag entity with CRUD at the namespace level, tag attachments with target
>> and value applied to tables, columns via field-id, views, and namespaces, a
>> reverse lookup endpoint for "find objects with tag X", tag attachment
>> retrieval via a dedicated endpoint, and a small set of normative clauses on
>> privilege enforcement, visibility filtering, and rename atomicity.
>> Resolved tags do not need to live in LoadTableResult.
>>
>> Things I'd like to keep out of the core spec as layered extensions, not
>> first pass: typed multi-field per-attachment values (Atlas, Dataplex;
>> addable non-breaking later), a Governed-vs-Standard type distinction (Unity
>> Catalog's pattern can be expressed through configuration), and
>> tag-to-policy binding (belongs in a separate Policy authoring phase).
>>
>> What I'm asking: early feedback on whether this direction fits the IRC
>> roadmap, pointers to prior discussions I may have missed, and interest in
>> co-championing from contributors outside Polaris. I'll follow up with a
>> full design doc in the coming week. An issue placeholder is at
>> apache/iceberg#16165 <https://github.com/apache/iceberg/issues/16165> for
>> tracking.
>>
>> -ej
>>
>
