Thanks Andrei. Coordinating the two efforts before either hardens is the
right call, and I want to do that.

To your direct question: I'd want Tag to stay a separate first-class REST
concept (TagDefinition with namespace/name, allowed values, inheritability,
CRUD lifecycle), not built on the labels track as a spec-level dependency.
The point of standardizing Tag is shared classification semantics across
systems, and that needs more than allowed_values. The proposal will be
explicit about normative interpretation (visibility, atomicity, attachment
value type) so a tag's meaning doesn't drift across catalogs.
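
To make that concrete, here is a minimal Python sketch of what a first-class
definition adds over flat k/v. Everything in it is an assumption for
illustration: the field names, the attach helper, and the target string
format are hypothetical and not part of the proposal or the IRC spec.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class TagDefinition:
    """Hypothetical shape: identity plus optional schema."""
    namespace: str
    name: str
    # None means any value is accepted; otherwise values are constrained.
    allowed_values: Optional[FrozenSet[str]] = None
    inheritable: bool = False

    def accepts(self, value: str) -> bool:
        return self.allowed_values is None or value in self.allowed_values


@dataclass(frozen=True)
class TagAttachment:
    tag: TagDefinition
    target: str  # hypothetical target identifier, e.g. a table or field-id
    value: str


def attach(tag: TagDefinition, target: str, value: str) -> TagAttachment:
    """Validate the value against the definition before attaching."""
    if not tag.accepts(value):
        raise ValueError(
            f"{value!r} is not allowed for tag {tag.namespace}.{tag.name}"
        )
    return TagAttachment(tag=tag, target=target, value=value)
```

The validation lives with the definition, so an attachment cannot carry a
value the definition rejects. Visibility and atomicity are not modeled here.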

Whether a catalog persists or indexes tag assignments using the same
machinery as labels (reserved-namespace pattern, dedicated reverse-lookup
index, or something else) reads as a catalog implementation choice to me.
I'd rather the spec leave that open than commit to one shape, and the REST
contract should expose tag assignments through tag-specific semantics, not
through reserved-namespace labels.

Governed/Standard maps to your catalog-managed vs client-writable axis.
I'll align terminology where the surfaces overlap.

One scope point I want to keep clean: this proposal is the classification
input side. Policy enforcement stays separate, on the read-restrictions
track (#13879).

A sync before I write the full design doc would help. Paired proposals with
a coordinated boundary is the shape I'd target. Slack works, or any morning
in your timezone. Send a few options.

-ej

On Tue, May 12, 2026 at 1:38 AM Andrei Tserakhau via dev <
[email protected]> wrote:

> Hi EJ,
>
> Thanks for starting this thread. I think there is overlap with my labels
> work in flight, and it would be good to converge the two efforts rather
> than end up with two parallel attachment mechanisms.
>
> Current labels work:
>
> - Proposal (#15521):
>   https://docs.google.com/document/d/1aj-6JlfBiMYEEVtNuh5WLMOrRQiMCcyYUGbouPM4hXI/edit
> - OpenAPI PR (#15750): generic Labels primitive (flat k/v on tables, views,
>   namespaces, and columns by field-id).
> - CRUD follow-up:
>   https://docs.google.com/document/d/1aj-6JlfBiMYEEVtNuh5WLMOrRQiMCcyYUGbouPM4hXI/edit?tab=t.emq0gkbmc7bx#heading=h.ijaa62gyvv30
>   (UpdateLabels, ETags, atomicity, catalog-managed vs client-writable keys,
>   and SQL DDL surface).
>
> My read is that labels cover most of the attachment side of your proposal:
> tables, columns, views, namespaces, and the use cases you listed. Where
> your proposal adds something new is the management model: first-class Tag
> definitions with namespace-scoped identity, allowed values, inheritance
> rules, and reverse lookup.
>
> That part is worth designing.
>
> The cleanest framing I see is:
>
> 1. Labels = attachment mechanism. Generic k/v on tables, columns, views,
> and namespaces.
> 2. Tags = governance model on top. Named Tag definitions with allowed
> values and management endpoints. One possible implementation is to store
> tag attachments as labels under a reserved namespace, with reverse lookup
> as a dedicated endpoint indexed over labels.
>
> Two things worth aligning early: your Governed/Standard split looks close
> to the same axis as catalog-managed vs client-writable in the CRUD
> follow-up, so we should probably reconcile the terminology. Allowed values
> on Tag definitions also seem like the structural answer to the interop
> concern Yufei raised on both threads.
>
> Would you be open to building the Tag entity work on top of the labels
> track? That could be a section in the existing docs, a paired proposal, or
> whatever shape works best.
>
> Happy to chat on Slack or set up a quick sync before you invest in the full
> design doc.
>
> Thanks,
> Andrei
>
> On Tue, May 12, 2026 at 1:27 AM Yufei Gu <[email protected]> wrote:
>
>> Hi EJ,
>>
>> Thanks for sharing this.
>>
>> Tagging is useful as a lightweight way to categorize objects. It can help
>> with common cases like basic classification, ownership or cost attribution,
>> and simple discovery or filtering. That said, even with this lightweight
>> framing, I’m still a bit concerned about how different catalog
>> implementations and engines will interpret tags and whether we can make
>> them truly interoperable. In practice, small differences in semantics or
>> expectations could lead to fragmentation across catalogs.
>>
>> I would also be cautious about layering in governance or policy related
>> semantics too early, as that may further increase the risk of inconsistent
>> interpretations.
>>
>> Yufei
>>
>>
>> On Mon, May 4, 2026 at 3:55 PM EJ Wang <[email protected]>
>> wrote:
>>
>>> Hi folks,
>>>
>>> I'm new to the Iceberg community, currently contributing to Polaris OSS
>>> on the tagging design. Before going deeper into a design doc, I want to
>>> surface the direction on this list and invite early input from people with
>>> more context on how IRC-level concepts get shaped here.
>>>
>>> Polaris users are asking for a classification primitive that covers
>>> compliance (PII, sensitivity, data domain), ownership and cost attribution,
>>> and AI or semantic hints on columns. My read is that we will build this
>>> regardless, but designing it inside Polaris alone reduces its value.
>>> Governance tools would need per-catalog adapters. If the shape is
>>> standardized at the IRC level, the ecosystem benefits far more broadly.
>>>
>>> Across catalogs and governance platforms, the tag concept has
>>> independently converged on a similar shape: a first-class Tag entity with
>>> identity (name + namespace), optional schema (allowed values,
>>> inheritability), and attachments to objects carrying a value. Snowflake
>>> tags, Unity Catalog governed tags, Google Cloud Dataplex tag templates,
>>> Apache Atlas classifications, Apache Gravitino tags, and DataHub tags all
>>> expose this pattern, across ownership, FinOps, AI reasoning, and governance
>>> use cases. When independent products converge, my read is that the shape is
>>> the natural decomposition rather than a vendor-specific artifact.
>>>
>>> Two adjacent efforts are already in flight. The read-restrictions
>>> proposal (apache/iceberg#13879
>>> <https://github.com/apache/iceberg/issues/13879>) delivers enforcement
>>> to engines. A Tag proposal would complement it as the classification input
>>> side, so catalogs can resolve tag-driven enforcement internally and deliver
>>> the outcome via read-restrictions. The labels proposal
>>> (apache/iceberg#15521 <https://github.com/apache/iceberg/issues/15521>)
>>> serves generic catalog-managed metadata. My read is that a first-class Tag
>>> with identity and lifecycle is distinct from labels; they solve different
>>> problems and can coexist.
>>>
>>> At a high level, I think the minimum valuable scope in the IRC spec is:
>>> a Tag entity with CRUD at the namespace level, tag attachments with target
>>> and value applied to tables, columns via field-id, views, and namespaces, a
>>> reverse lookup endpoint for "find objects with tag X", tag attachment
>>> retrieval via a dedicated endpoint, and a small set of normative clauses on
>>> privilege enforcement, visibility filtering, and rename atomicity.
>>> Resolved tags do not need to live in LoadTableResult.
>>>
>>> Things I'd like to keep out of the core spec as layered extensions, not
>>> first pass: typed multi-field per-attachment values (Atlas, Dataplex;
>>> addable non-breaking later), a Governed-vs-Standard type distinction (Unity
>>> Catalog's pattern can be expressed through configuration), and
>>> tag-to-policy binding (belongs in a separate Policy authoring phase).
>>>
>>> What I'm asking: early feedback on whether this direction fits the IRC
>>> roadmap, pointers to prior discussions I may have missed, and interest in
>>> co-championing from contributors outside Polaris. I'll follow up with a
>>> full design doc in the coming week. An issue placeholder is at
>>> apache/iceberg#16165 <https://github.com/apache/iceberg/issues/16165> for
>>> tracking.
>>>
>>> -ej
>>>
>>
