I'd rather not complicate the schema definitions in the table metadata. You
could append the `schema-id` to the table property key to manage descriptions
for different schema versions.
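
A minimal sketch of that idea, assuming a hypothetical key format
`comment.schema-id.<id>` (not defined by the Iceberg spec) and a plain dict
standing in for the table metadata `properties` map. The fallback to the
unversioned "comment" key follows the convention mentioned earlier in this
thread; the helper names are illustrative only.

```python
from typing import Dict, Optional

# Hypothetical key prefix; the spec does not define a versioned comment key.
COMMENT_KEY = "comment"


def comment_key(schema_id: int) -> str:
    """Build a property key that ties a description to one schema version."""
    return f"{COMMENT_KEY}.schema-id.{schema_id}"


def set_comment(properties: Dict[str, str], schema_id: int, text: str) -> None:
    """Store a description for a specific schema version."""
    properties[comment_key(schema_id)] = text


def get_comment(properties: Dict[str, str], schema_id: int) -> Optional[str]:
    """Look up the versioned description, falling back to the plain key."""
    return properties.get(comment_key(schema_id), properties.get(COMMENT_KEY))


# A dict stands in for the table's `properties` map.
props = {"comment": "current description"}
set_comment(props, 2, "description for schema 2")
print(get_comment(props, 2))
print(get_comment(props, 1))
```

A reader resolving an older snapshot would look up the key for that
snapshot's schema ID first and only then fall back to the unversioned value.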

Storing verbose text in each field may bloat the metadata storage,
especially when frequent schema evolution leaves many duplicate `doc`s
across schema versions.

Best,
Gang

On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <taeyun....@innowireless.com>
wrote:

> Thank you for your response.
> As I understand it, the table description is currently stored as a table
> property within the table metadata’s `properties` map.
>
> In my opinion, this approach has a few issues:
>
> - Table metadata `properties` are not versioned. As a result, when
> querying an older snapshot, the description may be inaccurate because the
> value reflects only the current state.
> - According to the specification, the purpose of table metadata properties
> is: “A string to string map of table properties. This is used to control
> settings that affect reading and writing and is not intended to be used for
> arbitrary metadata.” Based on this, a comment seems to fall under
> “arbitrary metadata,” and therefore may not be an appropriate use of
> properties.
> - Table comments seem to have become significant enough that relying on a
> convention alone may no longer be sufficient. It might be worth considering
> a standardized, schema-level field for them.
>
> Thank you.
> Taeyun
>
> -----Original Message-----
> From: "Ryan Blue" <rdb...@gmail.com>
> To: <dev@iceberg.apache.org>;
> Cc:
> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>
>
> Iceberg does allow you to store table descriptions. The convention is to
> use a table property, "comment". While this isn't a schema-level
> doc/comment, I don't know of anything that makes a distinction between
> schema description and table description, so I think it should work for
> your use.
>
>
>
> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <
> taeyun....@innowireless.com> wrote:
>
> Hi,
>
> With the growing trend of using LLMs to automatically generate SQL, it
> feels increasingly important to manage descriptions of database tables and
> columns in a way that these tools can easily access.
>
> In the Iceberg specification, comments for schema fields (i.e., columns)
> can be specified using the `doc` property within the `fields` array of a
> `struct` type. However, there doesn’t seem to be a way to specify a comment
> for the root struct type itself - that is, for the table as a whole.
>
> From what I can tell, OLAP DBMSs today may handle table-level comments by
> storing them in the `properties` map within the table metadata under
> various non-standard keys. But since a table comment conceptually belongs
> to the schema, and can vary by schema, it feels like the `properties` map
> within the table metadata might not be the best place for it.
>
> Would it make sense to allow a `doc` property on the `schema` object (the
> root struct type), alongside `schema-id` and `identifier-field-ids`, so
> that a description for the schema itself can be included?
> It seems like it would be helpful, especially for tooling and LLM-related
> use cases.
>
> Curious to hear your thoughts.
> Apologies if I’m overlooking something or if this has already been
> discussed.
>
> Thank you,
> Taeyun
>
