Why would you need to version table descriptions? Are there cases where
they change rapidly and become inaccurate because of schema changes?

On Thu, Jul 24, 2025 at 7:48 PM Taeyun Kim <taeyun....@innowireless.com>
wrote:

> Thank you for your reply.
>
> Column-level comments are already part of the schema definition. Would
> adding just one table-level comment really cause noticeable bloat? For
> example, if a table has 20 columns, adding one more comment would only
> increase the metadata size by about 1/20th.
>
> Also, using schema-id as part of the property key feels like a workaround
> rather than a proper solution. It is not part of the specification, so any
> tool or integration (including LLM-based ones) would need extra logic to
> interpret it. A standardized, schema-level field would avoid that
> complexity and make the metadata easier to consume consistently.
>
> If bloat is a real concern, perhaps column-level comments should also be
> moved out of the schema, with a proper mechanism to version and manage them
> separately.
>
> Thank you,
> Taeyun.
>
> -----Original Message-----
> From: "Gang Wu" <ust...@gmail.com>
> To: <dev@iceberg.apache.org>;
> Cc:
> Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>
>
> I'd rather not complicate the schema definitions in the table metadata.
> You may append `schema-id` to the table property key to manage different
> schema versions.
>
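A minimal sketch of how the `schema-id` suffix could work, assuming a hypothetical key pattern `comment.schema-id.<id>` (the thread does not name one, and the spec defines none). The helper names `comment_key` and `table_comment` and the sample values are likewise illustrative only:

```python
from typing import Dict, Optional


def comment_key(schema_id: int) -> str:
    # Hypothetical key convention; nothing in the Iceberg spec defines it.
    return f"comment.schema-id.{schema_id}"


def table_comment(properties: Dict[str, str], schema_id: int) -> Optional[str]:
    # Prefer the comment stored for this schema version, then fall back
    # to the conventional unversioned "comment" property.
    return properties.get(comment_key(schema_id), properties.get("comment"))


# Sample table properties (made-up values).
properties = {
    "comment": "Daily order totals",
    comment_key(0): "Order totals (legacy layout)",
    comment_key(1): "Daily order totals",
}

print(table_comment(properties, 0))
print(table_comment(properties, 7))
```

Every reader has to know the key pattern and the fallback order, which is the extra logic the reply above objects to.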
>
> Storing verbose text in each field may bloat the metadata storage,
> especially when there are many duplicate `doc`s because the schema
> evolves often.
>
>
> Best,
> Gang
>
>
> On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <taeyun....@innowireless.com>
> wrote:
>
> Thank you for your response.
> As I understand it, the table description is currently stored as a table
> property within the table metadata’s `properties` map.
>
> In my opinion, this approach has a few issues:
>
> - Table metadata `properties` are not versioned. As a result, when
> querying an older snapshot, the description may be inaccurate because the
> value reflects only the current state.
> - According to the specification, the purpose of table metadata properties
> is: “A string to string map of table properties. This is used to control
> settings that affect reading and writing and is not intended to be used for
> arbitrary metadata.” Based on this, a comment seems to fall under
> “arbitrary metadata,” and therefore may not be an appropriate use of
> properties.
> - Table comments seem to have become significant enough that relying on a
> convention alone may no longer be sufficient. It might be worth considering
> a standardized, schema-level field for them.
>
> Thank you.
> Taeyun
>
> -----Original Message-----
> From: "Ryan Blue" <rdb...@gmail.com>
> To: <dev@iceberg.apache.org>;
> Cc:
> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>
>
> Iceberg does allow you to store table descriptions. The convention is to
> use a table property, "comment". While this isn't a schema-level
> doc/comment, I don't know of anything that makes a distinction between
> schema description and table description, so I think it should work for
> your use.
>
>
>
> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <
> taeyun....@innowireless.com> wrote:
>
> Hi,
>
> With the growing trend of using LLMs to automatically generate SQL, it
> feels increasingly important to manage descriptions of database tables and
> columns in a way that these tools can easily access.
>
> In the Iceberg specification, comments for schema fields (i.e., columns)
> can be specified using the `doc` property within the `fields` array of a
> `struct` type. However, there doesn’t seem to be a way to specify a comment
> for the root struct type itself - that is, for the table as a whole.
>
> From what I can tell, OLAP DBMSs today may handle table-level comments by
> storing them in the `properties` map within the table metadata under
> various non-standard keys. But since a table comment conceptually belongs
> to the schema, and can vary by schema, it feels like the `properties` map
> within the table metadata might not be the best place for it.
>
> Would it make sense to allow a `doc` property on the `schema` object (the
> root struct type), alongside `schema-id` and `identifier-field-ids`, so
> that a description for the schema itself can be included?
> It seems like it would be helpful, especially for tooling and LLM-related
> use cases.
>
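To make the proposal concrete, here is a rough sketch that mirrors the schema JSON as a Python dict. Field-level `doc` is in the spec today; the top-level `doc` is the hypothetical addition under discussion and is not part of the current specification. The table, the column names, and the `describe` helper are made up for illustration:

```python
# Schema object as it might appear in table metadata, written as a dict.
schema = {
    "type": "struct",
    "schema-id": 1,
    "identifier-field-ids": [1],
    # Proposed, NOT in the current spec: a description of the schema itself.
    "doc": "One row per customer order",
    "fields": [
        {"id": 1, "name": "order_id", "required": True, "type": "long",
         "doc": "Unique order identifier"},
        {"id": 2, "name": "total", "required": False, "type": "double",
         "doc": "Order total in USD"},
    ],
}


def describe(schema: dict) -> list:
    # Collect table- and column-level descriptions, e.g. for an LLM prompt.
    lines = []
    table_doc = schema.get("doc")  # absent unless the proposal is adopted
    if table_doc:
        lines.append(f"table: {table_doc}")
    for field in schema["fields"]:
        lines.append(f"{field['name']} ({field['type']}): {field.get('doc', '')}")
    return lines


for line in describe(schema):
    print(line)
```

Because the description would live inside the schema object, it would be versioned together with `schema-id`, with no separate key convention needed.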
> Curious to hear your thoughts.
> Apologies if I’m overlooking something or if this has already been
> discussed.
>
> Thank you,
> Taeyun
>
