I'd rather not complicate the schema definitions in the table metadata. You could append the `schema-id` to the table property key to manage descriptions for different schema versions.
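A minimal sketch of what such a convention might look like, assuming a hypothetical key format `comment.schema-<schema-id>`. Iceberg defines no such key; only the plain "comment" property convention is mentioned in this thread, and the helper names below are illustrative only.

```python
from typing import Dict, Optional

# Plain table-description key mentioned in this thread (a convention, not a spec field).
COMMENT_KEY = "comment"


def comment_key(schema_id: int) -> str:
    """Build a per-schema property key. The 'comment.schema-<id>' format is an assumption."""
    return f"{COMMENT_KEY}.schema-{schema_id}"


def set_comment(properties: Dict[str, str], schema_id: int, text: str) -> None:
    """Store a description for one schema version in the table properties map."""
    properties[comment_key(schema_id)] = text


def get_comment(properties: Dict[str, str], schema_id: int) -> Optional[str]:
    """Look up the description for a schema version, falling back to the plain 'comment' key."""
    return properties.get(comment_key(schema_id), properties.get(COMMENT_KEY))


# Usage: a plain dict stands in for the table metadata `properties` map.
props: Dict[str, str] = {"comment": "Customer orders"}
set_comment(props, 1, "Customer orders, including the currency column")
```

A reader resolving an older snapshot would pass that snapshot's `schema-id` to `get_comment`; schema versions without their own entry fall back to the unversioned "comment" value.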
Storing verbose text in each field may bloat the metadata storage, especially when frequent schema evolution leaves many duplicate `doc`s.

Best,
Gang

On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <taeyun....@innowireless.com> wrote:

> Thank you for your response.
> As I understand it, the table description is currently stored as a table
> property within the table metadata’s `properties` map.
>
> In my opinion, this approach has a few issues:
>
> - Table metadata `properties` are not versioned. As a result, when
>   querying an older snapshot, the description may be inaccurate because
>   the value reflects only the current state.
> - According to the specification, the purpose of table metadata
>   properties is: “A string to string map of table properties. This is
>   used to control settings that affect reading and writing and is not
>   intended to be used for arbitrary metadata.” Based on this, a comment
>   seems to fall under “arbitrary metadata,” and therefore may not be an
>   appropriate use of properties.
> - Table comments seem to have become significant enough that relying on
>   a convention alone may no longer be sufficient. It might be worth
>   considering a standardized, schema-level field for them.
>
> Thank you.
> Taeyun
>
> -----Original Message-----
> From: "Ryan Blue" <rdb...@gmail.com>
> To: <dev@iceberg.apache.org>;
> Cc:
> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>
> Iceberg does allow you to store table descriptions. The convention is to
> use a table property, "comment". While this isn't a schema-level
> doc/comment, I don't know of anything that makes a distinction between
> schema description and table description, so I think it should work for
> your use.
>
> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim)
> <taeyun....@innowireless.com> wrote:
>
> Hi,
>
> With the growing trend of using LLMs to automatically generate SQL, it
> feels increasingly important to manage descriptions of database tables
> and columns in a way that these tools can easily access.
>
> In the Iceberg specification, comments for schema fields (i.e., columns)
> can be specified using the `doc` property within the `fields` array of a
> `struct` type. However, there doesn’t seem to be a way to specify a
> comment for the root struct type itself - that is, for the table as a
> whole.
>
> From what I can tell, OLAP DBMSs today may handle table-level comments by
> storing them in the `properties` map within the table metadata under
> various non-standard keys. But since a table comment conceptually belongs
> to the schema, and can vary by schema, it feels like the `properties` map
> within the table metadata might not be the best place for it.
>
> Would it make sense to allow a `doc` property on the `schema` object (the
> root struct type), alongside `schema-id` and `identifier-field-ids`, so
> that a description for the schema itself can be included?
> It seems like it would be helpful, especially for tooling and LLM-related
> use cases.
>
> Curious to hear your thoughts.
> Apologies if I’m overlooking something or if this has already been
> discussed.
>
> Thank you,
> Taeyun
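
For illustration, the proposal in the original message could look roughly like the sketch below. It is written as a Python dict mirroring the schema JSON, and the top-level `doc` key is the proposed addition, not part of the current specification. The table, the field names, and the `describe` helper are hypothetical.

```python
from typing import Any, Dict

# Schema object in the shape used by the Iceberg spec's JSON serialization.
# The top-level "doc" is the PROPOSED field from this thread; it is not in the spec today.
schema: Dict[str, Any] = {
    "type": "struct",
    "schema-id": 1,
    "identifier-field-ids": [1],
    "doc": "One row per customer order.",  # proposed schema-level description
    "fields": [
        {
            "id": 1,
            "name": "order_id",
            "required": True,
            "type": "long",
            "doc": "Unique order identifier.",  # field-level doc, already supported
        },
        {"id": 2, "name": "amount", "required": False, "type": "double"},
    ],
}


def describe(schema: Dict[str, Any]) -> str:
    """Render table and column descriptions as plain text, e.g. for an LLM prompt."""
    missing = "(no description)"
    lines = [f"table: {schema.get('doc', missing)}"]
    for field in schema["fields"]:
        lines.append(f"- {field['name']} ({field['type']}): {field.get('doc', missing)}")
    return "\n".join(lines)


summary = describe(schema)
```

Because the description lives inside the schema object, it would be versioned together with `schema-id`, which is the property the table-level "comment" convention lacks.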