Thank you for your reply. Column-level comments are already part of the schema definition. Would adding just one table-level comment really cause noticeable bloat? For example, if a table has 20 columns, each with a comment, adding one more comment of similar length would increase the comment text stored in the schema by only about 1/20 (5%).
Also, using schema-id as part of the property key feels like a workaround rather than a proper solution. It is not part of the specification, so any tool or integration (including LLM-based ones) would need extra logic to interpret it. A standardized, schema-level field would avoid that complexity and make the metadata easier to consume consistently. If bloat is a real concern, perhaps column-level comments should also be moved out of the schema, with a proper mechanism to version and manage them separately.

Thank you,
Taeyun.

-----Original Message-----
From: "Gang Wu" <ust...@gmail.com>
To: <dev@iceberg.apache.org>;
Cc:
Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects

I'd rather not complicate the schema definitions in the table metadata. You may append `schema-id` to the key of table property to manage different schema versions. Storing verbose text to each field may bloat the metadata storage, especially when there are a lot of duplicate `doc`s if schema evolution happens a lot.

Best,
Gang

On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <taeyun....@innowireless.com> wrote:

Thank you for your response.

As I understand it, the table description is currently stored as a table property within the table metadata’s `properties` map.

In my opinion, this approach has a few issues:

- Table metadata `properties` are not versioned. As a result, when querying an older snapshot, the description may be inaccurate because the value reflects only the current state.
- According to the specification, the purpose of table metadata properties is: “A string to string map of table properties. This is used to control settings that affect reading and writing and is not intended to be used for arbitrary metadata.” Based on this, a comment seems to fall under “arbitrary metadata,” and therefore may not be an appropriate use of properties.
- Table comments seem to have become significant enough that relying on a convention alone may no longer be sufficient. It might be worth considering a standardized, schema-level field for them.

Thank you.
Taeyun

-----Original Message-----
From: "Ryan Blue" <rdb...@gmail.com>
To: <dev@iceberg.apache.org>;
Cc:
Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects

Iceberg does allow you to store table descriptions. The convention is to use a table property, "comment". While this isn't a schema-level doc/comment, I don't know of anything that makes a distinction between schema description and table description, so I think it should work for your use.

On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <taeyun....@innowireless.com> wrote:

Hi,

With the growing trend of using LLMs to automatically generate SQL, it feels increasingly important to manage descriptions of database tables and columns in a way that these tools can easily access.

In the Iceberg specification, comments for schema fields (i.e., columns) can be specified using the `doc` property within the `fields` array of a `struct` type. However, there doesn’t seem to be a way to specify a comment for the root struct type itself - that is, for the table as a whole.

From what I can tell, OLAP DBMSs today may handle table-level comments by storing them in the `properties` map within the table metadata under various non-standard keys. But since a table comment conceptually belongs to the schema, and can vary by schema, it feels like the `properties` map within the table metadata might not be the best place for it.

Would it make sense to allow a `doc` property on the `schema` object (the root struct type), alongside `schema-id` and `identifier-field-ids`, so that a description for the schema itself can be included? It seems like it would be helpful, especially for tooling and LLM-related use cases.

Curious to hear your thoughts.
Apologies if I’m overlooking something or if this has already been discussed.

Thank you,
Taeyun
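
For readers comparing the two options raised in this thread, here is a minimal Python sketch, not an implementation of anything in Iceberg. It models table metadata as a plain dict using the spec's field names (`schemas`, `schema-id`, `identifier-field-ids`, `fields`, `doc`, `properties`). Two things are assumptions made only for illustration: the top-level `doc` key on the schema object is the proposed addition and is not in the specification, and the `comment.schema-<id>` property key format is hypothetical, since the thread only says to append `schema-id` to the key. The table and column names are also made up.

```python
def make_schema(schema_id, table_doc):
    """Build a schema object shaped like the one in Iceberg table metadata."""
    return {
        "type": "struct",
        "schema-id": schema_id,
        "identifier-field-ids": [1],
        # Hypothetical: the schema-level `doc` proposed in this thread.
        # It is NOT part of the current specification.
        "doc": table_doc,
        "fields": [
            {"id": 1, "name": "order_id", "required": True, "type": "long",
             "doc": "Unique order identifier"},
            {"id": 2, "name": "amount", "required": False, "type": "double",
             "doc": "Order total"},
        ],
    }


def comment_key(schema_id):
    """Property key for the workaround; this key format is an assumption."""
    return f"comment.schema-{schema_id}"


def table_comment_from_properties(metadata, schema_id):
    """Workaround: look up a schema-id-suffixed key in table properties.

    Falls back to the conventional, unversioned "comment" property.
    """
    props = metadata["properties"]
    return props.get(comment_key(schema_id), props.get("comment"))


def table_comment_from_schema(metadata, schema_id):
    """Proposal: read `doc` directly from the matching schema object."""
    for schema in metadata["schemas"]:
        if schema["schema-id"] == schema_id:
            return schema.get("doc")
    return None


# Illustrative metadata with two schema versions (all values are made up).
metadata = {
    "current-schema-id": 1,
    "schemas": [
        make_schema(0, "Orders, one row per order"),
        make_schema(1, "Orders, one row per order line"),
    ],
    "properties": {
        "comment": "Orders, one row per order line",
        comment_key(0): "Orders, one row per order",
    },
}

for sid in (0, 1):
    print(sid, table_comment_from_properties(metadata, sid))
    print(sid, table_comment_from_schema(metadata, sid))
```

In this sketch both lookups return the same text for each schema version, but the property-based one depends on every reader knowing the key format and the fallback rule, while the schema-based one only needs the schema that a snapshot already points to.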