Why would you need to version table descriptions? Are there cases where they change rapidly and become inaccurate because of schema changes?

On Thu, Jul 24, 2025 at 7:48 PM Taeyun Kim <taeyun....@innowireless.com> wrote:

> Thank you for your reply.
>
> Column-level comments are already part of the schema definition. Would
> adding just one table-level comment really cause noticeable bloat? For
> example, if a table has 20 columns, adding one more comment would only
> increase the metadata size by about 1/20th.
>
> Also, using schema-id as part of the property key feels like a workaround
> rather than a proper solution. It is not part of the specification, so any
> tool or integration (including LLM-based ones) would need extra logic to
> interpret it. A standardized, schema-level field would avoid that
> complexity and make the metadata easier to consume consistently.
>
> If bloat is a real concern, perhaps column-level comments should also be
> moved out of the schema, with a proper mechanism to version and manage them
> separately.
>
> Thank you,
> Taeyun.
>
> -----Original Message-----
> From: "Gang Wu" <ust...@gmail.com>
> To: <dev@iceberg.apache.org>;
> Cc:
> Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>
> I'd rather not complicate the schema definitions in the table metadata.
> You may append `schema-id` to the key of table property to manage different
> schema versions.
>
> Storing verbose text to each field may bloat the metadata storage,
> especially when there are a lot of duplicate `doc`s if schema evolution
> happens a lot.
>
> Best,
> Gang
>
> On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <taeyun....@innowireless.com>
> wrote:
>
> Thank you for your response.
> As I understand it, the table description is currently stored as a table
> property within the table metadata’s `properties` map.
>
> In my opinion, this approach has a few issues:
>
> - Table metadata `properties` are not versioned. As a result, when
> querying an older snapshot, the description may be inaccurate because the
> value reflects only the current state.
> - According to the specification, the purpose of table metadata properties
> is: “A string to string map of table properties. This is used to control
> settings that affect reading and writing and is not intended to be used for
> arbitrary metadata.” Based on this, a comment seems to fall under
> “arbitrary metadata,” and therefore may not be an appropriate use of
> properties.
> - Table comments seem to have become significant enough that relying on a
> convention alone may no longer be sufficient. It might be worth considering
> a standardized, schema-level field for them.
>
> Thank you.
> Taeyun
>
> -----Original Message-----
> From: "Ryan Blue" <rdb...@gmail.com>
> To: <dev@iceberg.apache.org>;
> Cc:
> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>
> Iceberg does allow you to store table descriptions. The convention is to
> use a table property, "comment". While this isn't a schema-level
> doc/comment, I don't know of anything that makes a distinction between
> schema description and table description, so I think it should work for
> your use.
>
> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <
> taeyun....@innowireless.com> wrote:
>
> Hi,
>
> With the growing trend of using LLMs to automatically generate SQL, it
> feels increasingly important to manage descriptions of database tables and
> columns in a way that these tools can easily access.
>
> In the Iceberg specification, comments for schema fields (i.e., columns)
> can be specified using the `doc` property within the `fields` array of a
> `struct` type. However, there doesn’t seem to be a way to specify a comment
> for the root struct type itself - that is, for the table as a whole.
>
> From what I can tell, OLAP DBMSs today may handle table-level comments by
> storing them in the `properties` map within the table metadata under
> various non-standard keys. But since a table comment conceptually belongs
> to the schema, and can vary by schema, it feels like the `properties` map
> within the table metadata might not be the best place for it.
>
> Would it make sense to allow a `doc` property on the `schema` object (the
> root struct type), alongside `schema-id` and `identifier-field-ids`, so
> that a description for the schema itself can be included?
> It seems like it would be helpful, especially for tooling and LLM-related
> use cases.
>
> Curious to hear your thoughts.
> Apologies if I’m overlooking something or if this has already been
> discussed.
>
> Thank you,
> Taeyun
>
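
For readers comparing the options raised in this thread, here is a minimal Python sketch of how a tool might resolve a table description for a given schema version. It works on a plain dict shaped like Iceberg table metadata and does not use any Iceberg library. Several things are assumptions for illustration only: the schema-level `doc` key is the proposal under discussion and is not in the current spec, the `comment.schema-<id>` key format is one hypothetical way to "append `schema-id` to the key", and the table, its values, and the `table_comment` helper are invented.

```python
# Hypothetical table metadata fragment (invented table and values).
# "schemas", "schema-id", "fields", field-level "doc", and "properties"
# are the metadata names mentioned in the thread.
metadata = {
    "current-schema-id": 1,
    "schemas": [
        {
            "type": "struct",
            "schema-id": 0,
            "fields": [
                {"id": 1, "name": "order_id", "required": True,
                 "type": "long", "doc": "Order identifier"},
            ],
        },
        {
            "type": "struct",
            "schema-id": 1,
            "fields": [
                {"id": 1, "name": "order_id", "required": True,
                 "type": "long", "doc": "Order identifier"},
                {"id": 2, "name": "line_no", "required": True,
                 "type": "int", "doc": "Line number within the order"},
            ],
        },
    ],
    "properties": {
        # Convention mentioned by Ryan Blue: a plain "comment" property.
        "comment": "Orders, one row per order line",
        # ASSUMPTION: a schema-id-suffixed key, as suggested by Gang Wu.
        # The exact key format is not standardized anywhere.
        "comment.schema-0": "Orders, one row per order",
    },
}


def table_comment(metadata, schema_id):
    """Return the best available table description for one schema version."""
    schema = next(s for s in metadata["schemas"] if s["schema-id"] == schema_id)
    # 1. Proposed schema-level `doc` (NOT part of the current spec).
    if "doc" in schema:
        return schema["doc"]
    props = metadata.get("properties", {})
    # 2. Hypothetical schema-id-suffixed property, then 3. the plain "comment".
    return props.get(f"comment.schema-{schema_id}", props.get("comment"))


if __name__ == "__main__":
    for schema in metadata["schemas"]:
        sid = schema["schema-id"]
        print(sid, table_comment(metadata, sid))
```

The lookup order shows the trade-off being debated: a schema-level field needs no key-naming convention, while the property-based approach keeps the schema definition unchanged but requires every consumer to know the key format.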