I think it's probably a good idea to add more implementation-specific details to the spec, like the use of "comment" for table documentation. We recently added a section for this that is clear that these are not required but are important conventions.
I would not add "owner" to that section. Storing owner in table properties is not a good idea because it would either need to be controlled and overridden by catalogs or would be informational and untrustworthy. I think that owner is part of catalog metadata, not table metadata.

On Thu, Aug 7, 2025 at 9:38 AM Guy Yasoor <guy.yas...@ryft.io.invalid> wrote:

> Got it - I now understand better the meaning of "reserved table
> properties", and I agree it shouldn't be touched or expanded.
>
> Going back to the original topic:
> It appears that both `comment` and `owner` are important fields, which are
> populated by some engines, and can prove useful for others, but aren't
> standardized anywhere in the spec.
> To improve engine alignment, I think they should be documented somewhere.
> I'd suggest one of two approaches:
>
> 1. Either keeping them in the table properties map, and documenting it
> in the Table Properties documentation
> <https://iceberg.apache.org/docs/latest/configuration/#table-properties>
> (but not in the reserved section - perhaps it deserves its own section,
> "Table context properties"?)
> 2. Or adding them as optional top-level fields in the metadata.json
> schema - this might be the "best practice" (especially if `owner` is
> supposed to be controlled by the catalog). However, it will require
> changing the current behavior of Spark, both for `owner` assignment, and
> for `comment` assignment in "CREATE TABLE ... COMMENT 'table
> documentation'".
>
> WDYT?
>
>
> On Tue, Aug 5, 2025 at 8:08 PM Ryan Blue <rdb...@gmail.com> wrote:
>
>> The `format-version` table property is different because it is mapped to
>> the format version that is not stored in table properties. It is reserved
>> because implementations will override it and so it isn't a real table
>> property. This is not a pattern that we want to expand because of the
>> strange behavior.
>>
>> For cases like `comment`, these other properties are normal table
>> properties that can be used like any other. If the schema had a doc string
>> and that was used in place of `comment`, then I think it would be a
>> reserved property. But there's no need for that because setting the
>> property or using `COMMENT ON` would have the same behavior -- changing the
>> property value.
>>
>> The `owner` property is a different case. Owner is something that should
>> be restricted. A user should not be able to change it with just access to
>> modify table metadata. Tracking a table's owner is the responsibility of
>> the catalog and its access control scheme. Because of this, I don't think
>> that we should standardize or encourage setting an `owner` table property.
>>
>> On Tue, Aug 5, 2025 at 4:21 AM Guy Yasoor <guy.yas...@ryft.io.invalid>
>> wrote:
>>
>>> If using "comment" is the best practice, should we add this to the "reserved
>>> table properties" docs
>>> <https://iceberg.apache.org/docs/latest/configuration/#reserved-table-properties>,
>>> to make sure it's aligned between different engines and implementations?
>>> In the same opportunity, I would suggest adding "owner" as well, which
>>> is automatically added by Spark.
>>>
>>> On Tue, Aug 5, 2025 at 2:16 AM Taeyun Kim <taeyun....@innowireless.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I see, thank you for your response.
>>>>
>>>> Best regards,
>>>> Taeyun
>>>>
>>>> -----Original Message-----
>>>> From: "Ryan Blue" <rdb...@gmail.com>
>>>> To: <dev@iceberg.apache.org>;
>>>> Cc:
>>>> Sent: 2025-08-05 (Tue) 07:45:43 (UTC+09:00)
>>>> Subject: Re: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>
>>>>
>>>> If there isn't a significant difference between table-level
>>>> description and schema-level description, then I think you should consider
>>>> it standardized. You can store the table description in the "comment" table
>>>> property.
>>>>
>>>>
>>>> On Sun, Aug 3, 2025 at 5:28 PM Taeyun Kim <taeyun....@innowireless.com>
>>>> wrote:
>>>> Hi,
>>>>
>>>> I’ve already explained my reasoning in earlier messages, including the
>>>> example about making table and column descriptions more accessible for
>>>> LLM-generated SQL.
>>>> From my perspective, table-level comments, like column-level comments,
>>>> should also be standardized.
>>>> If standardized, it seems natural for them to be part of the schema
>>>> definition, just like column-level comments.
>>>> This way, they stay consistent with the schema version and avoid
>>>> drifting out of sync when the schema changes.
>>>>
>>>> Thanks,
>>>> Taeyun
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: "Ryan Blue" <rdb...@gmail.com>
>>>> To: <dev@iceberg.apache.org>;
>>>> Cc:
>>>> Sent: 2025-07-26 (Sat) 08:05:55 (UTC+09:00)
>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>
>>>>
>>>> Why would you need to version table descriptions? Are there cases where
>>>> they are changing rapidly and inaccurate due to schema changes?
>>>>
>>>>
>>>> On Thu, Jul 24, 2025 at 7:48 PM Taeyun Kim <taeyun....@innowireless.com>
>>>> wrote:
>>>>
>>>> Thank you for your reply.
>>>>
>>>> Column-level comments are already part of the schema definition. Would
>>>> adding just one table-level comment really cause noticeable bloat? For
>>>> example, if a table has 20 columns, adding one more comment would only
>>>> increase the metadata size by about 1/20th.
>>>>
>>>> Also, using schema-id as part of the property key feels like a
>>>> workaround rather than a proper solution. It is not part of the
>>>> specification, so any tool or integration (including LLM-based ones) would
>>>> need extra logic to interpret it. A standardized, schema-level field would
>>>> avoid that complexity and make the metadata easier to consume consistently.
>>>>
>>>> If bloat is a real concern, perhaps column-level comments should also
>>>> be moved out of the schema, with a proper mechanism to version and manage
>>>> them separately.
>>>>
>>>> Thank you,
>>>> Taeyun.
>>>>
>>>> -----Original Message-----
>>>> From: "Gang Wu" <ust...@gmail.com>
>>>> To: <dev@iceberg.apache.org>;
>>>> Cc:
>>>> Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>
>>>>
>>>> I'd rather not complicate the schema definitions in the table metadata.
>>>> You may append `schema-id` to the key of a table property to manage
>>>> different schema versions.
>>>>
>>>>
>>>> Storing verbose text for each field may bloat the metadata storage,
>>>> especially when there are a lot of duplicate `doc`s if schema evolution
>>>> happens a lot.
>>>>
>>>>
>>>> Best,
>>>> Gang
>>>>
>>>>
>>>> On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <taeyun....@innowireless.com>
>>>> wrote:
>>>>
>>>> Thank you for your response.
>>>> As I understand it, the table description is currently stored as a
>>>> table property within the table metadata’s `properties` map.
>>>>
>>>> In my opinion, this approach has a few issues:
>>>>
>>>> - Table metadata `properties` are not versioned. As a result, when
>>>> querying an older snapshot, the description may be inaccurate because the
>>>> value reflects only the current state.
>>>> - According to the specification, the purpose of table metadata
>>>> properties is: “A string to string map of table properties. This is used to
>>>> control settings that affect reading and writing and is not intended to be
>>>> used for arbitrary metadata.” Based on this, a comment seems to fall under
>>>> “arbitrary metadata,” and therefore may not be an appropriate use of
>>>> properties.
>>>> - Table comments seem to have become significant enough that relying on
>>>> a convention alone may no longer be sufficient.
>>>> It might be worth considering a standardized, schema-level field for them.
>>>>
>>>> Thank you.
>>>> Taeyun
>>>>
>>>> -----Original Message-----
>>>> From: "Ryan Blue" <rdb...@gmail.com>
>>>> To: <dev@iceberg.apache.org>;
>>>> Cc:
>>>> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>
>>>>
>>>> Iceberg does allow you to store table descriptions. The convention is
>>>> to use a table property, "comment". While this isn't a schema-level
>>>> doc/comment, I don't know of anything that makes a distinction between
>>>> schema description and table description, so I think it should work for
>>>> your use.
>>>>
>>>>
>>>>
>>>> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <taeyun....@innowireless.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> With the growing trend of using LLMs to automatically generate SQL, it
>>>> feels increasingly important to manage descriptions of database tables and
>>>> columns in a way that these tools can easily access.
>>>>
>>>> In the Iceberg specification, comments for schema fields (i.e.,
>>>> columns) can be specified using the `doc` property within the `fields`
>>>> array of a `struct` type. However, there doesn’t seem to be a way to
>>>> specify a comment for the root struct type itself - that is, for the table
>>>> as a whole.
>>>>
>>>> From what I can tell, OLAP DBMSs today may handle table-level comments
>>>> by storing them in the `properties` map within the table metadata under
>>>> various non-standard keys. But since a table comment conceptually belongs
>>>> to the schema, and can vary by schema, it feels like the `properties` map
>>>> within the table metadata might not be the best place for it.
>>>>
>>>> Would it make sense to allow a `doc` property on the `schema` object
>>>> (the root struct type), alongside `schema-id` and `identifier-field-ids`,
>>>> so that a description for the schema itself can be included?
>>>> It seems like it would be helpful, especially for tooling and
>>>> LLM-related use cases.
>>>>
>>>> Curious to hear your thoughts.
>>>> Apologies if I’m overlooking something or if this has already been
>>>> discussed.
>>>>
>>>> Thank you,
>>>> Taeyun
>>>
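
For readers following the thread, here is a rough Python sketch of where the two kinds of description discussed above would sit. It is an illustration only and does not use any Iceberg library: the `metadata` dict is a hand-written, hypothetical fragment shaped like a table metadata file, the column names are invented, and `describe_table` is a made-up helper. It assumes the "comment" table property convention and the per-field `doc` attribute mentioned in the messages.

```python
# Hypothetical, hand-written fragment shaped like Iceberg table metadata.
# Only the keys discussed in the thread are shown; real files contain more.
metadata = {
    "current-schema-id": 1,
    "schemas": [
        {
            "type": "struct",
            "schema-id": 1,
            "identifier-field-ids": [1],
            "fields": [
                {"id": 1, "name": "order_id", "required": True,
                 "type": "long", "doc": "Unique order identifier"},
                {"id": 2, "name": "amount", "required": False,
                 "type": "double"},
            ],
        }
    ],
    # Table-level description, stored by convention under "comment".
    "properties": {"comment": "Customer orders, one row per order"},
}


def describe_table(meta: dict) -> dict:
    """Collect the table description and column docs for the current schema."""
    current_id = meta["current-schema-id"]
    schema = next(s for s in meta["schemas"] if s["schema-id"] == current_id)
    return {
        # Missing property or missing doc is reported as None.
        "table": meta.get("properties", {}).get("comment"),
        "columns": {f["name"]: f.get("doc") for f in schema["fields"]},
    }


if __name__ == "__main__":
    print(describe_table(metadata))
```

In this layout `properties` sits next to `schemas` rather than inside a schema entry, so the table description is not tied to any `schema-id`, while each column `doc` is. That is the difference the versioning question in the thread turns on.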