I think it's probably a good idea to add more implementation-specific
details to the spec, like the use of "comment" for table documentation. We
recently added a section for this that makes clear these are not required
but are important conventions.
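
To make the convention concrete, here is a minimal Python sketch, assuming a
heavily simplified metadata document. The sample values and the `describe`
helper are made up for illustration; only the `properties` map, the "comment"
key, and the per-field `doc` strings come from this thread and the spec. It
reads the table description from the "comment" property and the column
descriptions from the field `doc` strings:

```python
import json

# Simplified, hypothetical excerpt of a table's metadata.json; real files
# carry many more fields.
METADATA_JSON = """
{
  "current-schema-id": 0,
  "schemas": [
    {
      "type": "struct",
      "schema-id": 0,
      "fields": [
        {"id": 1, "name": "id", "required": true, "type": "long", "doc": "Order ID"},
        {"id": 2, "name": "ts", "required": false, "type": "timestamptz"}
      ]
    }
  ],
  "properties": {"comment": "One row per customer order"}
}
"""


def describe(metadata: dict) -> dict:
    """Collect the table comment and per-column docs (hypothetical helper)."""
    current_id = metadata["current-schema-id"]
    schema = next(s for s in metadata["schemas"] if s["schema-id"] == current_id)
    return {
        # By convention the table description is the "comment" property;
        # it is optional, so fall back to None when absent.
        "table": metadata.get("properties", {}).get("comment"),
        # Column descriptions come from the optional `doc` on each field.
        "columns": {f["name"]: f.get("doc") for f in schema["fields"]},
    }


description = describe(json.loads(METADATA_JSON))
print(description)
```

Because "comment" is only a convention, a reader like this has to tolerate
its absence, which is why the lookup falls back to None.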

I would not add "owner" to that section. Storing owner in table properties
is not a good idea because it would either need to be controlled and
overridden by catalogs or would be informational and untrustworthy. I think
that owner is part of catalog metadata, not table metadata.
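
A rough sketch of the problem, in Python with entirely hypothetical names
(`TableMetadata`, `set_property`, `CatalogEntry`, and `change_owner` are toy
stand-ins, not Iceberg APIs, and the "only the current owner may transfer"
policy is an assumption): anyone who can commit a property change can rewrite
`owner`, whereas a catalog can gate the change behind its own access control.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Toy stand-in for table metadata: just the properties map."""
    properties: dict = field(default_factory=dict)


def set_property(table: TableMetadata, key: str, value: str) -> None:
    # Any writer allowed to modify table metadata can call this, so an
    # "owner" stored here is only as trustworthy as the last writer.
    table.properties[key] = value


@dataclass
class CatalogEntry:
    """Toy catalog record that keeps ownership outside table metadata."""
    table: TableMetadata
    owner: str

    def change_owner(self, requested_by: str, new_owner: str) -> bool:
        # Hypothetical policy: only the current owner may transfer ownership.
        if requested_by != self.owner:
            return False
        self.owner = new_owner
        return True


entry = CatalogEntry(TableMetadata({"owner": "alice"}), owner="alice")

# A writer with metadata access silently rewrites the property.
set_property(entry.table, "owner", "mallory")

# The same writer cannot change the catalog-tracked owner.
transfer_ok = entry.change_owner(requested_by="mallory", new_owner="mallory")
```

After this runs, the property says "mallory" while the catalog still says
"alice", which is the informational-and-untrustworthy outcome described above.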

On Thu, Aug 7, 2025 at 9:38 AM Guy Yasoor <guy.yas...@ryft.io.invalid>
wrote:

> Got it - I now understand better the meaning of "reserved table
> properties", and I agree it shouldn't be touched or expanded.
>
> Going back to the original topic:
> It appears that both `comment` and `owner` are important fields, which are
> populated by some engines, and can prove useful for others, but aren't
> standardized anywhere in the spec.
> To improve engine alignment, I think they should be documented somewhere.
> I'd suggest one of two approaches:
>
>    1. Either keeping them in the table properties map, and documenting it
>    in the Table Properties documentation
>    <https://iceberg.apache.org/docs/latest/configuration/#table-properties>
>    (but not in the reserved section - perhaps it deserves its own section,
>    "Table context properties"?)
>    2. Or adding them as optional top-level fields in the metadata.json
>    schema - this might be the "best practice" (especially if `owner` is
>    supposed to be controlled by the catalog). However, it will require
>    changing the current behavior of Spark, both for `owner` assignment, and
>    for `comment` assignment in "CREATE TABLE ... COMMENT 'table
>    documentation'".
>
> WDYT?
>
>
> On Tue, Aug 5, 2025 at 8:08 PM Ryan Blue <rdb...@gmail.com> wrote:
>
>> The `format-version` table property is different because it is mapped to
>> the format version that is not stored in table properties. It is reserved
>> because implementations will override it and so it isn't a real table
>> property. This is not a pattern that we want to expand because of the
>> strange behavior.
>>
>> For cases like `comment`, these other properties are normal table
>> properties that can be used like any other. If the schema had a doc string
>> and that was used in place of `comment`, then I think it would be a
>> reserved property. But there's no need for that because setting the
>> property or using `COMMENT ON` would have the same behavior -- changing the
>> property value.
>>
>> The `owner` property is a different case. Owner is something that should
>> be restricted. A user should not be able to change it with just access to
>> modify table metadata. Tracking a table's owner is the responsibility of
>> the catalog and its access control scheme. Because of this, I don't think
>> that we should standardize or encourage setting an `owner` table property.
>>
>> On Tue, Aug 5, 2025 at 4:21 AM Guy Yasoor <guy.yas...@ryft.io.invalid>
>> wrote:
>>
>>> If using "comment" is the best practice, should we add this to the "reserved
>>> table properties" docs
>>> <https://iceberg.apache.org/docs/latest/configuration/#reserved-table-properties>,
>>> to make sure it's aligned between different engines and implementations?
>>> While we're at it, I would suggest adding "owner" as well, which
>>> is automatically added by Spark.
>>>
>>> On Tue, Aug 5, 2025 at 2:16 AM Taeyun Kim <taeyun....@innowireless.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I see, thank you for your response.
>>>>
>>>> Best regards,
>>>> Taeyun
>>>>
>>>> -----Original Message-----
>>>> From: "Ryan Blue" <rdb...@gmail.com>
>>>> To: <dev@iceberg.apache.org>;
>>>> Cc:
>>>> Sent: 2025-08-05 (Tue) 07:45:43 (UTC+09:00)
>>>> Subject: Re: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>
>>>>
>>>> If there isn't a significant difference between table-level
>>>> description and schema-level description, then I think you should consider
>>>> it standardized. You can store the table description in the "comment" table
>>>> property.
>>>>
>>>>
>>>> On Sun, Aug 3, 2025 at 5:28 PM Taeyun Kim <taeyun....@innowireless.com>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I’ve already explained my reasoning in earlier messages, including the
>>>> example about making table and column descriptions more accessible for
>>>> LLM‑generated SQL.
>>>> From my perspective, table‑level comments, like column‑level comments,
>>>> should also be standardized.
>>>> If standardized, it seems natural for them to be part of the schema
>>>> definition, just like column‑level comments.
>>>> This way, they stay consistent with the schema version and avoid
>>>> drifting out of sync when the schema changes.
>>>>
>>>> Thanks,
>>>> Taeyun
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: "Ryan Blue" <rdb...@gmail.com>
>>>> To: <dev@iceberg.apache.org>;
>>>> Cc:
>>>> Sent: 2025-07-26 (Sat) 08:05:55 (UTC+09:00)
>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>
>>>>
>>>> Why would you need to version table descriptions? Are there cases where
>>>> they change rapidly and become inaccurate due to schema changes?
>>>>
>>>>
>>>> On Thu, Jul 24, 2025 at 7:48 PM Taeyun Kim <taeyun....@innowireless.com>
>>>> wrote:
>>>>
>>>> Thank you for your reply.
>>>>
>>>> Column-level comments are already part of the schema definition. Would
>>>> adding just one table-level comment really cause noticeable bloat? For
>>>> example, if a table has 20 columns, adding one more comment would only
>>>> increase the metadata size by about 1/20th.
>>>>
>>>> Also, using schema-id as part of the property key feels like a
>>>> workaround rather than a proper solution. It is not part of the
>>>> specification, so any tool or integration (including LLM-based ones) would
>>>> need extra logic to interpret it. A standardized, schema-level field would
>>>> avoid that complexity and make the metadata easier to consume consistently.
>>>>
>>>> If bloat is a real concern, perhaps column-level comments should also
>>>> be moved out of the schema, with a proper mechanism to version and manage
>>>> them separately.
>>>>
>>>> Thank you,
>>>> Taeyun.
>>>>
>>>> -----Original Message-----
>>>> From: "Gang Wu" <ust...@gmail.com>
>>>> To: <dev@iceberg.apache.org>;
>>>> Cc:
>>>> Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>
>>>>
>>>> I'd rather not complicate the schema definitions in the table metadata.
>>>> You may append `schema-id` to the key of table property to manage different
>>>> schema versions.
>>>>
>>>>
>>>> Storing verbose text in each field may bloat the metadata storage,
>>>> especially when there are a lot of duplicate `doc`s if schema evolution
>>>> happens a lot.
>>>>
>>>>
>>>> Best,
>>>> Gang
>>>>
>>>>
>>>> On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <taeyun....@innowireless.com>
>>>> wrote:
>>>>
>>>> Thank you for your response.
>>>> As I understand it, the table description is currently stored as a
>>>> table property within the table metadata’s `properties` map.
>>>>
>>>> In my opinion, this approach has a few issues:
>>>>
>>>> - Table metadata `properties` are not versioned. As a result, when
>>>> querying an older snapshot, the description may be inaccurate because the
>>>> value reflects only the current state.
>>>> - According to the specification, the purpose of table metadata
>>>> properties is: “A string to string map of table properties. This is used to
>>>> control settings that affect reading and writing and is not intended to be
>>>> used for arbitrary metadata.” Based on this, a comment seems to fall under
>>>> “arbitrary metadata,” and therefore may not be an appropriate use of
>>>> properties.
>>>> - Table comments seem to have become significant enough that relying on
>>>> a convention alone may no longer be sufficient. It might be worth
>>>> considering a standardized, schema-level field for them.
>>>>
>>>> Thank you.
>>>> Taeyun
>>>>
>>>> -----Original Message-----
>>>> From: "Ryan Blue" <rdb...@gmail.com>
>>>> To: <dev@iceberg.apache.org>;
>>>> Cc:
>>>> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>
>>>>
>>>> Iceberg does allow you to store table descriptions. The convention is
>>>> to use a table property, "comment". While this isn't a schema-level
>>>> doc/comment, I don't know of anything that makes a distinction between
>>>> schema description and table description, so I think it should work for
>>>> your use.
>>>>
>>>>
>>>>
>>>> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <
>>>> taeyun....@innowireless.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> With the growing trend of using LLMs to automatically generate SQL, it
>>>> feels increasingly important to manage descriptions of database tables and
>>>> columns in a way that these tools can easily access.
>>>>
>>>> In the Iceberg specification, comments for schema fields (i.e.,
>>>> columns) can be specified using the `doc` property within the `fields`
>>>> array of a `struct` type. However, there doesn’t seem to be a way to
>>>> specify a comment for the root struct type itself - that is, for the table
>>>> as a whole.
>>>>
>>>> From what I can tell, OLAP DBMSs today may handle table-level comments
>>>> by storing them in the `properties` map within the table metadata under
>>>> various non-standard keys. But since a table comment conceptually belongs
>>>> to the schema, and can vary by schema, it feels like the `properties` map
>>>> within the table metadata might not be the best place for it.
>>>>
>>>> Would it make sense to allow a `doc` property on the `schema` object
>>>> (the root struct type), alongside `schema-id` and `identifier-field-ids`,
>>>> so that a description for the schema itself can be included?
>>>> It seems like it would be helpful, especially for tooling and
>>>> LLM-related use cases.
>>>>
>>>> Curious to hear your thoughts.
>>>> Apologies if I’m overlooking something or if this has already been
>>>> discussed.
>>>>
>>>> Thank you,
>>>> Taeyun
>>>
>>>
