Hi Iceberg Developers,

I would like to start a discussion on a potential enhancement to Iceberg: key-value style properties (tags) for individual columns or fields. I believe this feature could have significant applications, especially in the domain of data governance. Here are some examples of how it could be used:

* PII Classification: Indicating whether a field contains Personally Identifiable Information (e.g., PII -> {true, false}).
* Ontology Mapping: Associating fields with specific ontology terms (e.g., Type -> {USER_ID, USER_NAME, LOCATION}).
* Sensitivity Level Setting: Defining the sensitivity level of a field (e.g., Sensitive -> {High, Medium, Low}).

While workarounds such as table-level properties or column-level comments/docs exist today, they lack the structure these use cases need. Table-level properties have to be validated against the table schema continually and are error-prone when they fall out of sync with it. Column-level comments, while useful, do not enforce a standardized format.

I am also interested in hearing thoughts or experiences on whether this problem is addressed at the catalog level in any of the implementations (e.g., AWS Glue). My impression is that even with catalog-level implementations, there is still a need for continual validation against the table schema. Further, catalog-specific implementations will lack a standardized specification. A spec could be beneficial for areas requiring consistent and structured metadata management.

I realize that introducing this feature may require engines to add APIs for setting these properties or tags, such as extensions in Spark or Trino SQL. However, I believe that is a worthwhile discussion to have separately from whether Iceberg should include these features in its own APIs. For the sake of this thread, we can focus on the Iceberg APIs aspect.
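
To make the sync problem with the table-level workaround concrete, here is a minimal sketch in plain Python. It does not use any Iceberg API. The `column-tag.<column>.<key>` naming convention and the `orphaned_tags` helper are assumptions made up for illustration, not something Iceberg or any catalog defines.

```python
# Hypothetical convention (not an Iceberg API): column tags are stored as
# table-level properties named "column-tag.<column>.<key>".
PREFIX = "column-tag."


def orphaned_tags(schema_columns, table_properties):
    """Return tag property keys that refer to columns missing from the schema."""
    orphans = []
    for key in table_properties:
        if not key.startswith(PREFIX):
            continue  # an ordinary table property, not a column tag
        # Strip the prefix, then split off the tag key after the last dot.
        column, _, _tag = key[len(PREFIX):].rpartition(".")
        if column not in schema_columns:
            orphans.append(key)
    return sorted(orphans)


# The "location" column was dropped from the schema, but its tag was left behind.
schema_columns = {"user_id", "user_name"}
table_properties = {
    "column-tag.user_id.PII": "true",
    "column-tag.user_id.Type": "USER_ID",
    "column-tag.location.Sensitive": "High",  # stale entry
    "write.format.default": "parquet",
}

print(orphaned_tags(schema_columns, table_properties))
```

A check like this has to run after every schema change, which is the continual validation burden described above. Tags stored on the field itself would be dropped or renamed together with the field.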

Here are some references to similar concepts in other systems:

* Avro attributes: *Avro 1.10.2 Specification - Schemas* <https://avro.apache.org/docs/1.10.2/spec.html#schemas> (see "Attributes not defined in this document are permitted as metadata").
* BigQuery policy tags: *BigQuery Column-level Security* <https://cloud.google.com/bigquery/docs/column-level-security#set_policy>.
* Snowflake object tagging: *Snowflake Object Tagging Documentation* <https://docs.snowflake.com/en/user-guide/object-tagging#create-and-assign-tags> (see references to "MODIFY COLUMN").

Looking forward to your insights on whether addressing this issue at the Iceberg specification and API level is a reasonable direction.

Thanks,
Walaa.