[ https://issues.apache.org/jira/browse/SPARK-50541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amanda Liu updated SPARK-50541:
-------------------------------
    Description:
Support DESCRIBE TABLE ... [AS JSON] option to display table metadata in JSON format.

*Context:*
The Spark SQL command DESCRIBE TABLE displays table metadata in a DataFrame format geared toward human consumption. This format causes parsing challenges, e.g. if fields contain special characters or the format changes as new features are added.
The new AS JSON option would return the table metadata as a JSON string that supports parsing via machine, while being extensible with a minimized risk of breaking changes. It is not meant to be human-readable.

*SQL Ref Spec:*
{ DESC | DESCRIBE } [ TABLE ] [ EXTENDED | FORMATTED ] table_name { [ PARTITION clause ] | [ column_name ] } [ AS JSON ]

*JSON Schema:*
```
{
  "table_name": "<table_name>",
  "catalog_name": [...],
  "database_name": [...],
  "qualified_name": "<qualified_name>",
  "type": "<table_type>",
  "provider": "<provider>",
  "columns": [
    { "id": 1, "name": "<name>", "type": <type_json>, "comment": "<comment>", "default": "<default_val>" }
  ],
  "partition_values": { "<col_name>": "<val>" },
  "location": "<path>",
  "view_definition": "<view_defn>",
  "owner": "<owner>",
  "comment": "<comment>",
  "table_properties": { "property1": "<property1>", "property2": "<property2>" },
  "storage_properties": { "property1": "<property1>", "property2": "<property2>" },
  "serde_library": "<serde_library>",
  "inputformat": "<input_format>",
  "outputformat": "<output_format>",
  "bucket_columns": [<col_name>],
  "sort_columns": [<col_name>],
  "created_time": "<timestamp>",
  "last_access": "<timestamp>",
  "partition_provider": "<partition_provider>"
}
```

  was:
Support DESCRIBE TABLE ... [AS JSON] option to display table metadata in JSON format.

*Context:*
The Spark SQL command DESCRIBE TABLE displays table metadata in a DataFrame format geared toward human consumption. This format causes parsing challenges, e.g.
if fields contain special characters or the format changes as new features are added. [DBT|https://www.getdbt.com/] is an example customer that motivates this proposal, as providing a structured JSON format can help prevent breakages in pipelines that depend on parsing table metadata.
The new AS JSON option would return the table metadata as a JSON string that supports parsing via machine, while being extensible with a minimized risk of breaking changes. It is not meant to be human-readable.

*SQL Ref Spec:*
{ DESC | DESCRIBE } [ TABLE ] [ EXTENDED | FORMATTED ] table_name { [ PARTITION clause ] | [ column_name ] } [ AS JSON ]

*JSON Schema:*
```
{
  "table_name": "<table_name>",
  "catalog_name": [...],
  "database_name": [...],
  "qualified_name": "<qualified_name>",
  "type": "<table_type>",
  "provider": "<provider>",
  "columns": [
    { "id": 1, "name": "<name>", "type": <type_json>, "comment": "<comment>", "default": "<default_val>" }
  ],
  "partition_values": { "<col_name>": "<val>" },
  "location": "<path>",
  "view_definition": "<view_defn>",
  "owner": "<owner>",
  "comment": "<comment>",
  "table_properties": { "property1": "<property1>", "property2": "<property2>" },
  "storage_properties": { "property1": "<property1>", "property2": "<property2>" },
  "serde_library": "<serde_library>",
  "inputformat": "<input_format>",
  "outputformat": "<output_format>",
  "bucket_columns": [<col_name>],
  "sort_columns": [<col_name>],
  "created_time": "<timestamp>",
  "last_access": "<timestamp>",
  "partition_provider": "<partition_provider>"
}
```

> Describe Table As JSON
> ----------------------
>
>                 Key: SPARK-50541
>                 URL: https://issues.apache.org/jira/browse/SPARK-50541
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Amanda Liu
>            Priority: Major
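
To illustrate the parsing benefit described under *Context:*, here is a minimal Python sketch of how a client might consume a payload shaped like the proposed JSON schema. It is not part of the proposal: the helper name `summarize_table`, every sample value, and the shape used for `type` (shown as `<type_json>` in the schema) are assumptions made for illustration, and how the string is fetched from Spark is deliberately left out.

```python
import json


def summarize_table(payload: str) -> dict:
    """Extract a few fields from a DESCRIBE ... AS JSON style payload.

    Hypothetical helper: it reads only keys named in the proposed schema
    and ignores everything else, so keys added later do not break it.
    """
    meta = json.loads(payload)
    columns = meta.get("columns", [])
    return {
        "qualified_name": meta.get("qualified_name"),
        "column_names": [col["name"] for col in columns],
        # "comment" is treated as optional per column (an assumption).
        "comments": {col["name"]: col["comment"] for col in columns if "comment" in col},
        "partition_values": meta.get("partition_values", {}),
    }


# Invented sample data following the proposed schema. The column name has a
# space and the comment has a pipe and a tab, which are the kind of special
# characters that make the tabular output hard to split reliably.
SAMPLE_PAYLOAD = json.dumps({
    "table_name": "events",
    "catalog_name": ["spark_catalog"],
    "database_name": ["default"],
    "qualified_name": "spark_catalog.default.events",
    "type": "MANAGED",
    "provider": "parquet",
    "columns": [
        {"id": 1, "name": "user id", "type": {"name": "int"}, "comment": "key | with\ttab"},
        {"id": 2, "name": "ts", "type": {"name": "timestamp"}},
    ],
    "some_future_key": "ignored by this reader",
})

summary = summarize_table(SAMPLE_PAYLOAD)
print(summary["column_names"])
```

Because the reader looks fields up by key instead of by row position or delimiter, special characters in values and unknown extra keys pass through without special handling, which is the extensibility property the description aims for.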