[ 
https://issues.apache.org/jira/browse/SPARK-50541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amanda Liu updated SPARK-50541:
-------------------------------
    Description: 
Support a DESCRIBE TABLE ... [AS JSON] option that returns table metadata in
JSON format.

 

*Context:*

The Spark SQL command DESCRIBE TABLE displays table metadata in a DataFrame
format geared toward human consumption. This format is hard to parse reliably,
e.g. when fields contain special characters or when the layout changes as new
features are added.

The new AS JSON option would return the table metadata as a JSON string that is
machine-parseable and extensible, with minimal risk of breaking changes. It is
not meant to be human-readable.

 

*SQL Ref Spec:*

{ DESC | DESCRIBE } [ TABLE ] [ EXTENDED | FORMATTED ] table_name
{ [ PARTITION clause ] | [ column_name ] } [ AS JSON ]
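
As a rough illustration of how the grammar composes, the sketch below assembles a statement string from its optional parts. The helper name `describe_as_json_sql`, its parameters, and the table names are hypothetical and not part of the proposal; the `PARTITION (...)` form is assumed from existing DESCRIBE TABLE syntax, and no identifier quoting is attempted.

```python
from typing import Optional


def describe_as_json_sql(
    table_name: str,
    extended: bool = False,
    partition_clause: Optional[str] = None,
    column_name: Optional[str] = None,
) -> str:
    """Build a DESCRIBE ... AS JSON statement following the grammar above.

    Hypothetical helper for illustration only; it does not quote identifiers.
    """
    # The grammar allows a PARTITION clause or a column name, not both.
    if partition_clause is not None and column_name is not None:
        raise ValueError("pass either partition_clause or column_name, not both")

    parts = ["DESCRIBE", "TABLE"]
    if extended:
        parts.append("EXTENDED")
    parts.append(table_name)
    if partition_clause is not None:
        # Assumed form of the PARTITION clause: PARTITION (col = value, ...)
        parts.append(f"PARTITION ({partition_clause})")
    elif column_name is not None:
        parts.append(column_name)
    parts.append("AS JSON")
    return " ".join(parts)


print(describe_as_json_sql("sales.orders", extended=True))
print(describe_as_json_sql("sales.orders", partition_clause="dt = '2024-01-01'"))
```

The resulting string could then be handed to whatever SQL entry point a client already uses.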

 

*JSON Schema:*

```
{
"table_name": "<table_name>",
"catalog_name": [...],
"database_name": [...],
"qualified_name": "<qualified_name>",
"type": "<table_type>",
"provider": "<provider>",
"columns": [
{
"id": 1,
"name": "<name>",
"type": <type_json>,
"comment": "<comment>",
"default": "<default_val>"
}
],
"partition_values": {
"<col_name>": "<val>"
},
"location": "<path>",
"view_definition": "<view_defn>",
"owner": "<owner>",
"comment": "<comment>",
"table_properties": {
"property1": "<property1>",
"property2": "<property2>"
},
"storage_properties": {
"property1": "<property1>",
"property2": "<property2>"
},
"serde_library": "<serde_library>",
"inputformat": "<input_format>",
"outputformat": "<output_format>",
"bucket_columns": [<col_name>],
"sort_columns": [<col_name>],
"created_time": "<timestamp>",
"last_access": "<timestamp>",
"partition_provider": "<partition_provider>"
}
```
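
A minimal sketch of how a client might consume such output, assuming the JSON string has already been fetched. The payload below is made up to resemble the schema; the shape of the `type` objects, all values, and the helper names `column_names` and `table_property` are assumptions for illustration.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical metadata shaped like the schema above; every value is invented,
# and the "type" objects are only a guess at what <type_json> could contain.
sample_metadata: Dict[str, Any] = {
    "table_name": "orders",
    "catalog_name": ["spark_catalog"],
    "database_name": ["sales"],
    "qualified_name": "spark_catalog.sales.orders",
    "type": "MANAGED",
    "provider": "parquet",
    "columns": [
        {"id": 1, "name": "order_id", "type": {"name": "bigint"}},
        {
            "id": 2,
            "name": "note: free | text",
            "type": {"name": "string"},
            "comment": 'contains, awkward "characters"',
        },
    ],
    "table_properties": {"owner_team": "analytics"},
}

# Stand-in for the JSON string the command would return.
payload = json.dumps(sample_metadata)


def column_names(describe_json: str) -> List[str]:
    """Return column names in order; a missing "columns" key yields []."""
    meta = json.loads(describe_json)
    return [col["name"] for col in meta.get("columns", [])]


def table_property(
    describe_json: str, key: str, default: Optional[str] = None
) -> Optional[str]:
    """Look up one table property, tolerating absent keys."""
    meta = json.loads(describe_json)
    return meta.get("table_properties", {}).get(key, default)


print(column_names(payload))
print(table_property(payload, "owner_team"))
```

Reading fields with `.get` and ignoring unknown keys keeps a consumer working if new fields are added later, which is the extensibility the proposal aims for.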

  was:
Support a DESCRIBE TABLE ... [AS JSON] option that returns table metadata in
JSON format.

 

*Context:*

The Spark SQL command DESCRIBE TABLE displays table metadata in a DataFrame
format geared toward human consumption. This format is hard to parse reliably,
e.g. when fields contain special characters or when the layout changes as new
features are added. [DBT|https://www.getdbt.com/] is an example customer that
motivates this proposal, as a structured JSON format can help prevent breakages
in pipelines that depend on parsing table metadata.

The new AS JSON option would return the table metadata as a JSON string that is
machine-parseable and extensible, with minimal risk of breaking changes. It is
not meant to be human-readable.

 

*SQL Ref Spec:*

{ DESC | DESCRIBE } [ TABLE ] [ EXTENDED | FORMATTED ] table_name
{ [ PARTITION clause ] | [ column_name ] } [ AS JSON ]

 

*JSON Schema:*

```
{
"table_name": "<table_name>",
"catalog_name": [...],
"database_name": [...],
"qualified_name": "<qualified_name>",
"type": "<table_type>",
"provider": "<provider>",
"columns": [
{
"id": 1,
"name": "<name>",
"type": <type_json>,
"comment": "<comment>",
"default": "<default_val>"
}
],
"partition_values": {
"<col_name>": "<val>"
},
"location": "<path>",
"view_definition": "<view_defn>",
"owner": "<owner>",
"comment": "<comment>",
"table_properties": {
"property1": "<property1>",
"property2": "<property2>"
},
"storage_properties": {
"property1": "<property1>",
"property2": "<property2>"
},
"serde_library": "<serde_library>",
"inputformat": "<input_format>",
"outputformat": "<output_format>",
"bucket_columns": [<col_name>],
"sort_columns": [<col_name>],
"created_time": "<timestamp>",
"last_access": "<timestamp>",
"partition_provider": "<partition_provider>"
}
```


> Describe Table As JSON
> ----------------------
>
>                 Key: SPARK-50541
>                 URL: https://issues.apache.org/jira/browse/SPARK-50541
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Amanda Liu
>            Priority: Major


