Qiheng He created HIVE-28316:
--------------------------------

             Summary: The documentation provides an ambiguous explanation regarding the mutually exclusive nature of `STORED BY` and `STORED AS`
                 Key: HIVE-28316
                 URL: https://issues.apache.org/jira/browse/HIVE-28316
             Project: Hive
          Issue Type: Bug
            Reporter: Qiheng He

- The documentation provides an ambiguous explanation regarding the mutually exclusive nature of {*}STORED BY{*} and {*}STORED AS{*}.

- As mentioned on https://cwiki.apache.org/confluence/display/Hive/StorageHandlers , when the {*}CREATE TABLE{*} statement specifies {*}STORED BY{*}, it should not also specify {*}STORED AS{*}. The content in question is as follows.

{code:bash}
When STORED BY is specified, then row_format (DELIMITED or SERDE) and STORED AS cannot be specified. Optional SERDEPROPERTIES can be specified as part of the STORED BY clause and will be passed to the serde provided by the storage handler.

See CREATE TABLE and Row Format, Storage Format, and SerDe for more information.

Example:

CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "cf:string",
"hbase.table.name" = "hbase_table_0"
);
{code}

- This is similarly reflected in the documentation at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL , where {*}|{*} separates {*}STORED BY{*} from {*}STORED AS{*}, indicating their distinct usage and mutual exclusivity.

{code:bash}
[
 [ROW FORMAT row_format]
 [STORED AS file_format]
   | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
]
{code}

- However, this contradicts the Hive-Iceberg Integration documentation at https://cwiki.apache.org/confluence/display/Hive/Hive-Iceberg+Integration , which explicitly gives an example in which {*}STORED BY{*} and {*}STORED AS{*} appear in the same statement. The two pages therefore leave it ambiguous whether the clauses are mutually exclusive.

{code:bash}
The iceberg table currently supports three file formats: PARQUET, ORC & AVRO. The default file format is Parquet. The file format can be explicitily provided by using STORED AS <Format> while creating the table

Example-1:

CREATE TABLE ORC_TABLE (ID INT) STORED BY ICEBERG STORED AS ORC;
{code}

- Earlier discussion of this topic can be found at https://github.com/apache/shardingsphere/pull/31526 .



--
This message was sent by Atlassian Jira
(v8.20.10#820010)