Qiheng He created HIVE-28316:
--------------------------------

             Summary: The documentation provides an ambiguous explanation 
regarding the mutually exclusive nature of `STORED BY` and `STORED AS`
                 Key: HIVE-28316
                 URL: https://issues.apache.org/jira/browse/HIVE-28316
             Project: Hive
          Issue Type: Bug
            Reporter: Qiheng He


- The documentation provides an ambiguous explanation regarding the mutually 
exclusive nature of {*}STORED BY{*} and {*}STORED AS{*}.
- According to 
https://cwiki.apache.org/confluence/display/Hive/StorageHandlers , a 
{*}CREATE TABLE{*} statement that specifies {*}STORED BY{*} cannot also 
specify {*}STORED AS{*}. The relevant passage is as follows.
{code:bash}
When STORED BY is specified, then row_format (DELIMITED or SERDE) and STORED AS 
cannot be specified. Optional SERDEPROPERTIES can be specified as part of the 
STORED BY clause and will be passed to the serde provided by the storage 
handler.

See CREATE TABLE and Row Format, Storage Format, and SerDe for more information.

Example:

CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "cf:string",
"hbase.table.name" = "hbase_table_0"
);
{code}
- This is similarly reflected in the documentation at 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL , where 
{*}|{*} separates {*}STORED BY{*} from {*}STORED AS{*}, indicating their 
distinct usage and mutual exclusivity.
{code:bash}
[
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
]
{code}
- However, this contradicts the Hive-Iceberg Integration documentation at 
https://cwiki.apache.org/confluence/display/Hive/Hive-Iceberg+Integration , 
which explicitly gives an example in which {*}STORED BY{*} and 
{*}STORED AS{*} appear in the same statement. As a result, it is unclear 
whether the two clauses may be combined.
{code:bash}
The iceberg table currently supports three file formats: PARQUET, ORC & AVRO. 
The default file format is Parquet. The file format can be explicitly provided 
by using STORED AS <Format> while creating the table

Example-1:

CREATE TABLE ORC_TABLE (ID INT) STORED BY ICEBERG STORED AS ORC;
{code}
- Earlier discussion of this topic can be found at 
https://github.com/apache/shardingsphere/pull/31526 .
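- To make the conflict concrete, the three forms taken from the documentation 
pages cited above can be placed side by side. This is only a sketch: the table 
name native_orc_table is hypothetical, and whether the last statement is 
accepted depends on the Hive version and on Iceberg support being available, 
which has not been verified here.
{code:sql}
-- Native table: file format chosen with STORED AS (hypothetical table name).
CREATE TABLE native_orc_table (id INT)
STORED AS ORC;

-- Storage-handler table: STORED BY only, as in the StorageHandlers page example.
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "cf:string",
"hbase.table.name" = "hbase_table_0"
);

-- Both clauses together, as in the Hive-Iceberg Integration page example.
-- The StorageHandlers page and the LanguageManual DDL grammar read as if
-- this combination is not allowed.
CREATE TABLE ORC_TABLE (ID INT) STORED BY ICEBERG STORED AS ORC;
{code}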



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
