[jira] [Work logged] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg

ASF GitHub Bot (Jira) Fri, 04 Jun 2021 01:35:00 -0700


     [ https://issues.apache.org/jira/browse/HIVE-25194?focusedWorklogId=606869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606869 ]


ASF GitHub Bot logged work on HIVE-25194:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Jun/21 08:33
            Start Date: 04/Jun/21 08:33
    Worklog Time Spent: 10m 
      Work Description: lcspinter commented on a change in pull request #2348:
URL: https://github.com/apache/hive/pull/2348#discussion_r645345788



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java
##########
@@ -81,11 +84,34 @@ public boolean fillStorageFormat(ASTNode child) throws SemanticException {
       }
       break;
     case HiveParser.TOK_STORAGEHANDLER:
-      storageHandler = processStorageHandler(child.getChild(0).getText());
-      if (child.getChildCount() == 2) {
-        BaseSemanticAnalyzer.readProps(
-          (ASTNode) (child.getChild(1).getChild(0)),
-          serdeProps);
+      for (int i = 0; i < child.getChildCount(); i++) {

Review comment:
       In this case, `storagehandler.getFileFormatPropertyKey` would return 
`null` and the file format property would be disregarded. But I see your point 
here; maybe we should warn the end user about this.
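
One way the warning suggested above could look, as a minimal sketch. `FormatAwareHandler`, `applyFileFormat`, and the warning text are hypothetical stand-ins, not Hive APIs; only the method name `getFileFormatPropertyKey` comes from the diff, and the key `write.format.default` is used purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FileFormatWarningSketch {

  /** Hypothetical stand-in for the one storage handler method used in the diff. */
  interface FormatAwareHandler {
    String getFileFormatPropertyKey();
  }

  /**
   * Copies the STORED AS file format into serdeProps under the handler's key.
   * Returns warnings for the caller to surface; empty when the format was applied.
   */
  static List<String> applyFileFormat(FormatAwareHandler handler, String fileFormat,
      Map<String, String> serdeProps) {
    List<String> warnings = new ArrayList<>();
    String key = handler.getFileFormatPropertyKey();
    if (key == null) {
      // Assumption: a null key means the handler has no file format setting,
      // so the clause is dropped and the user is told why.
      warnings.add("STORED AS " + fileFormat
          + " is ignored: the storage handler does not expose a file format property");
      return warnings;
    }
    serdeProps.put(key, fileFormat);
    return warnings;
  }

  public static void main(String[] args) {
    Map<String, String> serdeProps = new HashMap<>();
    // Handler without a file format property: nothing is stored, one warning comes back.
    System.out.println(applyFileFormat(() -> null, "ORC", serdeProps));
    // Handler that names a key (illustrative value, see the note above).
    System.out.println(applyFileFormat(() -> "write.format.default", "ORC", serdeProps));
    System.out.println(serdeProps);
  }
}
```

Returning the warnings instead of logging them keeps the sketch independent of any logging or session API; the real analyzer would have to decide where such a message is shown.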

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java
##########
@@ -81,11 +84,34 @@ public boolean fillStorageFormat(ASTNode child) throws SemanticException {
       }
       break;
     case HiveParser.TOK_STORAGEHANDLER:
-      storageHandler = processStorageHandler(child.getChild(0).getText());
-      if (child.getChildCount() == 2) {
-        BaseSemanticAnalyzer.readProps(
-          (ASTNode) (child.getChild(1).getChild(0)),
-          serdeProps);
+      for (int i = 0; i < child.getChildCount(); i++) {
+        ASTNode grandChild = (ASTNode) child.getChild(i);
+        switch (grandChild.getToken().getType()) {
+          case HiveParser.TOK_FILEFORMAT_GENERIC:
+            String fileFormatPropertyKey = null;
+            try {
+              HiveStorageHandler handler = HiveUtils.getStorageHandler(conf, this.storageHandler);
+              fileFormatPropertyKey = handler.getFileFormatPropertyKey();
+            } catch (HiveException e) {
+              throw new SemanticException("Failed to load storage handler:  " + e.getMessage());
+            }
+
+            if (fileFormatPropertyKey != null) {
+              String fileFormat = grandChild.getChild(0).getText();
+              if (serdeProps.containsKey(fileFormatPropertyKey)) {
+                throw new SemanticException("Provide only one of the following: STORED BY " + fileFormat +
+                    " or WITH SERDEPROPERTIES('" + fileFormatPropertyKey + "'='" + fileFormat + "')");
+              }
+
+              serdeProps.put(fileFormatPropertyKey, fileFormat);

Review comment:
       If the provided file format is not supported, Iceberg falls back to 
`avro`. After table creation, the metadata shows the unsupported file format, 
but right after the first insert the property value is updated to `avro`.
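
The fallback described above means a mistyped format name only becomes visible after the first insert. Below is a minimal sketch of rejecting an unsupported format up front instead. This is an assumption about one possible follow-up, not what the pull request does: `FileFormatCheckSketch`, `SUPPORTED`, and `validate` are hypothetical names, and the supported set is taken from the issue title (ORC, PARQUET, AVRO).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class FileFormatCheckSketch {

  // Assumption: the formats named in the issue title are the supported ones.
  static final Set<String> SUPPORTED =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList("orc", "parquet", "avro")));

  /** Returns the lower-cased format, or throws if it is not in SUPPORTED. */
  static String validate(String fileFormat) {
    String normalized = fileFormat.trim().toLowerCase(Locale.ROOT);
    if (!SUPPORTED.contains(normalized)) {
      throw new IllegalArgumentException("Unsupported file format: " + fileFormat);
    }
    return normalized;
  }

  public static void main(String[] args) {
    System.out.println(validate("ORC"));
    try {
      validate("CSV");
    } catch (IllegalArgumentException e) {
      // The unsupported format is reported before any table metadata is written.
      System.out.println(e.getMessage());
    }
  }
}
```

Whether to fail or merely warn here is a design choice; failing early avoids the window in which the table metadata shows a format that is never actually used.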






Issue Time Tracking
-------------------

    Worklog Id:     (was: 606869)
    Time Spent: 1h 10m  (was: 1h)

> Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
> ------------------------------------------------------
>
>                 Key: HIVE-25194
>                 URL: https://issues.apache.org/jira/browse/HIVE-25194
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently we have to specify the file format in TBLPROPERTIES in Iceberg 
> CREATE TABLE statements.
> The ideal syntax would be:
> CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ...
> One complication is that STORED BY and STORED AS are currently not permitted 
> within the same query, so that needs to be amended.



