[ 
https://issues.apache.org/jira/browse/HIVE-22371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059292#comment-17059292
 ] 

Sungwoo commented on HIVE-22371:
--------------------------------

A workaround is to explicitly specify table properties, e.g.,
 
{code:sql}
create table call_center
stored as ${FILE}
TBLPROPERTIES('transactional'='true', 'transactional_properties'='default')
as select * from ${SOURCE}.call_center;
{code}
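
To check that the workaround took effect, something like the following may help. It is only a sketch: it assumes ${FILE} is orc and ${SOURCE} is tpcds_text_2, as in the report quoted below, and that call_center was created in the current database.

{code:sql}
-- Assumption: ${FILE} = orc and ${SOURCE} = tpcds_text_2, as in the original report.
-- The output is expected to list transactional=true among the table parameters
-- and a Location under the managed warehouse directory.
describe formatted call_center;

-- The two counts are expected to match (the report saw 0 for the new table).
select count(*) from call_center;
select count(*) from tpcds_text_2.call_center;
{code}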

> CTAS not working with non-ACID managed tables
> ---------------------------------------------
>
>                 Key: HIVE-22371
>                 URL: https://issues.apache.org/jira/browse/HIVE-22371
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 4.0.0
>            Reporter: Jaechang Kim
>            Priority: Major
>
> I used Hive commit HIVE-21344 (f16509a5c9187f592c48c253ee001fc3a5e0d508) on 
> the master branch, which was committed on 12 Oct.
> When I submitted the query below, it finished without any errors.
> {code:sql}
> create table call_center
> stored as orc 
>  as select * from tpcds_text_2.call_center;
> {code}
> However, "select count( * ) from call_center" returned 0, and the data in 
> HDFS looks strange:
>  * Two tables were created, one in the warehouse directory and another in the 
> external warehouse directory.
>  * Table `call_center` in the external warehouse is empty.
> {code:java}
>  > hdfs dfs -du -h $WAREHOUSE_PATH
>  5.0 K 14.9 K $WAREHOUSE_PATH/call_center
>  0 0 $WAREHOUSE_PATH/tpcds_text_2.db
>  > hdfs dfs -du -h $EXTERNAL_WAREHOUSE_PATH
>  2.1 G 2.1 G $EXTERNAL_WAREHOUSE_PATH/2
>  0 0 $EXTERNAL_WAREHOUSE_PATH/call_center
> {code}
> After a few hours of digging, I suspect this bug was introduced by HIVE-22158, 
> which creates every non-ACID managed table in the external warehouse 
> directory by default. In the example above, call_center is intended as a 
> managed table but is not explicitly specified as ACID. Hence, it should be 
> created in the external warehouse directory.
> However, the table call_center created in the external warehouse directory is 
> empty, while another non-empty table of the same name is created in the 
> warehouse directory. This is because in the current implementation, the 
> (buggy) compiled query plan proceeds as follows:
>  1. Write data to a temporary directory
>  2. Move the data to the warehouse directory ($WAREHOUSE_PATH/call_center)
>  3. Create a table using data in the warehouse directory
> Without the bug, step 2 would move the data to the external warehouse 
> directory, and step 3 would create a table using the data in the external 
> warehouse directory. The crux of the problem is that the query compiler 
> checks only whether the query includes the keyword "external". 
> In other words, the query compiler is not aware of the changes made in 
> HIVE-22158 and should be updated accordingly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
