[ https://issues.apache.org/jira/browse/HIVE-22371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059292#comment-17059292 ]
Sungwoo commented on HIVE-22371:
--------------------------------

A workaround is to explicitly specify table properties, e.g.,
{code:sql}
create table call_center
stored as ${FILE}
TBLPROPERTIES('transactional'='true', 'transactional_properties'='default')
as select * from ${SOURCE}.call_center;
{code}

> CTAS not working with non-ACID managed tables
> ---------------------------------------------
>
>                 Key: HIVE-22371
>                 URL: https://issues.apache.org/jira/browse/HIVE-22371
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 4.0.0
>            Reporter: Jaechang Kim
>            Priority: Major
>
> I used Hive commit HIVE-21344 (f16509a5c9187f592c48c253ee001fc3a5e0d508) in
> the master branch, which was committed on 12 Oct.
> When I submitted the query below, it finished without any errors.
> {code:sql}
> create table call_center
> stored as orc
> as select * from tpcds_text_2.call_center;
> {code}
> However, "select count( * ) from call_center" returned 0, and the data in HDFS
> looks strange.
> * Two tables were created, one in the warehouse directory and another in the
> external warehouse directory.
> * Table `call_center` in the external warehouse is empty.
> {code:java}
> > hdfs dfs -du -h $WAREHOUSE_PATH
> 5.0 K  14.9 K  $WAREHOUSE_PATH/call_center
> 0      0       $WAREHOUSE_PATH/tpcds_text_2.db
> > hdfs dfs -du -h $EXTERNAL_WAREHOUSE_PATH
> 2.1 G  2.1 G   $EXTERNAL_WAREHOUSE_PATH/2
> 0      0       $EXTERNAL_WAREHOUSE_PATH/call_center
> {code}
> After a few hours of digging, I believe this bug was introduced in HIVE-22158,
> which creates every non-ACID managed table in the external warehouse
> directory by default. In the example above, call_center is intended as a
> managed table, but is not explicitly specified as ACID. Hence, it should be
> created in the external warehouse directory.
> However, the table call_center created in the external warehouse directory is
> empty, while another non-empty table of the same name is created in the
> warehouse directory.
> This is because, in the current implementation, the (buggy) compiled query
> plan proceeds as follows:
> 1. Write data to a temporary directory
> 2. Move the data to the warehouse directory ($WAREHOUSE_PATH/call_center)
> 3. Create a table using data in the warehouse directory
> Without the bug, step 2 would move the data to the external warehouse
> directory, and step 3 would create a table using the data in the external
> warehouse directory. The crux of the problem is that the query compiler
> checks only whether the query includes the keyword "external".
> In other words, the query compiler should also be made aware of the changes
> introduced in HIVE-22158 and be updated accordingly.
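
The mismatch described in the quoted report can be modeled with a small sketch. This is not Hive source code: the function names, the path placeholders, and the rule that ACID managed tables stay in the warehouse directory while external tables go to the external warehouse directory are assumptions made for illustration, based only on the description above.

```python
# Hypothetical model of the behaviour described in HIVE-22371.
# None of these names exist in Hive; they only illustrate the report.
WAREHOUSE = "$WAREHOUSE_PATH"
EXTERNAL_WAREHOUSE = "$EXTERNAL_WAREHOUSE_PATH"


def table_location(name, external, transactional):
    """Directory where the table is created after HIVE-22158 (assumed rule):
    external tables and non-ACID managed tables use the external warehouse."""
    if external or not transactional:
        return f"{EXTERNAL_WAREHOUSE}/{name}"
    return f"{WAREHOUSE}/{name}"


def buggy_move_target(name, external, transactional):
    """Step 2 as reported: only the 'external' keyword is checked,
    so the transactional flag is ignored."""
    if external:
        return f"{EXTERNAL_WAREHOUSE}/{name}"
    return f"{WAREHOUSE}/{name}"


def fixed_move_target(name, external, transactional):
    """Step 2 as the report suggests: apply the same rule as table creation."""
    return table_location(name, external, transactional)


if __name__ == "__main__":
    # Plain CTAS (no 'external', no ACID properties): data and table diverge.
    print(buggy_move_target("call_center", False, False))
    print(table_location("call_center", False, False))
    # Workaround with 'transactional'='true': both resolve to the same directory.
    print(buggy_move_target("call_center", False, True))
    print(table_location("call_center", False, True))
```

Under these assumptions, the plain CTAS moves the data to one directory and creates the table in the other, which would explain the empty table. The workaround of setting 'transactional'='true' makes both paths agree.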