[ https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724090#comment-17724090 ]

Ayush Saxena commented on HIVE-27360:
-------------------------------------

We shouldn’t create the managed location at all when we aren’t using it. 
Iceberg tables in Hive are generally external only.

> Iceberg: Don't create a new iceberg location if hms table already has a 
> default location 
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-27360
>                 URL: https://issues.apache.org/jira/browse/HIVE-27360
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>            Reporter: zhangbutao
>            Assignee: zhangbutao
>            Priority: Major
>
> If you create a managed Iceberg table without specifying a location, and the 
> database has both a location and a managed_location, the final Iceberg table 
> location will be under the database location instead of the managed_location. 
> However, the database managed_location also contains an Iceberg table 
> subdirectory, which remains even after the table is dropped.
> We should ensure that a managed Iceberg table is always placed under the 
> database managed_location when one exists. A direct and simple way is to use 
> the already created HMS table location before committing the Iceberg table, 
> which avoids creating a new Iceberg location.
>  
> Steps to reproduce:
> 1. Set the location and managed location properties:
>  
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
> {code}
> 2. create a database with default location and managed_location:
>  
> {code:java}
> create database testdb;{code}
>  
> {code:java}
> desc database testdb;{code}
>  
> {code:java}
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | db_name  | comment  | location                                                     | managedlocation                                     | owner_name  | owner_type  | connector_name  | remote_dbname  |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | testdb   |          | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        |                 |                |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
>  {code}
>  
>  
> 3. Create a managed Iceberg table without specifying the table location:
>  
> {code:java}
> -- the table location will be: hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
> create table ice01 (id int) stored by iceberg stored as orc;{code}
> However, you will find that two locations were created:
>  
> {code:java}
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the actual location used by the managed Iceberg table
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db                  // an empty managed location that is unused
> {code}
>  
> 4. Drop the Iceberg table.
> You will find that the unused managed location is still there:
> {code:java}
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
>  
>  
> We should use the created managed location to avoid creating a new iceberg 
> location.
>  
>  
>  
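
To illustrate the idea in the issue title (reuse the location the HMS table already has instead of deriving a new one), here is a minimal Java sketch. It is not the actual Hive or Iceberg code: the class `IcebergLocationSketch` and the method `resolveTableLocation` are hypothetical names made up for this example, and the fallback rule (database location plus table name) is an assumption based on the repro output above.

```java
import java.util.Optional;

// Hypothetical illustration only; not part of Hive or Iceberg.
final class IcebergLocationSketch {

    /**
     * Picks the location to commit an Iceberg table to.
     *
     * @param hmsTableLocation location already assigned to the HMS table, if any
     * @param databaseLocation location of the owning database (assumed fallback root)
     * @param tableName        name of the table being created
     */
    static String resolveTableLocation(Optional<String> hmsTableLocation,
                                       String databaseLocation,
                                       String tableName) {
        return hmsTableLocation
            // Treat an empty string the same as "no location set".
            .filter(loc -> !loc.isEmpty())
            // Only derive a new path when the HMS table has no location yet.
            .orElseGet(() -> databaseLocation + "/" + tableName);
    }

    public static void main(String[] args) {
        String dbLocation = "hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db";

        // HMS table already has a location: it is reused as-is.
        System.out.println(resolveTableLocation(
            Optional.of(dbLocation + "/ice01"), dbLocation, "ice01"));

        // No HMS location: fall back to <database location>/<table name>.
        System.out.println(resolveTableLocation(
            Optional.empty(), dbLocation, "ice02"));
    }
}
```

The point of the sketch is only the ordering of the checks: an existing HMS location wins, so no second directory needs to be created for the same table.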



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
