[ 
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724112#comment-17724112
 ] 

Ayush Saxena commented on HIVE-27360:
-------------------------------------

If that table directory in the managed location is not used, there is no 
point in creating it. We should do an isIceberg kind of check and skip 
creating an empty directory in the managed warehouse location.
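
To make the idea concrete, here is a minimal sketch of such a check. It is not the actual Hive code path: the class and method names are hypothetical, and I am assuming the table can be recognised as Iceberg from its `storage_handler` parameter being set to the Hive Iceberg storage handler class.

```java
import java.util.Map;

public class IcebergLocationCheck {
    // Assumption: Iceberg tables carry this storage handler in their table parameters.
    static final String STORAGE_HANDLER_KEY = "storage_handler";
    static final String ICEBERG_HANDLER =
        "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler";

    // Hypothetical helper: true when the table parameters identify an Iceberg table.
    static boolean isIceberg(Map<String, String> tableParams) {
        return tableParams != null
            && ICEBERG_HANDLER.equals(tableParams.get(STORAGE_HANDLER_KEY));
    }

    // Hypothetical helper: only pre-create the managed directory for non-Iceberg tables,
    // since an Iceberg table ends up under the external location anyway.
    static boolean shouldCreateManagedDir(Map<String, String> tableParams) {
        return !isIceberg(tableParams);
    }
}
```

Wherever the managed directory is created today, a guard like `shouldCreateManagedDir(params)` would let the Iceberg case skip it.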

An Iceberg table is always external. If you don't specify external, it is 
still external, but with the purge flag set to true and 
TRANSLATED_TO_EXTERNAL set to true; check MetastoreDefaultTransformer.
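
As a rough model of the outcome described above (not the transformer's actual code), the sketch below shows the table parameters one would expect after translation. The class and method are hypothetical, and the parameter keys `EXTERNAL`, `external.table.purge` and `TRANSLATED_TO_EXTERNAL` are my assumption of the names used.

```java
import java.util.HashMap;
import java.util.Map;

public class TranslatedToExternalSketch {
    // Hypothetical model: returns a copy of the parameters as they would look
    // once the table has been made external.
    static Map<String, String> translate(Map<String, String> params, boolean declaredExternal) {
        Map<String, String> out = new HashMap<>(params);
        out.put("EXTERNAL", "TRUE");
        if (!declaredExternal) {
            // Table was created without the EXTERNAL keyword: mark it as translated
            // and enable purge so that dropping it still removes the data.
            out.put("external.table.purge", "TRUE");
            out.put("TRANSLATED_TO_EXTERNAL", "TRUE");
        }
        return out;
    }
}
```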

[~dkuzmenko] any suggestions?

> Iceberg: Don't create a new iceberg location if hms table already has a 
> default location 
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-27360
>                 URL: https://issues.apache.org/jira/browse/HIVE-27360
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>            Reporter: zhangbutao
>            Assignee: zhangbutao
>            Priority: Major
>
> If you create a managed iceberg table without specifying a location, and 
> the database has both a location and a managed_location, the final iceberg 
> table location is under the database location instead of the 
> managed_location. However, the database managed_location also gets an 
> iceberg table subdirectory, which remains even after the table is dropped.
> We should ensure that a managed iceberg table is always placed under the 
> database managed_location when one exists. The direct and simple way is to 
> use the already-created hms table location before committing the iceberg 
> table, which avoids creating a new iceberg location.
>  
> Steps to reproduce:
> 1. set location and managed location properties:
>  
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
> {code}
> 2. create a database with default location and managed_location:
>  
> {code:java}
> create database testdb;{code}
>  
> {code:java}
> desc database testdb;{code}
>  
> {code:java}
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | db_name  | comment  | location                                                     | managedlocation                                     | owner_name  | owner_type  | connector_name  | remote_dbname  |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | testdb   |          | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        |                 |                |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
>  {code}
>  
>  
> 3. create a managed iceberg table without specifying the table location:
>  
> {code:java}
> // the table location will be: hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
> create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
> But you will find that two locations were created:
>  
> {code:java}
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the actual location, used by the managed iceberg table
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an empty managed location, which is unused
> {code}
>  
> 4. drop the iceberg table
> You will find that this unused managed location is still there:
> {code:java}
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
>  
>  
> We should use the created managed location to avoid creating a new iceberg 
> location.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
