[ https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724090#comment-17724090 ]
Ayush Saxena commented on HIVE-27360: ------------------------------------- We shouldn’t create the managed location itself, when we aren’t using it. Iceberg table for hive is in general external only > Iceberg: Don't create a new iceberg location if hms table already has a > default location > ----------------------------------------------------------------------------------------- > > Key: HIVE-27360 > URL: https://issues.apache.org/jira/browse/HIVE-27360 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration > Reporter: zhangbutao > Assignee: zhangbutao > Priority: Major > > If you create a managed iceberg table without specifying the location and the > database has both location and managed_location, the final iceberg table > location will be on database location instead of managed_location. But you > can see a the database managed_location also has a iceberg table subdirectory > which is always here even if the table was dropped. > We should ensure the managed iceberg table always on database > managed_location in case of database managed_location existing. The direct > and simple way is we can use the created hms table location before > committing iceberg table to avoid creating a new iceberg location. > > Step to repro: > 1. set location and managed location properties: > > {code:java} > set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest; > set hive.metastore.warehouse.external.dir= > /user/hive/warehouse/external/hiveicetest; > {code} > 2. create a database with default location and managed_location: > > {code:java} > create database testdb;{code} > > {code:java} > desc database testdb;{code} > > {code:java} > +----------+----------+----------------------------------------------------+----------------------------------------------------+-------------+-------------+-----------------+----------------+ > | db_name | comment | location | > managedlocation | owner_name | owner_type > | connector_name | remote_dbname | > +----------+----------+----------------------------------------------------+----------------------------------------------------+-------------+-------------+-----------------+----------------+ > | testdb | | > hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | > hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive | USER > | | > +----------+----------+----------------------------------------------------+----------------------------------------------------+-------------+-------------+-----------------+----------------+ > {code} > > > 3. create a managed iceberg table without specifing the table location: > > {code:java} > // the table location will on: > hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01 > create table ice01 (id int) Stored by Iceberg stored as ORC;{code} > but here you will find the two created location: > > {code:java} > hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01 //the > actual location which is used by the managed iceberg table > hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db // a > empty managed location which is unused > {code} > > 4. drop the icebeg table > you will find this unused managed location is still there: > {code:java} > hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code} > > > We should use the created managed location to avoid creating a new iceberg > location. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)