[ https://issues.apache.org/jira/browse/HIVE-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743514#comment-16743514 ]
Vihang Karajgaonkar commented on HIVE-20198:
--------------------------------------------

This somehow slipped through the cracks. I will assign it to myself and see if I can take a first stab at it.

> Constant time table drops/renames
> ---------------------------------
>
>                 Key: HIVE-20198
>                 URL: https://issues.apache.org/jira/browse/HIVE-20198
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 4.0.0
>            Reporter: Alexander Kolbasov
>            Priority: Major
>
> Currently table drops and table renames have O(P) performance (where P is the
> number of partitions). When a managed table is deleted, the implementation
> deletes table metadata and then deletes all partitions in HDFS. HDFS
> operations are optimized and only do sequential deletes for partitions
> outside of the table prefix. This operation is O(P), where P is the number of
> partitions.
> Table rename goes through the list of partitions and modifies the table name
> (and potentially the db name) in each partition. It also modifies each
> partition location to match the new db/table name and renames directories
> (which is a non-atomic and slow operation on S3). This is an O(P) operation,
> where P is the number of partitions.
> The basic idea is to do the following:
> # Assign a unique ID to each table
> # Create the directory name based on the unique ID rather than the name
> # Table rename then becomes a metadata-only operation - there is no need to
> change any location information.
> # Table drop can become an asynchronous operation where the table is marked
> as "deleted". Subsequent public metadata APIs should skip such tables. A
> background cleaner thread may then go and clean up directories.
> Since the table location is unique for each table, new tables will not reuse
> existing locations. This change isn't compatible with the current behavior,
> where there is an assumption that the table location is based on the table
> name. We can get around this by providing an "opt-in" mechanism - a special
> table property that indicates that the table can have the new behavior, so
> the improvement will initially work only for new tables created with this
> feature enabled. We may later provide some tool to convert existing tables to
> the new scheme.
> One complication arises when impersonation is enabled - the FS operations
> should be performed using the client UGI rather than the server's, so the
> cleaner thread should be able to use client UGIs.
> Initially we can punt on this and do standard table drops when impersonation
> is enabled.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
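
For illustration only, the four steps proposed in the issue description could be sketched roughly as follows. This is a toy in-memory model, not Hive Metastore code: the `Metastore` and `Table` classes, their method names, and the use of a local temporary directory in place of HDFS/S3 are all assumptions made for the example. It ignores the opt-in table property and the impersonation (client UGI) concern raised in the issue.

```python
import shutil
import tempfile
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Table:
    table_id: str          # unique ID, never changes after creation
    db: str
    name: str
    location: Path         # derived from table_id, not from db/name
    deleted: bool = False  # set by drop_table, consumed by clean()


class Metastore:
    """Hypothetical toy metastore; names here are not Hive APIs."""

    def __init__(self, warehouse: Path):
        self.warehouse = warehouse
        self.tables = {}    # table_id -> Table (includes tables marked deleted)
        self.by_name = {}   # (db, name) -> table_id (live tables only)

    def create_table(self, db, name, partitions=()):
        if (db, name) in self.by_name:
            raise ValueError(f"table {db}.{name} already exists")
        table_id = uuid.uuid4().hex
        location = self.warehouse / table_id  # directory named by ID
        location.mkdir(parents=True)
        for part in partitions:
            (location / part).mkdir()
        table = Table(table_id, db, name, location)
        self.tables[table_id] = table
        self.by_name[(db, name)] = table_id
        return table

    def get_table(self, db, name):
        return self.tables[self.by_name[(db, name)]]

    def list_tables(self, db):
        # Tables marked as deleted are not in by_name, so they are skipped.
        return sorted(n for (d, n) in self.by_name if d == db)

    def rename_table(self, db, name, new_db, new_name):
        # Metadata-only: no directory or partition location is touched.
        if (new_db, new_name) in self.by_name:
            raise ValueError(f"table {new_db}.{new_name} already exists")
        table = self.tables[self.by_name.pop((db, name))]
        table.db, table.name = new_db, new_name
        self.by_name[(new_db, new_name)] = table.table_id

    def drop_table(self, db, name):
        # Only mark the table; directory removal is deferred to clean().
        table = self.tables[self.by_name.pop((db, name))]
        table.deleted = True

    def clean(self):
        # Stand-in for the background cleaner thread: remove directories
        # of tables marked deleted and forget their metadata.
        doomed = [t for t in self.tables.values() if t.deleted]
        for table in doomed:
            shutil.rmtree(table.location, ignore_errors=True)
            del self.tables[table.table_id]
        return len(doomed)


if __name__ == "__main__":
    ms = Metastore(Path(tempfile.mkdtemp()))
    t = ms.create_table("db", "sales", ["ds=1", "ds=2"])
    ms.rename_table("db", "sales", "db", "sales_v2")
    ms.drop_table("db", "sales_v2")
    print(ms.list_tables("db"), t.location.exists(), ms.clean())
```

In this sketch, rename and drop each touch a fixed number of dictionary entries regardless of partition count, and a table created later under a dropped table's name gets a fresh directory, so it cannot collide with data still awaiting cleanup.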