[ https://issues.apache.org/jira/browse/HIVE-25277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644177#comment-17644177 ]
Goden Yao commented on HIVE-25277:
----------------------------------

Is this going to the 2.3.9 and 3.1.3 code lines as well? The fix version only indicates 4.0.

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-25277
>                 URL: https://issues.apache.org/jira/browse/HIVE-25277
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: All Versions
>            Reporter: Zhou Fang
>            Assignee: Zhou Fang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-1
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store as the
> warehouse, because ListFiles is expensive on such stores. A root cause is
> that the recursive parent dir deletion is very inefficient: there are many
> duplicated calls to isEmpty (ListFiles is called at the end). This fix sorts
> the parents to delete according to the path size, and always processes the
> longest one first (e.g., a/b/c is always before a/b). As a result, each
> parent path needs to be checked only once.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
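
The ordering described in the issue (longest parent path first, so each parent is checked only once) can be illustrated with a minimal Java sketch. This is not the code from the Hive patch: `ParentDirCleanup`, `FakeStore`, `deleteEmptyParents`, and the example paths are hypothetical names made up for illustration, "path size" is assumed here to mean the number of path components, and the in-memory store only stands in for an object store so that list calls can be counted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ParentDirCleanup {
    /** Hypothetical in-memory stand-in for an object store; counts list calls. */
    static class FakeStore {
        final Set<String> entries = new HashSet<>(); // every file and dir path
        int listCalls = 0;

        // Emulates an emptiness check that needs one (expensive) listing.
        boolean isEmpty(String dir) {
            listCalls++;
            String prefix = dir + "/";
            for (String e : entries) {
                if (e.startsWith(prefix)) return false;
            }
            return true;
        }

        void delete(String dir) { entries.remove(dir); }
    }

    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i < 0 ? null : path.substring(0, i);
    }

    // Assumption: "path size" is taken as the number of path components.
    static int depth(String path) { return path.split("/").length; }

    /**
     * Removes empty ancestors of already-deleted partition dirs, stopping
     * below tableRoot. Each distinct ancestor is checked exactly once.
     */
    static List<String> deleteEmptyParents(FakeStore store,
                                           Set<String> deletedPartitionDirs,
                                           String tableRoot) {
        // Collect each ancestor once, even if many partitions share it.
        Set<String> parents = new HashSet<>();
        for (String dir : deletedPartitionDirs) {
            String cur = parentOf(dir);
            while (cur != null && cur.startsWith(tableRoot + "/")) {
                parents.add(cur);
                cur = parentOf(cur);
            }
        }
        // Deepest first: a/b/c is always handled before a/b.
        List<String> ordered = new ArrayList<>(parents);
        ordered.sort((a, b) -> Integer.compare(depth(b), depth(a)));

        List<String> removed = new ArrayList<>();
        for (String dir : ordered) {
            if (store.isEmpty(dir)) { // one listing per parent
                store.delete(dir);
                removed.add(dir);
            }
        }
        return removed;
    }
}
```

Because children are always processed before their parents, a parent's emptiness check already reflects any child directories removed earlier in the same pass, so no parent has to be re-checked after a child is deleted.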