liux created HIVE-28523:
---------------------------

             Summary: Possible performance problem when dropping a table or partitions
                 Key: HIVE-28523
                 URL: https://issues.apache.org/jira/browse/HIVE-28523
             Project: Hive
          Issue Type: Improvement
      Security Level: Public (Viewable by anyone)
          Components: Standalone Metastore
            Reporter: liux
            Assignee: liux


1. The traversal performed when dropping a table or partition objects may have a performance problem.

The code in question is in standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java, in this loop:

for (String partName : partNames) {
  Path partPath = wh.getDnsPath(new Path(pathString));
}

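Assuming a single call to wh.getDnsPath takes about 10 ms, traversing 200,000 partition objects takes about 33 minutes, which can cause dropping a large table or its partitions to time out.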
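The estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the class and method names are hypothetical, and the 10 ms per-call latency is the assumption from this report, not a measured value.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not part of Hive.
final class DropCostEstimate {
    // Total time spent if every partition triggers one call of the given latency.
    static Duration totalCost(long partitionCount, Duration perCall) {
        return perCall.multipliedBy(partitionCount);
    }

    public static void main(String[] args) {
        // 200,000 partitions at an assumed 10 ms per wh.getDnsPath call.
        Duration total = totalCost(200_000, Duration.ofMillis(10));
        // Prints "2000 s, about 33 min".
        System.out.println(total.getSeconds() + " s, about " + total.toMinutes() + " min");
    }
}
```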

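2. There is no need to execute wh.getDnsPath(new Path(pathString)) for every partition name during the traversal. It only needs to run when the partition is not a subdirectory of the table directory.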
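A minimal sketch of the idea in point 2 follows. It is not the HMSHandler code and does not use Hadoop's Path or Warehouse classes: locations are plain strings, the expensive call is passed in as a hypothetical expensiveResolver function standing in for wh.getDnsPath, and the string-prefix test in isUnderTable is an assumed stand-in for however Hive actually decides that a partition lives under the table directory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; names and the prefix check are assumptions.
final class PartitionPathSketch {
    // Cheap string test: is `location` the table directory itself or below it?
    static boolean isUnderTable(String tableLocation, String location) {
        String base = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
        return location.equals(tableLocation) || location.startsWith(base);
    }

    // Resolves only the partitions stored outside the table directory,
    // so the expensive call is skipped for the common case.
    static List<String> resolveExternalPartitions(String tableLocation,
                                                  List<String> partLocations,
                                                  UnaryOperator<String> expensiveResolver) {
        List<String> resolved = new ArrayList<>();
        for (String location : partLocations) {
            if (isUnderTable(tableLocation, location)) {
                continue; // removed together with the table directory; no lookup needed
            }
            resolved.add(expensiveResolver.apply(location));
        }
        return resolved;
    }
}
```

With this shape, a table whose partitions all sit under the table directory makes no resolver calls at all, and the cost grows only with the number of partitions stored elsewhere.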



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
