[ https://issues.apache.org/jira/browse/HIVE-25837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yao Guangdong updated HIVE-25837:
---------------------------------
    Attachment: HIVE-25837.0001.patch

> Hive merge file operation may consume long time
> -----------------------------------------------
>
>                 Key: HIVE-25837
>                 URL: https://issues.apache.org/jira/browse/HIVE-25837
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: All Versions
>            Reporter: Yao Guangdong
>            Priority: Major
>         Attachments: HIVE-25837.0001.patch
>
>
> Merging files in Hive can take a very long time in some cases. This happens
> when there are thousands, or even tens of thousands, of small files, most of
> them only a few KB each. The current merge implementation considers only the
> target size (default 256M), so a single map may end up merging thousands or
> even tens of thousands of small files, which takes too long.
> To address this, we change the code to consider not only the target size but
> also the number of files merged per map (default 1024 per map). This may
> produce target files smaller than the user's setting, but compared with the
> cost of merging the files I think users can accept that.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
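
For illustration only, the grouping idea described in the issue can be sketched as below. This is not the code from HIVE-25837.0001.patch and does not use Hive's actual classes or configuration keys; `MergeSplitPlanner`, `planGroups`, and the `maxFilesPerMap` parameter are hypothetical names. The sketch assumes a simple greedy pass over file sizes that closes a group when either the target size or the per-map file count would be exceeded.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Hive: groups small files into per-map merge tasks. */
final class MergeSplitPlanner {

    /**
     * Greedily packs files (given by size in bytes) into groups. A new group is
     * started when adding the next file would exceed targetBytes, or when the
     * current group already holds maxFilesPerMap files.
     */
    static List<List<Long>> planGroups(List<Long> fileSizes, long targetBytes, int maxFilesPerMap) {
        if (targetBytes <= 0 || maxFilesPerMap <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            boolean sizeFull = !current.isEmpty() && currentBytes + size > targetBytes;
            boolean countFull = current.size() >= maxFilesPerMap;
            if (sizeFull || countFull) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // 10,000 files of 4 KB each: about 39 MB in total, far below a 256 MB target.
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            sizes.add(4096L);
        }
        long target = 256L * 1024 * 1024;
        // Size limit only (count limit effectively disabled).
        System.out.println(planGroups(sizes, target, Integer.MAX_VALUE).size());
        // Size limit plus a cap of 1024 files per group.
        System.out.println(planGroups(sizes, target, 1024).size());
    }
}
```

With only the size limit, all 10,000 files fall into a single group, which is the slow case the issue describes. With the additional cap of 1024 files, the same input is spread over 10 groups, each producing an output well under the target size.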