[ 
https://issues.apache.org/jira/browse/HIVE-25837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yao Guangdong updated HIVE-25837:
---------------------------------
    Attachment: HIVE-25837.0001.patch

> Hive merge file operation may consume long time
> -----------------------------------------------
>
>                 Key: HIVE-25837
>                 URL: https://issues.apache.org/jira/browse/HIVE-25837
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: All Versions
>            Reporter: Yao Guangdong
>            Priority: Major
>         Attachments: HIVE-25837.0001.patch
>
>
>   Merging files in Hive can take a very long time in some cases. This
> happens when there are thousands, or even tens of thousands, of small files,
> most of them only a few KB each. The current merge implementation considers
> only the target size (default 256M), so a single map may end up merging
> thousands or even tens of thousands of small files, which takes too long.
>   To handle this case, we change the code to consider not only the target
> size but also the number of files merged per map (default 1024/map). This may
> make the target files smaller than the user's setting, but compared with the
> time spent merging files, I think users can accept that.
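
To illustrate the idea described above, here is a minimal sketch of grouping
files under both a size limit and a file-count limit. It is not taken from
HIVE-25837.0001.patch: the class name MergeGrouping, the method group, and its
parameters are hypothetical, and "256M" is assumed here to mean
256 * 1024 * 1024 bytes. Each group stands for the set of files one map would
merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not the code from the patch. */
class MergeGrouping {

    /**
     * Greedily packs files (given by size in bytes) into groups, one group per map.
     * The current group is closed when adding the next file would exceed
     * targetBytes, or when the group already holds maxFilesPerGroup files.
     */
    static List<List<Long>> group(long[] fileSizes, long targetBytes, int maxFilesPerGroup) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            boolean overSize = currentBytes + size > targetBytes;
            boolean overCount = current.size() >= maxFilesPerGroup;
            // Only close a non-empty group, so an oversized file still gets a group.
            if (!current.isEmpty() && (overSize || overCount)) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // 5000 small files of 4 KB each (assumed example data).
        long[] sizes = new long[5000];
        Arrays.fill(sizes, 4L * 1024);
        long target = 256L * 1024 * 1024; // assumed meaning of "256M"

        // Size limit only: all 5000 files land in a single group.
        System.out.println(group(sizes, target, Integer.MAX_VALUE).size()); // prints 1
        // Size limit plus 1024 files per group: the work is split across 5 groups.
        System.out.println(group(sizes, target, 1024).size()); // prints 5
    }
}
```

With the count limit in place, the 5000 small files are spread over five maps
instead of one. Each output is far smaller than the target size, which is the
trade-off the description mentions.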



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
