[ 
https://issues.apache.org/jira/browse/HIVE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974749#action_12974749
 ] 

Ning Zhang commented on HIVE-1806:
----------------------------------

That's expected behavior. The merge job checks whether 
CombineHiveInputFormat is supported. If it is, the merge job uses that input 
format even when hive.input.format is set to a different one. The input 
format for the merge job is set at GenMRFileSink1.java:375.
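
As a rough illustration of that selection rule (not the actual code in 
GenMRFileSink1.java), the decision can be sketched as below. The class name, 
the method name, and the combineSupported flag are hypothetical; only the two 
Hive input format class names are real.

```java
// Hypothetical sketch: picks the input format for the merge job only.
// It does not reproduce the logic in GenMRFileSink1.java.
final class MergeInputFormatSketch {

    static final String COMBINE_FORMAT =
        "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat";

    // configuredInputFormat stands in for the value of hive.input.format.
    // combineSupported stands in for whatever support check Hive performs.
    static String inputFormatForMergeJob(String configuredInputFormat,
                                         boolean combineSupported) {
        // The merge job prefers the combine format whenever it is available,
        // regardless of the configured input format.
        return combineSupported ? COMBINE_FORMAT : configuredInputFormat;
    }

    public static void main(String[] args) {
        String configured = "org.apache.hadoop.hive.ql.io.HiveInputFormat";
        System.out.println(inputFormatForMergeJob(configured, true));
        System.out.println(inputFormatForMergeJob(configured, false));
    }
}
```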

> The merge criteria on dynamic partitions should be per partition
> ----------------------------------------------------------------
>
>                 Key: HIVE-1806
>                 URL: https://issues.apache.org/jira/browse/HIVE-1806
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1806.2.patch, HIVE-1806.3.patch, HIVE-1806.4.patch, 
> HIVE-1806.patch
>
>
> Currently the criterion for whether a merge job should be fired on 
> dynamically generated partitions is the average file size across all dynamic 
> partitions. It is very common that some dynamic partitions contain mostly 
> large files and some contain mostly small files. Even when the average size 
> of all the files is larger than hive.merge.smallfiles.avgsize, we should 
> still merge those partitions that contain small files, and only those. 
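
The difference between the two criteria can be sketched in Java as below. 
This is not the code from the attached patches. The class, the method names, 
the partition names, and the file sizes are illustrative assumptions, and the 
threshold value is a stand-in for hive.merge.smallfiles.avgsize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch comparing a global merge criterion with a
// per-partition one. Sizes are in bytes.
final class MergeCriteriaSketch {

    // Integer average of the given file sizes; 0 for an empty list.
    static long averageSize(List<Long> fileSizes) {
        if (fileSizes.isEmpty()) {
            return 0L;
        }
        long total = 0L;
        for (long size : fileSizes) {
            total += size;
        }
        return total / fileSizes.size();
    }

    // Global criterion: a single decision based on the average over the
    // files of all partitions.
    static boolean shouldMergeGlobally(Map<String, List<Long>> partitions,
                                       long avgSizeThreshold) {
        List<Long> all = new ArrayList<>();
        for (List<Long> sizes : partitions.values()) {
            all.addAll(sizes);
        }
        return !all.isEmpty() && averageSize(all) < avgSizeThreshold;
    }

    // Per-partition criterion: each partition is judged on its own average,
    // and only the partitions below the threshold are selected.
    static List<String> partitionsToMerge(Map<String, List<Long>> partitions,
                                          long avgSizeThreshold) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, List<Long>> entry : partitions.entrySet()) {
            List<Long> sizes = entry.getValue();
            if (!sizes.isEmpty() && averageSize(sizes) < avgSizeThreshold) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long threshold = 16_000_000L; // assumed value, for illustration only
        Map<String, List<Long>> partitions = new LinkedHashMap<>();
        partitions.put("ds=2010-12-01", Arrays.asList(256_000_000L, 256_000_000L));
        partitions.put("ds=2010-12-02", Arrays.asList(1_000_000L, 2_000_000L, 1_000_000L));

        System.out.println("global merge: " + shouldMergeGlobally(partitions, threshold));
        System.out.println("per-partition merge: " + partitionsToMerge(partitions, threshold));
    }
}
```

With these sample sizes the average over all five files is about 103 MB, so 
the global check skips the merge entirely, while the per-partition check 
selects only ds=2010-12-02, whose files average about 1.3 MB.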

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
