[ 
https://issues.apache.org/jira/browse/HIVE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974713#action_12974713
 ] 

Namit Jain commented on HIVE-1806:
----------------------------------

Mostly looks good - a minor comment.


In the new test that you added, the merge job is a map-only job even though you 
are using HiveInputFormat.
This is because you are using Hadoop 0.20, which supports 
CombineHiveInputFormat.
Do you think that is the correct behavior? Looks OK, just wanted to confirm.

> The merge criteria on dynamic partitions should be per partition
> ----------------------------------------------------------------
>
>                 Key: HIVE-1806
>                 URL: https://issues.apache.org/jira/browse/HIVE-1806
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1806.2.patch, HIVE-1806.3.patch, HIVE-1806.4.patch, 
> HIVE-1806.patch
>
>
> Currently the criterion for whether a merge job should be fired on dynamically 
> generated partitions is the average file size across all dynamic 
> partitions. It is very common that some dynamic partitions contain mostly 
> large files and some contain mostly small files. Even when the average 
> size across all files is larger than hive.merge.smallfiles.avgsize, we 
> should still merge only those partitions that contain small files.
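
To make the difference between the two criteria concrete, here is a minimal sketch in Python. It is not Hive's implementation: the function names, the partition names, the file sizes, and the threshold value are all hypothetical, and the strict "average below threshold" comparison is an assumption made for illustration.

```python
from statistics import mean

MB = 1024 * 1024


def global_merge_decision(file_sizes_by_partition, avg_size_threshold):
    """Old behavior as described: one decision from the average over all files."""
    all_sizes = [s for sizes in file_sizes_by_partition.values() for s in sizes]
    return bool(all_sizes) and mean(all_sizes) < avg_size_threshold


def partitions_to_merge(file_sizes_by_partition, avg_size_threshold):
    """Proposed behavior: decide per partition from that partition's own average."""
    return [
        partition
        for partition, sizes in file_sizes_by_partition.items()
        if sizes and mean(sizes) < avg_size_threshold
    ]


# Hypothetical data: one partition with large files, one with small files.
sizes = {
    "ds=2010-12-01": [256 * MB, 300 * MB],
    "ds=2010-12-02": [1 * MB, 2 * MB, 3 * MB],
}
threshold = 16 * MB  # stands in for hive.merge.smallfiles.avgsize

merge_everything = global_merge_decision(sizes, threshold)
merge_these = partitions_to_merge(sizes, threshold)
print(merge_everything)  # False
print(merge_these)       # ['ds=2010-12-02']
```

With the global average, the large files in the first partition hide the small files in the second, so no merge is triggered. The per-partition check selects only the second partition.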

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
