[
https://issues.apache.org/jira/browse/HIVE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ning Zhang updated HIVE-1806:
-----------------------------
Attachment: HIVE-1806.4.patch
getMergeSize() could return 0 when multiple files are generated but each
file is empty. bucketmapjoin2.q actually hits this case: there are
multiple mappers/reducers and none of them produces results.
I have addressed the review comment about setNumberReducers. The function sets
up not only the number of reducers but also the min split size, so I renamed it
to setupMapRedWork. We cannot assume all merge tasks are map-only because the
parameter hive.mergejob.maponly may be false, in which case the compiler may
generate a MapReduce task. I also found a bug in
GenMRFileSink1.createMergeJob(): hive.mergejob.maponly should be checked there
rather than hive.merge.mapfiles and hive.merge.mapredfiles. The latter two are
already checked before createMergeJob() is called.
Uploading a new patch containing the change. I'm also running unit tests.
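
To make the zero-size case and the per-partition criterion this issue asks for
concrete, here is a minimal sketch. It is not the code in the patch: the class
PerPartitionMergeSketch, both method names, and the convention that -1 means
"no merge needed" are assumptions made for illustration, and file sizes are
passed in as plain lists rather than read from a FileSystem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and the -1 convention are assumptions.
public class PerPartitionMergeSketch {

    /**
     * Returns the total number of bytes to merge for one partition, or -1 if
     * the partition does not need a merge. Note that 0 is a legitimate
     * result: several files exist but every one of them is empty.
     */
    public static long mergeSize(List<Long> fileSizes, long avgSizeThreshold) {
        if (fileSizes.size() <= 1) {
            return -1; // nothing to merge with zero or one file
        }
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        long average = total / fileSizes.size();
        return average < avgSizeThreshold ? total : -1;
    }

    /**
     * Applies the criterion to each partition separately instead of to the
     * average over all partitions. Treating a result of 0 as "merge" is a
     * choice of this sketch, not something the patch is known to do.
     */
    public static List<String> partitionsToMerge(
            Map<String, List<Long>> filesByPartition, long avgSizeThreshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Long>> e : filesByPartition.entrySet()) {
            if (mergeSize(e.getValue(), avgSizeThreshold) >= 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

With a 16 MB threshold, a partition holding two files of 256 MB and 300 MB is
left alone, a partition holding three files of a few KB each is selected, and a
partition holding two empty files yields a merge size of 0, so callers have to
distinguish 0 from the "no merge" value.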
> The merge criteria on dynamic partitions should be per partition
> ----------------------------------------------------------------
>
> Key: HIVE-1806
> URL: https://issues.apache.org/jira/browse/HIVE-1806
> Project: Hive
> Issue Type: Bug
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-1806.2.patch, HIVE-1806.3.patch, HIVE-1806.4.patch,
> HIVE-1806.patch
>
>
> Currently the criterion for whether a merge job should be fired on dynamically
> generated partitions is the average size of files across all dynamic
> partitions. It is very common that some dynamic partitions contain mostly
> large files and some contain mostly small files. Even when the average
> size of all files is larger than hive.merge.smallfiles.avgsize, we
> should still merge the partitions that contain small files, and only those.