[ https://issues.apache.org/jira/browse/HIVE-18206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279772#comment-16279772 ]
Prasanth Jayachandran commented on HIVE-18206:
----------------------------------------------

I see, makes sense. +1

> Merge of RC/ORC file should follow other file formats which use merge configuration parameter
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18206
>                 URL: https://issues.apache.org/jira/browse/HIVE-18206
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 1.2.1, 2.1.1, 2.2.0, 3.0.0
>            Reporter: Wang Haihua
>            Assignee: Wang Haihua
>         Attachments: HIVE-18206.1.patch, HIVE-18206.2.patch
>
>
> Merge configuration parameters such as {{hive.merge.size.per.task}} determine the average file size after the merge stage.
> However, we found that this only works for file formats like {{Textfile/SequenceFile}}.
> With the {{RC/ORC}} file formats, it does not work.
> For {{RC/ORC}}, we found that the file size after the merge stage depends on parameters like {{mapreduce.input.fileinputformat.split.maxsize}}.
> It would be better to use {{hive.merge.size.per.task}} to determine the average file size for RC/ORC as well, so that the behavior is consistent across file formats.
> The root cause is that for RC/ORC the merge task class is {{MergeFileTask}}, rather than {{MapRedTask}} as for Textfile/SequenceFile, and {{MergeFileTask}} does not accept the configuration value held in MergeFileWork. The solution is to pass that value into {{MergeFileTask}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
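
For reference, a minimal sketch of the session settings discussed in the description. The numeric values are illustrative assumptions and do not come from the issue. {{hive.merge.mapfiles}} and {{hive.merge.mapredfiles}} are standard Hive merge switches that the issue does not mention, included here only so that the merge stage is enabled.

```sql
-- Illustrative values only; none of these numbers come from the issue.

-- Enable the merge stage for map-only and map-reduce jobs.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;

-- Target average size (in bytes) of the files produced by the merge stage.
-- Per the description, this is honored for Textfile/SequenceFile tables.
SET hive.merge.size.per.task=256000000;

-- Per the description, for RC/ORC tables the merged file size currently
-- follows this split-size setting instead of hive.merge.size.per.task.
SET mapreduce.input.fileinputformat.split.maxsize=256000000;
```

If the report is accurate, changing only {{hive.merge.size.per.task}} should alter the merged file size for a Textfile table but not for an ORC table, unless the attached patch is applied.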