[ https://issues.apache.org/jira/browse/HIVE-18206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HIVE-18206 started by Wang Haihua.
------------------------------------------

> Merge of RC/ORC file should follow other fileformate which use merge configuration parameter
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18206
>                 URL: https://issues.apache.org/jira/browse/HIVE-18206
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Wang Haihua
>            Assignee: Wang Haihua
>
> Merge configuration parameters such as {{hive.merge.size.per.task}} determine the average file size after the merge stage.
> However, we found that this only works for file formats such as {{Textfile/SequenceFile}}. With the {{RC/ORC}} file formats it has no effect.
> For {{RC/ORC}}, the file size after the merge stage instead depends on parameters such as {{mapreduce.input.fileinputformat.split.maxsize}}.
> It would be better to use {{hive.merge.size.per.task}} to determine the average file size for RC/ORC as well, so that all file formats behave consistently.
> The root cause is that for RC/ORC the merge class is {{MergeFileTask}}, whereas Textfile/SequenceFile use {{MapRedTask}}. {{MergeFileTask}} does not pick up the configuration value held in MergeFileWork, so the solution is to pass that value into {{MergeFileTask}}.
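
The root cause described in the issue can be illustrated with a minimal sketch in Python. This is not Hive code: `MergeWork`, `run_mapred_task`, and `run_merge_file_task` are hypothetical stand-ins for the classes named in the issue. The idea that the merge target size reaches the job through the split-size setting is an assumption inferred from the description, not taken from the Hive source.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Real Hadoop property name, quoted in the issue description.
SPLIT_MAX_KEY = "mapreduce.input.fileinputformat.split.maxsize"


@dataclass
class MergeWork:
    """Hypothetical stand-in for the work object that carries the merge target."""
    target_size: Optional[int]  # value of hive.merge.size.per.task, in bytes


def run_mapred_task(work: MergeWork, job_conf: Dict[str, str]) -> Dict[str, str]:
    """Models the Textfile/SequenceFile path: the target size is applied (assumed)."""
    conf = dict(job_conf)  # copy so the caller's configuration is not mutated
    if work.target_size is not None:
        conf[SPLIT_MAX_KEY] = str(work.target_size)
    return conf


def run_merge_file_task(work: MergeWork, job_conf: Dict[str, str],
                        pass_size: bool = False) -> Dict[str, str]:
    """Models the RC/ORC path.

    With pass_size=False the value held in the work object is ignored, which is
    the reported behaviour; pass_size=True models the proposed fix.
    """
    conf = dict(job_conf)
    if pass_size and work.target_size is not None:
        conf[SPLIT_MAX_KEY] = str(work.target_size)
    return conf


if __name__ == "__main__":
    cluster_conf = {SPLIT_MAX_KEY: "64000000"}   # illustrative cluster setting
    work = MergeWork(target_size=256000000)      # illustrative merge target

    print(run_mapred_task(work, cluster_conf)[SPLIT_MAX_KEY])
    print(run_merge_file_task(work, cluster_conf)[SPLIT_MAX_KEY])
    print(run_merge_file_task(work, cluster_conf, pass_size=True)[SPLIT_MAX_KEY])
```

In this model the unfixed RC/ORC path keeps whatever split size the cluster configuration already had, so the merged file size follows that setting and not the merge parameter. Once the value is passed through, both paths agree.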