Merging small files throws RuntimeException when hive.mergejob.maponly=false
----------------------------------------------------------------------------

Key: HIVE-2869
URL: https://issues.apache.org/jira/browse/HIVE-2869
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.8.0
Environment: CentOS release 5.5 (Final)
Reporter: Shrijeet Paliwal
Attachments: data_to_reproduce.tar.gz

Hive Version: Hive 0.8 (last commit SHA b581a6192b8d4c544092679d05f45b2e50d42b45)
Hadoop version: chd3u0

I am trying to use the Hive merge-small-files feature by setting all the necessary parameters. I have disabled CombineHiveInputFormat because my input is compressed text.

{noformat}
hive> set mapred.min.split.size.per.node=1000000000;
hive> set mapred.min.split.size.per.rack=1000000000;
hive> set mapred.max.split.size=1000000000;
hive> set hive.merge.size.per.task=1000000000;
hive> set hive.merge.smallfiles.avgsize=1000000000;
hive> set hive.merge.size.smallfiles.avgsize=1000000000;
hive> set hive.merge.mapfiles=true;
hive> set hive.merge.mapredfiles=true;
hive> set hive.mergejob.maponly=false;
{noformat}

The plan launches two MR jobs, but after the first job succeeds I get the runtime error "java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified".

*How to reproduce:*

* Create the tables as follows:
{code}
--create input table
create table tmp_notmerged (
  id int,
  name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

--create o/p table
create table tmp_merged (
  id int
)
STORED AS TEXTFILE;
{code}
* Load data into tmp_notmerged (the files are attached to this JIRA).
* Set the knobs and run the Hive query:
{code}
set hive.merge.mapfiles=true;
set hive.mergejob.maponly=false;
insert overwrite table tmp_merged select id from tmp_notmerged;
{code}
* You should see the error "java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified".

*Proposed fix:*

Patch is here: https://gist.github.com/2025303
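
*Inspecting the plan:*

As a quick check before running the insert, the generated plan can be inspected with EXPLAIN under the same settings. This is only a sketch: it assumes the tmp_notmerged and tmp_merged tables from the reproduction steps already exist, and the stage layout it prints was not captured in this report, so no output is shown.

{code}
-- same knobs as in the reproduction steps
set hive.merge.mapfiles=true;
set hive.mergejob.maponly=false;

-- print the stage plan without launching any MR job
explain insert overwrite table tmp_merged select id from tmp_notmerged;
{code}

Comparing this output with hive.mergejob.maponly left unset may help show which stage carries the reduce operator named in the exception.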