[ https://issues.apache.org/jira/browse/HIVE-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shrijeet Paliwal updated HIVE-2869:
-----------------------------------

    Attachment: data_to_reproduce.tar.gz

> Merging small files throws RuntimeException when hive.mergejob.maponly=false
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-2869
>                 URL: https://issues.apache.org/jira/browse/HIVE-2869
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.8.0
>         Environment: CentOS release 5.5 (Final)
>            Reporter: Shrijeet Paliwal
>         Attachments: data_to_reproduce.tar.gz
>
>
> Hive version: Hive 0.8 (last commit SHA b581a6192b8d4c544092679d05f45b2e50d42b45)
> Hadoop version: cdh3u0
> I am trying to use the Hive small-file merge feature and have set all the necessary parameters.
> I have disabled CombineHiveInputFormat because my input is compressed text.
> {noformat}
> hive> set mapred.min.split.size.per.node=1000000000;
> hive> set mapred.min.split.size.per.rack=1000000000;
> hive> set mapred.max.split.size=1000000000;
> hive> set hive.merge.size.per.task=1000000000;
> hive> set hive.merge.smallfiles.avgsize=1000000000;
> hive> set hive.merge.size.smallfiles.avgsize=1000000000;
> hive> set hive.merge.mapfiles=true;
> hive> set hive.merge.mapredfiles=true;
> hive> set hive.mergejob.maponly=false;
> {noformat}
> The plan launches two MR jobs, but after the first job succeeds I get this runtime error:
> "java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified"
> *How to reproduce:*
> * Create the tables as follows:
> {code}
> --create input table
> create table tmp_notmerged (
>   id int,
>   name string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
> --create output table
> create table tmp_merged (
>   id int
> )
> STORED AS TEXTFILE;
> {code}
> * Load data into tmp_notmerged (the files are attached to this JIRA).
> * Set the following options and run the Hive query:
> {code}
> set hive.merge.mapfiles=true;
> set hive.mergejob.maponly=false;
> insert overwrite table tmp_merged select id from tmp_notmerged;
> {code}
> * You should see the error "java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified"
> *Proposed fix:*
> The patch is here: https://gist.github.com/2025303
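
If the attached data_to_reproduce.tar.gz is not at hand, the "load data" step might be approximated with a handful of small tab-delimited files matching the tmp_notmerged schema (id int, name string). The following is only a sketch under that assumption: the function name `write_small_files`, the file names, the row contents, and the optional gzip compression are all hypothetical and are not taken from the attachment, so the generated files may not trigger the merge job in exactly the same way.

```python
import gzip
import os
import tempfile


def write_small_files(directory, num_files=5, rows_per_file=10, compress=False):
    """Write several small tab-delimited (id, name) files and return their paths.

    Hypothetical stand-in for the attached data; contents are made up.
    """
    os.makedirs(directory, exist_ok=True)
    # gzip.open and open accept the same text-mode arguments used below.
    opener = gzip.open if compress else open
    suffix = ".txt.gz" if compress else ".txt"
    paths = []
    next_id = 0
    for i in range(num_files):
        path = os.path.join(directory, f"part-{i:05d}{suffix}")
        with opener(path, "wt", encoding="utf-8", newline="\n") as fh:
            for _ in range(rows_per_file):
                # One row per line: id <TAB> name, matching the table's delimiter.
                fh.write(f"{next_id}\tname_{next_id}\n")
                next_id += 1
        paths.append(path)
    return paths


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp(prefix="tmp_notmerged_")
    # Print one LOAD statement per generated file for pasting into the Hive CLI.
    for p in write_small_files(out_dir, compress=True):
        print(f"LOAD DATA LOCAL INPATH '{p}' INTO TABLE tmp_notmerged;")
```

Running the script prints one LOAD DATA statement per file; after loading, the set commands and the insert overwrite query from the reproduction steps can be run unchanged.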