Merging small files throws RuntimeException when hive.mergejob.maponly=false
----------------------------------------------------------------------------

                 Key: HIVE-2869
                 URL: https://issues.apache.org/jira/browse/HIVE-2869
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.8.0
         Environment: CentOS release 5.5 (Final)
            Reporter: Shrijeet Paliwal
         Attachments: data_to_reproduce.tar.gz

Hive version: Hive 0.8 (last commit SHA 
b581a6192b8d4c544092679d05f45b2e50d42b45)
Hadoop version: cdh3u0

I am trying to use Hive's small-file merge feature and have set all the 
necessary parameters.
I have disabled CombineHiveInputFormat because my input is compressed text.

{noformat}
hive> set mapred.min.split.size.per.node=1000000000;
hive> set mapred.min.split.size.per.rack=1000000000;
hive> set mapred.max.split.size=1000000000;
hive> set hive.merge.size.per.task=1000000000;
hive> set hive.merge.smallfiles.avgsize=1000000000;
hive> set hive.merge.size.smallfiles.avgsize=1000000000;
hive> set hive.merge.mapfiles=true;
hive> set hive.merge.mapredfiles=true;
hive> set hive.mergejob.maponly=false;
{noformat}
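
The session above does not show how CombineHiveInputFormat was disabled. One 
common way, assumed here rather than taken from the report, is to point 
hive.input.format at the plain HiveInputFormat:

{code}
-- Assumption: this line is not in the original session.
-- It selects the non-combining input format.
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
-- Print the current value to confirm the setting took effect.
set hive.input.format;
{code}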

The plan launches two MR jobs, but after the first job succeeds I get this 
runtime error: 
"java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce 
operator specified"

*How to reproduce :* 

* Create the tables as follows: 
{code}
--create input table
create table tmp_notmerged (
  id                int,
  name              string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;


--create output table
create table tmp_merged (
  id                int
)
STORED AS TEXTFILE;
{code}

* Load data into tmp_notmerged (the files are attached to this JIRA as data_to_reproduce.tar.gz)

* Set the merge options and run the Hive query: 
{code}
set hive.merge.mapfiles=true;
set hive.mergejob.maponly=false;
insert overwrite table tmp_merged select id from tmp_notmerged;
{code}

* You should see the error "java.lang.RuntimeException: Plan invalid, Reason: 
Reducers == 0 but reduce operator specified"
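
The load step and a plan check can be sketched as follows. The local path is 
hypothetical (it assumes the attachment was unpacked to /tmp/data_to_reproduce), 
and the EXPLAIN statement is not part of the original report; it is added only 
to show the stages of the plan without running them.

{code}
-- Hypothetical path: wherever data_to_reproduce.tar.gz was unpacked.
load data local inpath '/tmp/data_to_reproduce' into table tmp_notmerged;

-- Same settings as in the reproduction step.
set hive.merge.mapfiles=true;
set hive.mergejob.maponly=false;
-- Show the stage plan without executing it.
explain insert overwrite table tmp_merged select id from tmp_notmerged;
{code}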


*Proposed fix :*

The patch is here: https://gist.github.com/2025303


        
