Hi All, I have a scenario where I have to read an RCFile, process it, and write the output as an RCFile using a MapReduce program. My Hadoop version is CDH 4.2.1.
Mapper
Map input <Key, Value>  : LongWritable, BytesRefArrayWritable
Map output <Key, Value> : Text, BytesRefArrayWritable (record)

*******************************CODE BEGINS*******************************

// Mapper
public static class AbcMapper extends Mapper<LongWritable, BytesRefArrayWritable, Text, BytesRefArrayWritable> {
    public void map(LongWritable key, BytesRefArrayWritable value, Context context)
            throws IOException, InterruptedException {
        .........
        // I am passing a Text key and a BytesRefArrayWritable value (the record) as the map output.
        context.write(new Text(keys), value);
    }
}

// Reducer
public static class AbcReducer extends Reducer<Text, BytesRefArrayWritable, Text, BytesRefArrayWritable> {
    public void reduce(Text keyz, Iterable<BytesRefArrayWritable> values, Context context)
            throws IOException, InterruptedException {
        // Based on some logic, I pick one BytesRefArrayWritable record from the
        // BytesRefArrayWritable values obtained in the reduce input.
        BytesRefArrayWritable outRecord = new BytesRefArrayWritable(5);
        for (BytesRefArrayWritable val : values) {
            if (some condition)
                outRecord = val;
        }
        outRecord.size(); // The value here is getting logged as 5.
        context.write(new Text(keyz), outRecord);
    }
}

*******************************CODE ENDS*******************************

I've added the following in the main method:

job.setInputFormatClass(RCFileInputFormat.class);
job.setOutputFormatClass(RCFileOutputFormat.class);
job.setOutputValueClass(BytesRefArrayWritable.class);

Before writing the reduce output, the value of outRecord.size() is 5, but I am still getting an ArrayIndexOutOfBoundsException.

Stacktrace:

java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:890)
    at poc.RCFileOutputFormat$1.write(RCFileOutputFormat.java:82)
    at poc.RCFileOutputFormat$1.write(RCFileOutputFormat.java:1)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:551)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
    at poc.Test$PurgeReducer.reduce(Purge.java:98)
    at poc.Test$AReducer.reduce(Purge.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)

I have tried to find out which array is empty and causing this java.lang.ArrayIndexOutOfBoundsException: 0, but have not found anything yet.

Could you please give me any pointers that would help me identify and resolve the issue?

Thanks!
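
P.S. In case it helps, here is roughly how the rest of my driver is wired up. This is only a simplified sketch: the class name Test, the job name, the input/output paths, and the setters other than the three lines quoted above are placeholders, and RCFileInputFormat / RCFileOutputFormat here are my own new-API wrapper classes in the poc package, not the Hive mapred ones.

package poc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Test {
    // AbcMapper and AbcReducer are the nested classes shown earlier in this mail.

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "rcfile-read-write");   // CDH4-era Job constructor
        job.setJarByClass(Test.class);

        job.setMapperClass(AbcMapper.class);
        job.setReducerClass(AbcReducer.class);

        // Map output types: Text key, BytesRefArrayWritable record value
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(BytesRefArrayWritable.class);

        // The three lines quoted above
        job.setInputFormatClass(RCFileInputFormat.class);     // my new-API wrapper in package poc
        job.setOutputFormatClass(RCFileOutputFormat.class);   // my new-API wrapper in package poc
        job.setOutputValueClass(BytesRefArrayWritable.class);

        job.setOutputKeyClass(Text.class);

        // Placeholder paths passed on the command line
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}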