Hi All, I have a scenario where I have to read an RCFile, process it, and write the output as an RCFile using a MapReduce program. My Hadoop version is CDH 4.2.1.
Mapper
Map input <Key, Value>  : LongWritable, BytesRefArrayWritable
Map output <Key, Value> : Text, BytesRefArrayWritable (record)

*******************************CODE BEGINS*******************************

// Mapper
public static class AbcMapper extends Mapper<LongWritable, BytesRefArrayWritable, Text, BytesRefArrayWritable> {
    public void map(LongWritable key, BytesRefArrayWritable value, Context context)
            throws IOException, InterruptedException {
        .........
        // I am passing a Text key and a BytesRefArrayWritable value (the record) as the map output.
        context.write(new Text(keys), value);
    }
}

// Reducer
public static class AbcReducer extends Reducer<Text, BytesRefArrayWritable, Text, BytesRefArrayWritable> {
    public void reduce(Text keyz, Iterable<BytesRefArrayWritable> values, Context context)
            throws IOException, InterruptedException {
        // Based on some logic, I pick one BytesRefArrayWritable record from the
        // BytesRefArrayWritable values obtained in the reduce input.
        BytesRefArrayWritable outRecord = new BytesRefArrayWritable(5);
        for (BytesRefArrayWritable val : values) {
            if (some condition)
                outRecord = val;
        }
        outRecord.size(); // The value here is getting logged as 5.
        context.write(new Text(keyz), outRecord);
    }
}

*******************************CODE ENDS*******************************

I've added the following in the main method:

job.setInputFormatClass(RCFileInputFormat.class);
job.setOutputFormatClass(RCFileOutputFormat.class);
job.setOutputValueClass(BytesRefArrayWritable.class);

Before writing the reduce output, the value of outRecord.size() is 5, but I am still getting an ArrayIndexOutOfBoundsException.

Stacktrace:

java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:890)
    at poc.RCFileOutputFormat$1.write(RCFileOutputFormat.java:82)
    at poc.RCFileOutputFormat$1.write(RCFileOutputFormat.java:1)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:551)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
    at poc.Test$PurgeReducer.reduce(Purge.java:98)
    at poc.Test$AReducer.reduce(Purge.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)

I have tried to find out which array is empty and causing this java.lang.ArrayIndexOutOfBoundsException: 0, but have not found anything yet.

Could you please give me any pointers that would help me identify and resolve the issue?

Thanks!
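
P.S. In case it helps, here is roughly how the rest of my driver is wired up. This is only a simplified sketch: the class name Test, the job name, the input/output paths, and the setters other than the three lines quoted above are placeholders, and RCFileInputFormat / RCFileOutputFormat here are my own new-API wrapper classes in the poc package, not the Hive mapred ones.

package poc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Test {
    // AbcMapper and AbcReducer are the nested classes shown earlier in this mail.

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "rcfile-read-write");   // CDH4-era Job constructor
        job.setJarByClass(Test.class);

        job.setMapperClass(AbcMapper.class);
        job.setReducerClass(AbcReducer.class);

        // Map output types: Text key, BytesRefArrayWritable record value
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(BytesRefArrayWritable.class);

        // The three lines quoted above
        job.setInputFormatClass(RCFileInputFormat.class);     // my new-API wrapper in package poc
        job.setOutputFormatClass(RCFileOutputFormat.class);   // my new-API wrapper in package poc
        job.setOutputValueClass(BytesRefArrayWritable.class);

        job.setOutputKeyClass(Text.class);

        // Placeholder paths passed on the command line
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}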