It seems you did not set the number of columns (RCFileOutputFormat.setColumnNumber(Configuration conf, int columnNum)). Can you set it in your main method and see if your MR program works?
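Something like this in the driver should do it. This is just a rough, untested sketch: I am guessing 5 columns from the BytesRefArrayWritable(5) in your reducer, and I am assuming your poc.RCFileOutputFormat has the same static setColumnNumber(Configuration, int) helper as Hive's org.apache.hadoop.hive.ql.io.RCFileOutputFormat (if not, conf.setInt(RCFile.COLUMN_NUMBER_CONF_STR, 5) should be equivalent, if I remember that constant right):

    Configuration conf = new Configuration();

    // The RCFile writer takes the column count from the configuration. If it is
    // never set, the writer sees 0 columns, its per-column buffers are empty,
    // and append() throws ArrayIndexOutOfBoundsException: 0, which is what your
    // stack trace shows at RCFile$Writer.append.
    RCFileOutputFormat.setColumnNumber(conf, 5);   // 5 = columns per record (my assumption)

    // Set it before creating the Job, because the Job copies the configuration.
    Job job = new Job(conf, "rcfile processing");
    job.setJarByClass(Test.class);
    job.setMapperClass(AbcMapper.class);
    job.setReducerClass(AbcReducer.class);

    job.setInputFormatClass(RCFileInputFormat.class);
    job.setOutputFormatClass(RCFileOutputFormat.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(BytesRefArrayWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(BytesRefArrayWritable.class);

If I am reading RCFile.java right, with columnNumber = 0 the writer's per-column arrays have length 0, so the very first columnBuffers[0] access in append() fails, which would explain the exception even though outRecord.size() is 5 on your side.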
Thanks,
Yin

On Mon, Oct 21, 2013 at 2:38 PM, Krishnan K <kkrishna...@gmail.com> wrote:

> Hi All,
>
> I have a scenario where I have to read an RCFile, process it, and write the
> output as an RCFile using a MapReduce program.
> My Hadoop version is CDH 4.2.1.
>
> Mapper
> Map Input <Key,Value> = LongWritable, BytesRefArrayWritable
> Map Output <Key,Value> = Text, BytesRefArrayWritable (Record)
>
> *******************************CODE BEGINS*******************************
> // Mapper
>
> public static class AbcMapper extends Mapper<LongWritable,
>     BytesRefArrayWritable, Text, BytesRefArrayWritable> {
>
>   public void map(LongWritable key, BytesRefArrayWritable value, Context context)
>       throws IOException, InterruptedException {
>     .........
>     // I am passing a text key and a BytesRefArrayWritable value (record) as map output.
>     context.write(new Text(keys), value);
>   }
> }
>
> // Reducer
>
> public static class AbcReducer extends
>     Reducer<Text, BytesRefArrayWritable, Text, BytesRefArrayWritable> {
>
>   public void reduce(Text keyz, Iterable<BytesRefArrayWritable> values, Context context)
>       throws IOException, InterruptedException {
>
>     // Based on some logic, I pick one BytesRefArrayWritable record from the
>     // list of BytesRefArrayWritable values obtained in the reduce input.
>
>     BytesRefArrayWritable outRecord = new BytesRefArrayWritable(5);
>
>     for (BytesRefArrayWritable val : values) {
>       if (some condition)
>         outRecord = val;
>     }
>     outRecord.size(); // Value here is getting logged as 5.
>     context.write(new Text(keyz), outRecord);
>   }
> }
> *******************************CODE ENDS*******************************
>
> I've added the following in the main method:
>
> job.setInputFormatClass(RCFileInputFormat.class);
> job.setOutputFormatClass(RCFileOutputFormat.class);
> job.setOutputValueClass(BytesRefArrayWritable.class);
>
> Before writing the reduce output, the value of outRecord.size() is 5.
>
> But I am still getting an ArrayIndexOutOfBoundsException.
>
> Stacktrace:
>
> java.lang.ArrayIndexOutOfBoundsException: 0
>         at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:890)
>         at poc.RCFileOutputFormat$1.write(RCFileOutputFormat.java:82)
>         at poc.RCFileOutputFormat$1.write(RCFileOutputFormat.java:1)
>         at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:551)
>         at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
>         at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
>         at poc.Test$PurgeReducer.reduce(Purge.java:98)
>         at poc.Test$AReducer.reduce(Purge.java:1)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>
> I tried hard to find which array was empty and causing this
> java.lang.ArrayIndexOutOfBoundsException: 0, but have not found anything yet.
>
> Could you please give me any pointers that will help me identify/resolve
> the issue?
>
> Thanks!