Hi,
I am trying to do something like the following in Spark:
// mapToPair (not map) is what returns a JavaPairRDD in the Java API
JavaPairRDD<byte[], MyObject> eventRDD = hBaseRDD.mapToPair(
    new PairFunction<Tuple2<ImmutableBytesWritable, Result>, byte[], MyObject>() {
        @Override
        public Tuple2<byte[], MyObject> call(
                Tuple2<ImmutableBytesWritable, Result> tuple) throws Exception {
            return new Tuple2<byte[], MyObject>(tuple._1.get(), MyClass.get(tuple._2));
        }
    });

eventRDD.foreach(new VoidFunction<Tuple2<byte[], MyObject>>() {
    @Override
    public void call(Tuple2<byte[], MyObject> tuple) throws Exception {
        processForEvent(tuple._2);
    }
});
The processForEvent() flow does some processing and ultimately writes to an
HBase table. But I am getting serialisation issues with Hadoop and HBase
built-in classes (e.g. ImmutableBytesWritable, Result). How do I solve this?
Would using Kryo serialisation help in this case?
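To make the failure mode concrete: Spark serialises the closures passed to
map/foreach, and anonymous inner classes capture whatever they reference from
the enclosing scope. HBase handles such as Configuration or Table do not
implement java.io.Serializable, so a closure that captures one fails at
serialisation time. A minimal plain-Java sketch of this (HTableLike is a
hypothetical stand-in for a non-serialisable HBase handle, not a real HBase
class):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for a non-serialisable handle (e.g. an HBase Table/Configuration).
class HTableLike { }

// Stand-in for a Spark closure: Serializable itself, but it captures a
// non-serialisable field, which is exactly what breaks serialisation.
class Task implements Serializable {
    HTableLike table = new HTableLike();
}

public class ClosureDemo {
    public static void main(String[] args) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(new Task());
        } catch (NotSerializableException e) {
            // Same class of error Spark reports for the task closure.
            System.out.println("NotSerializableException: " + e.getMessage());
        }
    }
}
```

If the problem is a captured handle like this, Kryo will not help by itself;
the usual fix is to create the connection inside the closure (e.g. per
partition with foreachPartition) instead of capturing it, or mark the field
transient.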
Thanks,
-Vibhor