Hi,

My requirement is to partition an HBase table and return a group of records
(i.e. rows having a specific format) without having to iterate over all of
its rows. These partitions (which should ideally fall along region
boundaries) will eventually be sent to Spark, but rather than use the HBase
or Hadoop RDDs directly, I'll be using a custom RDD which treats each
partition as one of the aforementioned groups of records.

I was looking at achieving this by creating InputSplits through
TableInputFormat.getSplits(), as is done in the HBase RDD [1], but I
can't figure out a way to do this without having access to a MapReduce
job context.
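
Roughly what I'm attempting is the following sketch. It assumes the
mapreduce (new API) TableInputFormat, and that Job.getInstance() gives a
lightweight JobContext that getSplits() will accept without a running
cluster; the table name "my_table" is just a placeholder:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.mapreduce.Job

val conf = HBaseConfiguration.create()
// Tell TableInputFormat which table to scan
conf.set(TableInputFormat.INPUT_TABLE, "my_table")

val tif = new TableInputFormat
tif.setConf(conf)  // TableInputFormat is Configurable; this wires up the table

// Job implements JobContext, so this may serve as the context getSplits() needs
val job = Job.getInstance(conf)
val splits = tif.getSplits(job)  // java.util.List[InputSplit], one per region
```

I'm not sure whether constructing a Job like this is the intended way to
obtain the context, or whether there's a cleaner entry point.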

I would greatly appreciate it if someone could point me in the right direction.

[1]
https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/com/cloudera/spark/hbase/HBaseScanRDD.scala

Thanks,
Gokul
