Hi,

My requirement is to partition an HBase table and return groups of records (i.e. rows having a specific format) without having to iterate over all of the table's rows. These partitions (which should ideally align with regions) will eventually be sent to Spark, but rather than use the HBase or Hadoop RDDs directly, I'll be using a custom RDD that treats each of these record groups as a partition.
I was looking at achieving this by creating InputSplits via TableInputFormat.getSplits(), as is done in the HBase RDD [1], but I can't figure out a way to do this without having access to the MapReduce job context, etc. I would greatly appreciate it if someone could point me in the right direction.

[1] https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/com/cloudera/spark/hbase/HBaseScanRDD.scala

Thanks,
Gokul
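For context, here is a sketch of what I have been attempting, based on my reading of the linked RDD. It assumes the newer org.apache.hadoop.mapreduce.TableInputFormat (not the old mapred one), and "my_table" is just a placeholder name. My understanding is that org.apache.hadoop.mapreduce.Job implements JobContext, so a Job instance created locally (without ever submitting a job) might be enough to drive getSplits(), but I'm not sure this is the intended usage:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.mapreduce.Job

object SplitSketch {
  def main(args: Array[String]): Unit = {
    // Standard HBase configuration; picks up hbase-site.xml from the classpath.
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "my_table") // placeholder table name

    // Job is only used as a lightweight JobContext here; nothing is submitted.
    val job = Job.getInstance(conf)

    val inputFormat = new TableInputFormat
    inputFormat.setConf(job.getConfiguration)

    // One split per region (roughly) -- these are what I want to map to
    // partitions of my custom RDD.
    val splits = inputFormat.getSplits(job)
    println(s"Got ${splits.size()} splits")
  }
}
```

Is constructing a throwaway Job like this a reasonable way to obtain the splits outside of a real MapReduce job, or is there a cleaner API for getting region-aligned boundaries?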
