Hi,

My requirement is to partition an HBase table and return a group of records
(i.e. rows having a specific format) without having to iterate over all of
its rows. These partitions (which should ideally fall along region
boundaries) will eventually be sent to Spark, but rather than use the HBase
or Hadoop RDDs directly, I'll be using a custom RDD which treats each
partition as one of the aforementioned groups of records.

I was looking at achieving this by creating InputSplits through
TableInputFormat.getSplits(), as is done in the HBase RDD [1], but I
can't figure out a way to do this without having access to a MapReduce
job context.
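
Roughly what I'm attempting is the following sketch. It assumes the
mapreduce (new API) TableInputFormat, and that Job.getInstance() gives a
lightweight JobContext that getSplits() will accept without a running
cluster; the table name "my_table" is just a placeholder:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.mapreduce.Job

val conf = HBaseConfiguration.create()
// Tell TableInputFormat which table to scan
conf.set(TableInputFormat.INPUT_TABLE, "my_table")

val tif = new TableInputFormat
tif.setConf(conf)  // TableInputFormat is Configurable; this wires up the table

// Job implements JobContext, so this may serve as the context getSplits() needs
val job = Job.getInstance(conf)
val splits = tif.getSplits(job)  // java.util.List[InputSplit], one per region
```

I'm not sure whether constructing a Job like this is the intended way to
obtain the context, or whether there's a cleaner entry point.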

I would greatly appreciate it if someone could point me in the right direction.

[1]
https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/com/cloudera/spark/hbase/HBaseScanRDD.scala

Thanks,
Gokul
