@Mikhail I wanted to split the table into groups of rows, but did not want to initialize a scan, go over all the rows, and group them into batches in the client code. In other words, I'm looking for a way to divide the rows in the table and keep only the boundary information of each division, rather than actually populating the divisions at the time of creation.
@Shahab yes, the row key ranges for the splits are not known in advance, which is why I was looking at retrieving the region information of the table and creating the groupings that way.
@Sean this was exactly what I was looking for. Based on the region boundaries, I should be able to create virtual groups of rows which can then be retrieved from the table (e.g. through a scan) on demand.
Thanks everyone for your help.

On 18 March 2015 at 00:57, Sean Busbey <[email protected]> wrote:

> You should ask for a RegionLocator if you want to know the boundaries of
> all the regions in a table
>
> final Connection connection = ConnectionFactory.createConnection(config);
> try {
>   final RegionLocator locator =
>       connection.getRegionLocator(TableName.valueOf("myTable"));
>   final Pair<byte[][], byte[][]> startEndKeys = locator.getStartEndKeys();
>   final byte[][] startKeys = startEndKeys.getFirst();
>   final byte[][] endKeys = startEndKeys.getSecond();
>   for (int i=0; i < startKeys.length && i < endKeys.length; i++) {
>     System.out.println("Region " + i + " starts at '" +
>         Bytes.toStringBinary(startKeys[i]) +
>         "' and ends at '" + Bytes.toStringBinary(endKeys[i]) + "'");
>   }
> } finally {
>   connection.close();
> }
>
> There are other methods in RegionLocator if you need other details.
>
> On Tue, Mar 17, 2015 at 2:09 PM, Gokul Balakrishnan <[email protected]>
> wrote:
>
> > Hi Michael,
> >
> > Thanks for the reply. Yes, I do realise that HBase has regions, perhaps my
> > usage of the term partitions was misleading. What I'm looking for is
> > exactly what you've mentioned - a means of creating splits based on
> > regions, without having to iterate over all rows in the table through the
> > client API. Do you have any idea how I might achieve this?
> >
> > Thanks,
> >
> > On Tuesday, March 17, 2015, Michael Segel <[email protected]>
> > wrote:
> >
> > > Hbase doesn't have partitions. It has regions.
> > >
> > > The split occurs against the regions so that if you have n regions, you
> > > have n splits.
> > >
> > > Please don't confuse partitions and regions because they are not the same
> > > or synonymous.
> > >
> > > > On Mar 17, 2015, at 7:30 AM, Gokul Balakrishnan <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > My requirement is to partition an HBase Table and return a group of records
> > > > (i.e. rows having a specific format) without having to iterate over all of
> > > > its rows. These partitions (which should ideally be along regions) will
> > > > eventually be sent to Spark but rather than use the HBase or Hadoop RDDs
> > > > directly, I'll be using a custom RDD which recognizes partitions as the
> > > > aforementioned group of records.
> > > >
> > > > I was looking at achieving this through creating InputSplits through
> > > > TableInputFormat.getSplits(), as being done in the HBase RDD [1] but I
> > > > can't figure out a way to do this without having access to the mapred
> > > > context etc.
> > > >
> > > > Would greatly appreciate if someone could point me in the right direction.
> > > >
> > > > [1] https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/com/cloudera/spark/hbase/HBaseScanRDD.scala
> > > >
> > > > Thanks,
> > > > Gokul
> > >
> > > The opinions expressed here are mine, while they may reflect a cognitive
> > > thought, that is purely accidental.
> > > Use at your own risk.
> > > Michael Segel
> > > michael_segel (AT) hotmail.com
>
> --
> Sean
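
For anyone landing on this thread later: the "virtual groups" idea discussed in this thread can be sketched roughly as follows. This is only an illustration, not something posted by anyone in the thread. The names RegionRanges, KeyRange and fromBoundaries are hypothetical, and the code deliberately uses only plain Java so it stays independent of the HBase client version. It assumes the two arrays come from getFirst() and getSecond() on the result of RegionLocator.getStartEndKeys(), where an empty byte array marks the start of the first region and the end of the last one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: turns region boundaries into key ranges without reading any rows. */
final class RegionRanges {

    /** One virtual group of rows: start key inclusive, end key exclusive. */
    static final class KeyRange {
        final byte[] startKey; // empty array = from the beginning of the table
        final byte[] endKey;   // empty array = up to the end of the table

        KeyRange(byte[] startKey, byte[] endKey) {
            // Defensive copies so later changes to the caller's arrays do not leak in.
            this.startKey = startKey.clone();
            this.endKey = endKey.clone();
        }
    }

    /**
     * Pairs up region start and end keys (assumed to be the two arrays
     * obtained from RegionLocator.getStartEndKeys()). Only the boundaries
     * are stored; no scan is opened here.
     */
    static List<KeyRange> fromBoundaries(byte[][] startKeys, byte[][] endKeys) {
        if (startKeys.length != endKeys.length) {
            throw new IllegalArgumentException("start/end key arrays differ in length");
        }
        List<KeyRange> ranges = new ArrayList<>(startKeys.length);
        for (int i = 0; i < startKeys.length; i++) {
            ranges.add(new KeyRange(startKeys[i], endKeys[i]));
        }
        return Collections.unmodifiableList(ranges);
    }
}
```

Each KeyRange can later be turned into a scan on demand, for example with new Scan(range.startKey, range.endKey) in the 1.0 client API, where the stop row is exclusive. Note that regions may split or merge after the boundaries were read; the stored ranges still cover the whole key space, they just may no longer line up one-to-one with regions.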
