> On Mar 18, 2015, at 1:52 AM, Gokul Balakrishnan <[email protected]> wrote:
>
> @Sean this was exactly what I was looking for. Based on the region
> boundaries, I should be able to create virtual groups of rows which can
> then be retrieved from the table (e.g. through a scan) on demand.
>
Huh?

You don't need to do this. It's already done for you by the existing APIs.

A scan lets you do either a full table scan (no range limits provided) or a range scan where you provide the start and stop rows. So if you're using a client connection to HBase, it's done for you.

If you're writing an M/R job, you already get one mapper task per region, so your parallelism is already handled for you. It's possible that the input format is smart enough to pre-check each region against the scan boundaries and skip generating a mapper task for any region that falls outside them.

HTH

-Mike

> Thanks everyone for your help.
>
> On 18 March 2015 at 00:57, Sean Busbey <[email protected]> wrote:
>
>> You should ask for a RegionLocator if you want to know the boundaries of
>> all the regions in a table:
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.hbase.HBaseConfiguration;
>> import org.apache.hadoop.hbase.TableName;
>> import org.apache.hadoop.hbase.client.Connection;
>> import org.apache.hadoop.hbase.client.ConnectionFactory;
>> import org.apache.hadoop.hbase.client.RegionLocator;
>> import org.apache.hadoop.hbase.util.Bytes;
>> import org.apache.hadoop.hbase.util.Pair;
>>
>> // Reads hbase-site.xml from the classpath.
>> final Configuration config = HBaseConfiguration.create();
>> final Connection connection = ConnectionFactory.createConnection(config);
>> try {
>>   final RegionLocator locator =
>>       connection.getRegionLocator(TableName.valueOf("myTable"));
>>   final Pair<byte[][], byte[][]> startEndKeys = locator.getStartEndKeys();
>>   final byte[][] startKeys = startEndKeys.getFirst();
>>   final byte[][] endKeys = startEndKeys.getSecond();
>>   for (int i = 0; i < startKeys.length && i < endKeys.length; i++) {
>>     System.out.println("Region " + i + " starts at '" +
>>         Bytes.toStringBinary(startKeys[i]) +
>>         "' and ends at '" + Bytes.toStringBinary(endKeys[i]) + "'");
>>   }
>> } finally {
>>   connection.close();
>> }
>>
>> There are other methods in RegionLocator if you need other details.
>>
>> On Tue, Mar 17, 2015 at 2:09 PM, Gokul Balakrishnan <[email protected]>
>> wrote:
>>
>>> Hi Michael,
>>>
>>> Thanks for the reply. Yes, I do realise that HBase has regions, perhaps my
>>> usage of the term partitions was misleading. What I'm looking for is
>>> exactly what you've mentioned - a means of creating splits based on
>>> regions, without having to iterate over all rows in the table through the
>>> client API.
>>> Do you have any idea how I might achieve this?
>>>
>>> Thanks,
>>>
>>> On Tuesday, March 17, 2015, Michael Segel <[email protected]>
>>> wrote:
>>>
>>>> Hbase doesn't have partitions. It has regions.
>>>>
>>>> The split occurs against the regions so that if you have n regions, you
>>>> have n splits.
>>>>
>>>> Please don't confuse partitions and regions because they are not the same
>>>> or synonymous.
>>>>
>>>>> On Mar 17, 2015, at 7:30 AM, Gokul Balakrishnan <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> My requirement is to partition an HBase Table and return a group of records
>>>>> (i.e. rows having a specific format) without having to iterate over all of
>>>>> its rows. These partitions (which should ideally be along regions) will
>>>>> eventually be sent to Spark but rather than use the HBase or Hadoop RDDs
>>>>> directly, I'll be using a custom RDD which recognizes partitions as the
>>>>> aforementioned group of records.
>>>>>
>>>>> I was looking at achieving this through creating InputSplits through
>>>>> TableInputFormat.getSplits(), as being done in the HBase RDD [1] but I
>>>>> can't figure out a way to do this without having access to the mapred
>>>>> context etc.
>>>>>
>>>>> Would greatly appreciate if someone could point me in the right direction.
>>>>>
>>>>> [1] https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/com/cloudera/spark/hbase/HBaseScanRDD.scala
>>>>>
>>>>> Thanks,
>>>>> Gokul
>>>>
>>>> The opinions expressed here are mine, while they may reflect a cognitive
>>>> thought, that is purely accidental.
>>>> Use at your own risk.
>>>> Michael Segel
>>>> michael_segel (AT) hotmail.com
>>>
>>
>> --
>> Sean
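For reference, here is a minimal sketch of the region-pruning idea discussed in this thread: take the region boundaries (for example, the arrays returned by `RegionLocator.getStartEndKeys()` in Sean's snippet), intersect each region with the requested scan range, and drop regions that don't overlap it. This is not TableInputFormat's code or any HBase API; `RegionRanges`, `Range` and `perRegionRanges` are hypothetical names. It assumes HBase's convention that an empty key means "unbounded" (the first region's start key and the last region's end key are empty, as are unset scan start/stop rows) and compares keys as unsigned bytes, the way HBase orders row keys. To stay self-contained it uses plain Java 9+ (`Arrays.compareUnsigned`) instead of HBase's `Bytes` class.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: one row range per overlapping region.
public class RegionRanges {

    /** A row range [start, stop); an empty array means "unbounded" on that side. */
    public static final class Range {
        final byte[] start;
        final byte[] stop;

        Range(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + new String(start, StandardCharsets.UTF_8) + ", "
                    + new String(stop, StandardCharsets.UTF_8) + ")";
        }
    }

    /**
     * Intersects the requested range [scanStart, scanStop) with each region's
     * [startKey, endKey). Regions that do not overlap produce no entry.
     */
    public static List<Range> perRegionRanges(byte[][] startKeys, byte[][] endKeys,
                                              byte[] scanStart, byte[] scanStop) {
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < startKeys.length && i < endKeys.length; i++) {
            byte[] regionStart = startKeys[i];
            byte[] regionEnd = endKeys[i];
            // Overlap test: region must end after scanStart and start before scanStop.
            boolean endsAfterScanStart = scanStart.length == 0 || regionEnd.length == 0
                    || Arrays.compareUnsigned(scanStart, regionEnd) < 0;
            boolean startsBeforeScanStop = scanStop.length == 0
                    || Arrays.compareUnsigned(scanStop, regionStart) > 0;
            if (!endsAfterScanStart || !startsBeforeScanStop) {
                continue; // prune: no requested rows can live in this region
            }
            // Clip the region's boundaries to the requested range.
            byte[] start = (scanStart.length == 0
                    || Arrays.compareUnsigned(regionStart, scanStart) >= 0)
                    ? regionStart : scanStart;
            byte[] stop = (regionEnd.length > 0
                    && (scanStop.length == 0 || Arrays.compareUnsigned(regionEnd, scanStop) <= 0))
                    ? regionEnd : scanStop;
            ranges.add(new Range(start, stop));
        }
        return ranges;
    }

    private static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical three-region table: ["", "g"), ["g", "p"), ["p", "")
        byte[][] startKeys = { b(""), b("g"), b("p") };
        byte[][] endKeys = { b("g"), b("p"), b("") };
        // Range spanning all regions: each one is clipped to [c, s).
        System.out.println(perRegionRanges(startKeys, endKeys, b("c"), b("s")));
        // prints [[c, g), [g, p), [p, s)]
        // Range inside one region: the other two are pruned.
        System.out.println(perRegionRanges(startKeys, endKeys, b("h"), b("k")));
        // prints [[h, k)]
    }
}
```

Each resulting range could then drive its own independent scan (in the HBase 1.x client, for example, `new Scan(range.start, range.stop)`), giving one region-aligned unit of work per overlapping region, such as one partition of a custom Spark RDD.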
