+1 Anoop. That's pretty much the only way right now if you need custom balancing. The balancer doesn't have to live in the HMaster and can be invoked externally (there are caveats to doing that when an RS dies, but it has worked OK so far). A long-term solution for the problem you are trying to solve is HBASE-10576, with a little tweaking.
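To make the external-balancer idea a bit more concrete, here is a rough sketch of the planning half only. It is not an HBase API: `node_for_key`, `plan_moves`, the node labels and the region names are all hypothetical, and it assumes fixed-width row keys (so string order matches key order) and a table presplit at the range boundaries so that no region straddles two nodes. The resulting moves would still have to be issued separately, for example with the HBase shell `move` command, and re-run after an RS dies and its regions get reassigned.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first three ranges from the question;
# anything above the last bound falls to the final node.
UPPER_BOUNDS = ["row10000", "row20000", "row30000"]
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical server labels


def node_for_key(row_key):
    """Return the node whose range contains row_key (lexicographic compare)."""
    return NODES[bisect_left(UPPER_BOUNDS, row_key)]


def plan_moves(region_start_keys, current_hosts):
    """List (region, target_node) for every region hosted on the wrong node.

    region_start_keys: {region_name: start_key}, "" for the first region.
    current_hosts:     {region_name: node currently hosting it}.
    """
    moves = []
    for region, start_key in sorted(region_start_keys.items()):
        target = node_for_key(start_key)
        if current_hosts.get(region) != target:
            moves.append((region, target))
    return moves


if __name__ == "__main__":
    # Made-up region names; "r3" stands for a daughter region after a split.
    regions = {"r1": "", "r2": "row10001", "r3": "row15000"}
    hosts = {"r1": "node1", "r2": "node1", "r3": "node2"}
    for region, target in plan_moves(regions, hosts):
        print("move", region, "->", target)
```

Because the target is derived from the region's start key, daughter regions created by a split still map to the node that owns their range, so splits do not break the layout as long as the plan is re-applied.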
cheers,
esteban.

--
Cloudera, Inc.

On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel <[email protected]> wrote:

> Is your table static?
>
> If you know your data and your ranges, you can do it. However, as you add
> data to the table, those regions will eventually split.
>
> The other issue that you brought up is that you want to do ‘local’ joins.
>
> Simple single word response… don’t.
>
> Longer response…
>
> You’re suggesting that the tables in question share the row key in
> common. OK… why? Are they part of the same record?
> How is the data normally being used?
>
> Have you looked at column families?
>
> The issue is that joins are expensive. What you’re suggesting is that as
> you do a region scan, you go to the other table and try to fetch
> a row if it exists.
> So it’s essentially: for each row in the scan, try a get(), which will
> almost double the cost of your fetch. Then you have to decide how to do it
> locally. Are you really going to write a coprocessor for this? (Hint: if
> this is a common thing, then the second table should be part of the
> first table, either in the same CF or as a separate CF. You need to
> rethink your schema.)
>
> Does this make sense?
>
> On Apr 7, 2015, at 7:05 PM, Demai Ni <[email protected]> wrote:
> >
> > hi, folks,
> >
> > I have a question about region assignment and would like to clarify some
> > thoughts.
> >
> > Let's say I have a table with rowkeys "row00000 ~ row30000" on a 4-node
> > hbase cluster. Is there a way to keep data partitioned by range on each
> > node? For example:
> >
> > node1: <=row10000
> > node2: row10001~row20000
> > node3: row20001~row30000
> > node4: >row30000
> >
> > And even when one of the nodes becomes a hotspot, the boundary won't be
> > crossed unless a load balancing is done manually?
> >
> > I looked at presplit: { SPLITS => ['row100','row200','row300'] } , but
> > don't think it serves this purpose.
> >
> > BTW, a bit of background. I am thinking of doing a local join between two
> > tables if both have the same rowkey and are partitioned by range (or the
> > same hash algorithm). If I can keep the join key on the same node (aka
> > regionServer), the join can be handled locally instead of being broadcast
> > to all other nodes.
> >
> > Thanks for your input. A couple of pointers to blogs/presentations would
> > be appreciated.
> >
> > Demai
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
