I don’t know that it is such a good idea. 

Let me ask it this way… 

What are you balancing with the HBase load balancer? 
Locations of HFiles on HDFS or which RS is responsible for the HFile? 

-Mike

> On Apr 2, 2015, at 12:42 PM, lars hofhansl <[email protected]> wrote:
> 
> What Kevin says.
> The best we can do is exclude HBase data from the HDFS balancer 
> (HDFS-6133). The HDFS balancer will destroy data locality for HBase. If you don't 
> care - maybe you have a fat network tree, and your network bandwidth matches 
> the aggregate disk throughput for each machine - you can run it. Even then as 
> Kevin says, HBase will just happily rewrite it as before.
> 
> Balancing of HBase data has to happen at the HBase level. Then we have to 
> decide what we use as a basis for distribution: CPU? RAM? Disk space? IOPS? 
> Disk throughput? It depends... so some configurable function of those.
> -- Lars
> 
>      From: Kevin O'dell <[email protected]>
> To: "[email protected]" <[email protected]> 
> Cc: lars hofhansl <[email protected]> 
> Sent: Thursday, April 2, 2015 5:41 AM
> Subject: Re: introducing nodes w/ more storage
> 
> Hi Mike,
>   Sorry for the delay here.  
> How does the HDFS load balancer impact the load balancing of HBase? <-- The 
> HDFS load balancer is not automatically run; it is a manual process that has 
> to be kicked off. It is not recommended to *ever run the HDFS balancer on a 
> cluster running HBase.  Just as HBase has no concept of or concern for the 
> underlying storage, HDFS has no concept of the region layout, nor of the 
> locality we worked so hard to build through compactions. 
> 
> Furthermore, once the HDFS balancer has saved us from running out of space on 
> the smaller nodes, we will run a major compaction, and re-write all of the 
> HBase data right back to where it was before.
> One is the number of regions managed by a region server; that’s HBase’s load, 
> right? And then there’s the data distribution of HBase files that is really 
> managed by the HDFS load balancer, right? <--- Right, until we run major 
> compaction and "restore" locality by moving the data back.
> 
> Even still… eventually the data will be distributed equally across the 
> cluster. What’s happening with the HDFS balancer?  Is that heterogeneous or 
> homogeneous in terms of storage? <-- Not quite; as I said before, the HDFS 
> balancer is manual, so it is quite easy to build up a skew, especially if you 
> use a datanode as an edge node or Thrift gateway, etc.  Yes, the HDFS balancer 
> handles heterogeneous storage, but it doesn't play nice with HBase.
> 
> *The use of the word "ever" should not be construed as truly definitive; 
> "ever" is being used to describe a best practice.  In many cases the HDFS 
> balancer needs to be run, especially in multi-tenant clusters with archival 
> data.  If the HDFS balancer is used, it is best to run a major compaction 
> immediately afterward to restore HBase locality.
> 
> 
> On Mon, Mar 23, 2015 at 10:50 AM, Michael Segel <[email protected]> 
> wrote:
> 
> @lars,
> 
> How does the HDFS load balancer impact the load balancing of HBase?
> 
> Of course there are two loads… one is the number of regions managed by a 
> region server; that’s HBase’s load, right?
> And then there’s the data distribution of HBase files that is really managed 
> by the HDFS load balancer, right?
> 
> The OP’s question is about a heterogeneous cluster where he would like to see 
> a more even distribution of data/free space based on the capacity of the 
> newer machines in the cluster.
> 
> This is a storage question, not a memory/cpu core question.
> 
> Or am I missing something?
> 
> 
> -Mike
> 
>> On Mar 22, 2015, at 10:56 PM, lars hofhansl <[email protected]> wrote:
>> 
>> Seems that it should not be too hard to add that to the stochastic load 
>> balancer.
>> We could add a spaceCost or something.
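As a rough illustration of what a spaceCost might score, here is a self-contained sketch (plain Java, no HBase dependencies; the class and method names are made up and this is not HBase's actual CostFunction API). It rates how far each region server's fill fraction (used bytes / capacity) drifts from the cluster mean, scaled into [0, 1] so it could be weighted against the other cost terms:

```java
/**
 * Illustrative sketch of a "space cost" for a stochastic balancer:
 * score how unevenly data is spread relative to each node's disk
 * capacity. 0.0 = perfectly proportional to capacity; 1.0 = worst case.
 * This only models the idea; it is not HBase's CostFunction interface.
 */
public class SpaceCostSketch {

    /** usedBytes[i] / capacityBytes[i] gives server i's fill fraction. */
    public static double spaceCost(long[] usedBytes, long[] capacityBytes) {
        int n = usedBytes.length;
        if (n < 2) return 0.0;
        double[] fill = new double[n];
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            fill[i] = (double) usedBytes[i] / capacityBytes[i];
            mean += fill[i] / n;
        }
        // Sum of absolute deviations from the mean fill fraction,
        // normalized by the maximum possible deviation (all data on
        // one server): 2 * (n - 1) / n.
        double dev = 0.0;
        for (double f : fill) {
            dev += Math.abs(f - mean);
        }
        return dev / (2.0 * (n - 1) / n);
    }

    public static void main(String[] args) {
        long TB = 1L << 40;
        // Two 5-volume legacy nodes and one 8-volume new node.
        long[] capacity = {5 * TB, 5 * TB, 8 * TB};
        long[] equalBytes   = {2 * TB, 2 * TB, 2 * TB};          // same bytes everywhere
        long[] proportional = {2 * TB, 2 * TB, (long) (3.2 * TB)}; // scaled to capacity
        System.out.printf("equal-bytes cost:  %.3f%n", spaceCost(equalBytes, capacity));
        System.out.printf("proportional cost: %.3f%n", spaceCost(proportional, capacity));
    }
}
```

Equal byte counts on unequal nodes score worse than a capacity-proportional spread, which is exactly the signal the OP's scenario needs.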
>> 
>> 
>> 
>> ----- Original Message -----
>> From: Jean-Marc Spaggiari <[email protected]>
>> To: user <[email protected]>
>> Cc: Development <[email protected]>
>> Sent: Thursday, March 19, 2015 12:55 PM
>> Subject: Re: introducing nodes w/ more storage
>> 
>> You can extend the default balancer and assign the regions based on that.
>> But at the end, the replicated blocks might still go all over the cluster,
>> and your "small" nodes are going to be full and will not be able to take
>> any more writes, even for the regions they are supposed to get.
>> 
>> I'm not sure there is a good solution for what you are looking for :(
>> 
>> I built my own balancer, but because of differences in the CPUs, not because
>> of differences in storage space...
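For the region-placement side, the core of a capacity-aware extension is just weighting each server's share of regions by its storage instead of splitting them evenly. A toy sketch of that policy (hypothetical names; not HBase's actual LoadBalancer interface):

```java
/**
 * Toy capacity-weighted assignment: give each server a share of the
 * regions proportional to its storage capacity, instead of a flat
 * round-robin split. Illustrative only; not HBase's LoadBalancer API.
 */
public class WeightedAssignmentSketch {

    /** Returns a region count per server, proportional to capacityBytes[i]. */
    public static int[] assignCounts(int totalRegions, long[] capacityBytes) {
        long totalCapacity = 0;
        for (long c : capacityBytes) totalCapacity += c;
        int n = capacityBytes.length;
        int[] counts = new int[n];
        int assigned = 0;
        for (int i = 0; i < n; i++) {
            // Floor of this server's proportional share.
            counts[i] = (int) (totalRegions * capacityBytes[i] / totalCapacity);
            assigned += counts[i];
        }
        // Hand out any rounding leftovers one at a time, round-robin.
        for (int i = 0; assigned < totalRegions; i = (i + 1) % n) {
            counts[i]++;
            assigned++;
        }
        return counts;
    }
}
```

With 130 regions across two 5 TB legacy nodes and one 8 TB node, this hands the bigger node roughly 8/18 of the regions instead of a flat third. As Jean-Marc notes, though, region placement alone doesn't pin HDFS block replicas, so the small nodes can still fill up.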
>> 
>> 
>> 2015-03-19 15:50 GMT-04:00 Nick Dimiduk <[email protected]>:
>> 
>>> Seems more fantasy than fact, I'm afraid. The default load balancer [0]
>>> takes store file size into account, but has no concept of capacity. It
>>> doesn't know that nodes in a heterogeneous environment have different
>>> capacities.
>>> 
>>> This would be a good feature to add though.
>>> 
>>> [0]:
>>> 
>>> https://github.com/apache/hbase/blob/branch-1.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
>>> 
>>> On Tue, Mar 17, 2015 at 7:26 AM, Ted Tuttle <[email protected]> wrote:
>>> 
>>>> Hello-
>>>> 
>>>> Sometime back I asked a question about introducing new nodes w/ more
>>>> storage than existing nodes.  I was told at the time that HBase will not
>>>> be able to utilize the additional storage; I assumed at the time that
>>>> regions are allocated to nodes in something like a round-robin fashion
>>>> and the node with the least storage sets the limit for how much each
>>>> node can utilize.
>>>> 
>>>> My question this time around has to do with nodes w/ unequal numbers of
>>>> volumes: Does HBase allocate regions based on nodes or on volumes within
>>>> the nodes?  I am hoping I can add a node with 8 volumes totaling 8X TB
>>>> and all the volumes will be filled, even though legacy nodes have 5
>>>> volumes and total storage of 5X TB.
>>>> 
>>>> Fact or fantasy?
>>>> 
>>>> Thanks,
>>>> Ted
>>>> 
>>>> 
>>> 
>> 
> 
> The opinions expressed here are mine, while they may reflect a cognitive 
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
> 
> 
> -- 
> Kevin O'Dell
> Field Enablement, Cloudera
> 

Michael Segel
michael_segel (AT) hotmail.com