[ 
https://issues.apache.org/jira/browse/HBASE-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-17963.
--------------------------------
    Resolution: Incomplete

[[email protected]], I think this is too vague to have any actionable 
development work attached to it. Discussions about how to fix a problem are 
best held on the mailing lists.

You might try raising 
{{hbase.master.balancer.stochastic.localityCost}} to a value like 400 or 500. 
This instructs the balancer to make locality a more dominant factor when 
balancing your cluster, which should help a completely crashed cluster get 
back to the "most data locality" state.
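
For reference, a setting like this would normally go in hbase-site.xml on the 
Master. The fragment below is only a sketch: 500 is the example value mentioned 
above, not a tuned recommendation, and how the change gets picked up (restart 
or otherwise) depends on your version and deployment.

```xml
<!-- hbase-site.xml (sketch): weight locality more heavily in the stochastic balancer. -->
<!-- The value 500 is illustrative; try it on a non-production cluster first. -->
<property>
  <name>hbase.master.balancer.stochastic.localityCost</name>
  <value>500</value>
</property>
```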

> RegionServers lose file locality on unplanned restart
> -----------------------------------------------------
>
>                 Key: HBASE-17963
>                 URL: https://issues.apache.org/jira/browse/HBASE-17963
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 1.1.2
>         Environment: Evident with HDP 2.4.3 running HBase 1.1.2
>            Reporter: Bjorn Olsen
>
> When an HBase cluster crashes, HFile locality is lost. 
> Crashes can happen for a variety of reasons, and in this event having a quick 
> time to recover (both data and database performance) is critical. 
> On cluster restore, region servers are not assigned their previous set of 
> regions, so HFile locality has to be rebuilt from scratch. 
> Performance is poor while file locality is not close to 100%. 
> A major compaction must be run to restore locality, which further 
> impacts performance and takes longer the more data was in HBase at the 
> time of the crash.
> There is a graceful_stop script which is useful for planned outages - you can 
> first unload the regions from the region server, restart it, and then reload 
> the regions to the same server. No HFiles need to be moved and file locality 
> is quickly restored.
> However, with an unplanned outage, no record is kept of where the 
> regions were hosted. After a crash, HBase assigns regions to region servers 
> randomly and HFile locality is very low. We then need to move all the HFiles 
> around until file locality is restored.
> This is fine for a small number of regions and small HFiles but becomes 
> problematic when you have a large number of region servers or large files.
> This JIRA is a request to improve this behavior for unplanned outages by 
> trying to restore the regions assigned per server, after a cluster restart. 
> For example, HBase could keep a list of the region locality at regular 
> intervals, and use this as an initial guideline when regions are restarted. 
> Locality might still not be 100% immediately - but presumably better than 0%. 
> It would be necessary to first disable the load balancer (if enabled) while 
> this restore is happening and enable it afterward.
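
The proposal in the quoted description (keep a periodic list of region 
locations and use it as an initial guideline after a restart) could be 
sketched roughly as follows. This is a minimal illustration in plain Python, 
not HBase code: `plan_restore`, the snapshot dictionary, and the region and 
server names are all hypothetical, and the sketch ignores the balancer 
entirely.

```python
from typing import Dict, Iterable, List, Tuple


def plan_restore(
    snapshot: Dict[str, str],
    live_servers: Iterable[str],
    regions: Iterable[str],
) -> Tuple[Dict[str, str], List[str]]:
    """Split regions into those that can return to their last-known server
    and those that need a fresh assignment.

    snapshot     -- hypothetical map of region name -> server it was last seen on
    live_servers -- servers that came back after the restart
    regions      -- all regions that must be assigned
    """
    live = set(live_servers)
    retained: Dict[str, str] = {}
    unassigned: List[str] = []
    for region in regions:
        previous = snapshot.get(region)
        if previous is not None and previous in live:
            # The previous host is back, so reusing it keeps the HFiles local.
            retained[region] = previous
        else:
            # Unknown region or dead host: leave it for normal assignment.
            unassigned.append(region)
    return retained, unassigned


# Example: rs3 did not come back, and r4 was never recorded in the snapshot.
snapshot = {"r1": "rs1", "r2": "rs2", "r3": "rs3"}
retained, unassigned = plan_restore(snapshot, ["rs1", "rs2"], ["r1", "r2", "r3", "r4"])
```

In this example `r1` and `r2` go back to their previous servers, while `r3` 
and `r4` fall through to whatever assignment would otherwise happen. As the 
description notes, locality would still not be 100% immediately.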



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
