Re: write availability

Esteban Gutierrez Tue, 07 Apr 2015 10:48:46 -0700

Hello Marcelo,

On Tue, Apr 7, 2015 at 10:16 AM, Marcelo Valle (BLOOMBERG/ LONDON) <
[email protected]> wrote:


> Esteban,
>
> If I understood correctly what you said:
>
> > "For the failure mode you mention, if all DNs go down (not the NN),
> clients will be blocked waiting for the acknowledgment of a write to the
> DNs, and after a few retries the RS will consider there was a failure
> writing to the WAL. The RS will attempt to roll the WAL one last time, and
> if that fails the RS will consider this a fatal condition and will shut
> itself down. At this point the client has probably run out of retries and
> will throw an exception to the application."
>
>

> If this scenario happens, when will my application be available to accept
> writes for that region again?
>

It will be available to accept writes as soon as an HDFS pipeline can be
established to the DNs, provided that happens before we fail to roll the
WAL. Most of the relevant settings can be configured depending on whether
you want to fail fast or tolerate transient errors and recover.

From my experience, losing all the DNs is rare, and when it happens it is
due to operator error. Also, depending on the access pattern to the
cluster, many times those issues go undetected and things heal by
themselves (HDFS placing more replicas, regions moving to other RSs, etc.)

> When I do some manual intervention on the server?
>

HBase and HDFS usually recover automatically if they are correctly
configured and you have the right cluster size for the workload. If clients
have a large number of retries, or if you are using HA features like region
replicas as Nick mentioned, clients will recover pretty much without any
intervention.
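
To illustrate what "a large number of retries" buys the application, here
is a minimal sketch of a retry loop with exponential backoff, written in
plain Java. It does not use the HBase client API: RetryingWriter and
callWithRetries are hypothetical names, and the real client does its own
retrying, controlled by settings such as hbase.client.retries.number and
hbase.client.pause. The sketch only shows the general idea that a write
survives a short outage if the caller keeps retrying long enough.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of the HBase client API. */
final class RetryingWriter {

    /**
     * Runs op, retrying on failure up to maxAttempts times in total.
     * The pause between attempts starts at initialPauseMs and doubles
     * after every failed attempt.
     */
    static <T> T callWithRetries(Callable<T> op, int maxAttempts, long initialPauseMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long pauseMs = initialPauseMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not retry if the calling thread was interrupted.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of retries, surface the last failure
                }
                Thread.sleep(pauseMs);
                pauseMs *= 2;
            }
        }
        throw last;
    }
}
```

With a short pause and few attempts the application fails fast; with more
attempts it rides out an outage of a few seconds, at the cost of blocking
the caller for longer.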


>
> For example: suppose I split data by user ids, so each user is stored in a
> different region. In the scenario above, my application (and also the HBase
> cluster) would be working for some users and wouldn't be working for users
> whose user id is in a "down region" (a region where all corresponding DNs
> are down, considering 1 DN per RS). Is this right?
>

That's correct. For a short period of time (a few seconds) clients will not
be able to contact the RS that went down while HBase recovers the regions
on that RS. Then these regions will be deployed on other RSs, and the
client will obtain the new locations in the cluster and continue to perform
reads and writes. With the read replicas that Nick mentioned, the client
can tolerate this failure even further by reading a copy of the data from a
region replica on another RS. That copy might have some stale data, but for
many use cases that might be OK while the recovery of the primary replica
completes.
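
The replica read described above can be pictured with a small plain-Java
sketch. It is only an illustration of the idea, not the HBase client API
(the real client handles replica reads internally). ReplicaFallbackRead,
ReadResult and the two suppliers are hypothetical names: the read goes to
the primary copy first, and if that fails it is served from a secondary
copy and flagged as possibly stale.

```java
import java.util.function.Supplier;

/** Hypothetical illustration of a replica fallback read; not the HBase API. */
final class ReplicaFallbackRead {

    /** A value plus a flag saying whether it came from a secondary copy. */
    static final class ReadResult {
        final String value;
        final boolean possiblyStale;

        ReadResult(String value, boolean possiblyStale) {
            this.value = value;
            this.possiblyStale = possiblyStale;
        }
    }

    /**
     * Reads from the primary copy. If the primary throws (for example
     * because its server is down or still recovering), reads from the
     * replica instead and marks the result as possibly stale.
     */
    static ReadResult read(Supplier<String> primary, Supplier<String> replica) {
        try {
            return new ReadResult(primary.get(), false);
        } catch (RuntimeException primaryUnavailable) {
            return new ReadResult(replica.get(), true);
        }
    }
}
```

The flag lets the application decide whether a possibly stale value is
acceptable for that particular read.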

cheers,
esteban.


> -Marcelo.
>
> From: [email protected]
> Subject: Re: write availability
>
>
> Hello Marcelo,
>
> HBase has strong durability guarantees to avoid data loss. When a write
> arrives at a RegionServer, the data will be persisted into a
> Write-Ahead-Log (on HDFS) and temporarily in the RegionServer memory until
> the data from this memory store is flushed (also to HDFS).
>
> From the point of view of a client that is writing to HBase, if it
> receives a response for a successful write operation (put, delete, append,
> increment) then we can guarantee that the data was correctly persisted to
> HDFS in the WAL, and in case of a catastrophic failure of a RegionServer we
> will be able to recover as others have mentioned.
>
> For the failure mode you mention, if all DNs go down (not the NN), clients
> will be blocked waiting for the acknowledgment of a write to the DNs, and
> after a few retries the RS will consider there was a failure writing to the
> WAL. The RS will attempt to roll the WAL one last time, and if that fails
> the RS will consider this a fatal condition and will shut itself down. At
> this point the client has probably run out of retries and will throw an
> exception to the application.
>
> If a single DN can recover before any of the RSs goes down, the writes
> will recover and the client will get the acknowledgment that the data has
> been persisted to HDFS (even with a single DN at this point). During this
> period the RS logs will warn that data is being persisted with a lower
> number of replicas and could be at risk.
>
> If you are further interested in the write path in HBase there is a really
> good blog post from Jimmy Xiang about this topic:
> http://blog.cloudera.com/blog/2012/06/hbase-write-path
>
> best,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
> On Tue, Apr 7, 2015 at 9:04 AM, Marcelo Valle (BLOOMBERG/ LONDON) <
> [email protected]> wrote:
>
>> Wellington,
>>
>> I might be misinterpreting this:
>> http://stackoverflow.com/questions/13741946/role-of-datanode-regionserver-in-hbase-hadoop-integration
>>
>> But aren't HBase region servers and HDFS datanodes always in the same
>> server? With a replication factor of 3, what happens if all 3 datanodes
>> hosting that information go down and one of them come back, but with the
>> disk intact? Considering from the time they went down to the time it went
>> back HBase received new writes that would go to the same data node...
>>
>>
>> From: [email protected]
>> Subject: Re: write availability
>>
>> The data is stored in files on HDFS. If a RS goes down, the master knows
>> which regions were on that RS and which HDFS files contain data for these
>> regions, so it will just assign the regions to other RSs, and these other
>> RSs will have access to the regions' data because it's stored on HDFS. The
>> RS does not "own" the disk, that is HDFS's job, so the recovery in this
>> case is transparent.
>>
>>
>> On 7 Apr 2015, at 16:51, Marcelo Valle (BLOOMBERG/ LONDON) <
>> [email protected]> wrote:
>>
>> > So if a RS goes down, it's assumed you lost the data on it, right?
>> > HBase has replications on HDFS, so if a RS goes down it doesn't mean I
>> lost all the data, as I could have the replicas yet... But what happens if
>> all RS hosting a specific region goes down?
>> > What if one RS from this one comes back again, but with the disk
>> intact, with all the data it had before crashing?
>> >
>> >
>> > From: [email protected]
>> > Subject: Re: write availability
>> >
>> > When a RS goes down, the Master will try to assign the regions on the
>> remaining RSes. When the RS comes back, after a while, the Master balancer
>> process will re-distribute regions between RS, so the given RS will be
>> hosting regions, but not necessarily the one it used to host before it went
>> down.
>> >
>> >
>> > On 7 Apr 2015, at 16:31, Marcelo Valle (BLOOMBERG/ LONDON) <
>> [email protected]> wrote:
>> >
>> >>> So if the cluster is up, then you can insert records in to HBase even
>> though you lost a RS that was handling a specific region.
>> >>
>> >> What happens when the RS goes down? Writes to that region will be
>> written to another region server? Another RS assumes the region "range"
>> while the RS is down?
>> >>
>> >> What happens when the RS that was down goes up again?
>> >>
>> >>
>> >> From: [email protected]
>> >> Subject: Re: write availability
>> >>
>> >> I don’t know if I would say that…
>> >>
>> >> I read Marcelo’s question of “if the cluster is up, even though a RS
>> may be down, can I still insert records in to HBase?”
>> >>
>> >> So if the cluster is up, then you can insert records in to HBase even
>> though you lost a RS that was handling a specific region.
>> >>
>> >> But because he talked about syncing nodes… I could be misreading his
>> initial question…
>> >>
>> >>> On Apr 7, 2015, at 9:02 AM, Serega Sheypak <[email protected]>
>> wrote:
>> >>>
>> >>>> If I have an application that writes to a HBase cluster, can I count
>> that
>> >>> the cluster will always available to receive writes?
>> >>> No, it's CP, not AP system.
>> >>>> so everything get in sync when the other nodes get up again
>> >>> There is no hinted handoff, it's not Cassandra.
>> >>>
>> >>>
>> >>>
>> >>> 2015-04-07 14:48 GMT+02:00 Marcelo Valle (BLOOMBERG/ LONDON) <
>> >>> [email protected]>:
>> >>>
>> >>>> If I have an application that writes to a HBase cluster, can I count
>> that
>> >>>> the cluster will always available to receive writes?
>> >>>> I might not be able to read if a region server which handles a range
>> of
>> >>>> keys is down, but will I be able to keep writing to other nodes, so
>> >>>> everything get in sync when the other nodes get up again?
>> >>>> Or I might get no write availability for a while?
>> >>
>> >> The opinions expressed here are mine, while they may reflect a
>> cognitive thought, that is purely accidental.
>> >> Use at your own risk.
>> >> Michael Segel
>> >> michael_segel (AT) hotmail.com
>> >
>> >
>>
>>
>>
>
>
