Re: How server gets into failed servers list?

Esteban Gutierrez Mon, 13 Apr 2015 11:01:52 -0700

Hi Dejan,

Do you have the logs from any of those failed region servers? Usually, in
case of a critical failure, the RS will shut itself down. Alternatively, if the RS
"hangs" for a long time, the master will start processing the expiration of that
RS and will reject it with a YouAreDeadException if it tries to reconnect.
The HBase master and RS logs will tell us for sure.
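To illustrate the expire-then-reject sequence described above, here is a toy Java model. It is a hypothetical sketch and not HBase's actual master code. The class, method names, the single heartbeat timeout, and the local ServerDeadException (standing in for HBase's real YouAreDeadException) are all assumptions made for illustration. The real master relies on mechanisms such as ZooKeeper session expiry rather than a plain timer like this one.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of the expiry flow described in the reply.
 * NOT HBase's ServerManager; every name here is illustrative only.
 */
public class DeadServerSketch {

    /** Local stand-in for HBase's YouAreDeadException (assumption for this sketch). */
    static class ServerDeadException extends Exception {
        ServerDeadException(String msg) { super(msg); }
    }

    private final long expiryTimeoutMs;                    // assumed silence limit
    private final Map<String, Long> lastReport = new HashMap<>();
    private final Set<String> deadServers = new HashSet<>();

    DeadServerSketch(long expiryTimeoutMs) { this.expiryTimeoutMs = expiryTimeoutMs; }

    /** A region server reports in; rejected if the master has already expired it. */
    void reportIn(String server, long nowMs) throws ServerDeadException {
        if (deadServers.contains(server)) {
            throw new ServerDeadException(server + " was expired; it must shut down");
        }
        lastReport.put(server, nowMs);
    }

    /** Master side: move every server that has been silent too long to the dead set. */
    void expireSilentServers(long nowMs) {
        lastReport.entrySet().removeIf(e -> {
            boolean silent = nowMs - e.getValue() > expiryTimeoutMs;
            if (silent) {
                deadServers.add(e.getKey());
            }
            return silent;
        });
    }

    boolean isDead(String server) { return deadServers.contains(server); }

    public static void main(String[] args) {
        DeadServerSketch master = new DeadServerSketch(30_000);
        try {
            master.reportIn("rs1", 0);
            master.reportIn("rs2", 0);
            master.reportIn("rs2", 25_000);       // rs2 keeps reporting
            master.expireSilentServers(40_000);   // rs1 has been silent for 40 s
            System.out.println("rs1 dead: " + master.isDead("rs1"));
            System.out.println("rs2 dead: " + master.isDead("rs2"));
            master.reportIn("rs1", 41_000);       // the "hung" server wakes up
        } catch (ServerDeadException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The key point the model captures is that expiry is one-way: once a server is in the dead set, a late report is rejected rather than silently accepted, which is why a hung RS has to restart instead of simply resuming.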


thanks,
esteban.


--
Cloudera, Inc.


On Mon, Apr 13, 2015 at 1:11 AM, Dejan Menges <[email protected]>
wrote:

> Hi,
>
> We had some issues recently with HDFS: a hardware issue with one of the
> nodes, nodes died, and HDFS recovered, but we figured out that something was
> wrong with HBase. Checking the HMaster log, we saw that a bunch of our region
> servers had ended up on the famous failed servers list, and this kept going on
> until we restarted every one of them.
>
> Are we doing something wrong? Is it possible to tune this somehow, so that once
> a server is on this list, it is eventually forgotten?
>
> Main question: how does the HMaster decide that a server should be on the
> failed servers list, and what exactly does this mean?
>
> I looked in the HBase book and searched online, but apart from some generic
> answers I wasn't able to find anything about the internals.
>
> Thanks in advance!
>
