Hi,

From my own experience with failing HDDs, I've seen cases where the drive
was failing silently at first. This manifested itself as repeated deep
scrub failures. Correct me if I'm wrong here, but Ceph keeps checksums of
the data being written, and if that data is read back corrupted on one of
the OSDs, this will be detected by scrub and reported as an inconsistency.
In such cases an automatic repair should be sufficient, since with the
checksums it is possible to tell which copy is correct. The OSD will not be
removed automatically, though; it is up to the cluster administrator to get
suspicious when such inconsistencies occur repeatedly and to remove the
OSD in question.
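For example, a minimal sequence for inspecting and repairing an
inconsistency could look like the following (the PG id 2.5 is just a
hypothetical placeholder; substitute whatever your cluster reports):

    # see which PGs are flagged inconsistent
    ceph health detail

    # list the objects and shard errors inside a given PG (Jewel and later)
    rados list-inconsistent-obj 2.5 --format=json-pretty

    # ask Ceph to repair the PG from the good copies
    ceph pg repair 2.5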

When the drive fails more severely and causes IO failures, the effect
will most likely be an abort of the OSD daemon, which causes the relevant
OSD to go down. The cause of the abort can then be determined by examining
the logs.
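To illustrate, assuming a systemd-based deployment and OSD id 12 as a
hypothetical example:

    # check cluster-wide OSD status and spot which OSDs are down
    ceph osd stat
    ceph osd tree | grep down

    # inspect the daemon's log for the abort reason
    journalctl -u ceph-osd@12
    # or, with the default file logging:
    less /var/log/ceph/ceph-osd.12.log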

In any case, SMART is your best friend, and it is strongly advised to run
smartd in order to get early warnings.
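A minimal sketch of what that could look like, using /dev/sda as a
hypothetical device and root@localhost as the alert recipient:

    # one-off health check of a suspect drive
    smartctl -a /dev/sda

    # /etc/smartd.conf: monitor all drives, enable offline testing and
    # attribute autosave, run a short self-test daily at 02:00, and mail
    # a warning on any SMART failure
    DEVICESCAN -a -o on -S on -s (S/../.././02) -m root@localhost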

Regards
Chris

On Tue, 7 Jun 2016 at 22:06, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> Hi,
> How does Ceph detect and manage disk failures? What happens if some data
> is written to a bad sector?
>
> Is there any chance that the bad sector gets "distributed" across the
> cluster due to replication?
>
> Is ceph able to remove the OSD bound to the failed disk automatically?
