Re: [ceph-users] HDD bad sector, pg inconsistent, no object remapping

Mihály Árva-Tóth Tue, 19 Nov 2013 02:07:22 -0800

Hello David and Chris,

Thank you for your replies in this thread.


>> The automatic repair should handle getting an EIO during read of the
>> object replica.

I think that when the OSD tries to read an object that sits on a bad
sector of the primary disk, the controller does not respond with EIO
but with something else.

If you can tell me how to debug the response code, I will try to find
out what it is.
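One way to see which error the kernel actually returns is to read the
object's file directly on the OSD's filesystem and look at the errno.
This Python sketch is only an illustration: the name `probe_read` is
hypothetical, and it assumes you already know the path of the object's
file under the OSD data directory. The kernel log (dmesg) normally also
records the low-level disk error.

```python
import errno
import os


def probe_read(path, chunk_size=4 * 1024 * 1024):
    """Read `path` end to end and report the first OS-level error.

    Returns None if every byte was read. Otherwise returns a tuple
    (errno_number, errno_name, bytes_read_before_failure).
    """
    offset = 0
    try:
        # Unbuffered binary reads, so an error is raised for the chunk that hit it.
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    return None  # reached EOF without any error
                offset += len(chunk)
    except OSError as e:
        # Map the numeric errno (e.g. 5 on Linux) to its symbolic name (e.g. "EIO").
        name = errno.errorcode.get(e.errno, "UNKNOWN")
        return (e.errno, name, offset)
```

A result whose second field is "EIO" means the kernel reported a plain
I/O error for that read; any other name points to a different failure.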

Thank you,
Mihaly


2013/11/19 David Zafman <david.zaf...@inktank.com>

>
> I looked at the code.  The automatic repair should handle getting an EIO
> during read of the object replica.  Contrary to what I said before, it does
> NOT require removing the object, so it doesn’t matter which copy has bad
> sectors.  It will copy from a good replica to the primary, if necessary.  By
> default, a deep-scrub, which would catch this case, is performed weekly.  A
> repair must be initiated by administrative action.
>
> When replicas differ according to a checksum comparison, we currently don’t
> have a way to determine which copy (or copies) is corrupt.  This is where
> manual intervention may be necessary, if the administrator can determine
> which copies are bad.
>
> David Zafman
> Senior Developer
> http://www.inktank.com
>
>
>
>
> On Nov 18, 2013, at 1:11 PM, Chris Dunlop <ch...@onthe.net.au> wrote:
>
> > OK, that's good (as far as it goes, being a manual process).
> >
> > So then, back to what I think was Mihály's original issue:
> >
> >> pg repair or deep-scrub cannot fix this issue. But if I
> >> understand correctly, the OSD has to know that it cannot retrieve
> >> the object from osd.0 and needs to replicate it to another OSD,
> >> because there are no longer 3 working replicas.
> >
> > Given that a bad checksum and/or read error tells ceph that an object
> > is corrupt, it would seem to be a natural step to then have ceph
> > automatically use another good-checksum copy, and even rewrite
> > the corrupt object, either in normal operation or under a scrub
> > or repair.
> >
> > Is there a reason this isn't done, apart from lack of tuits?
> >
> > Cheers,
> >
> > Chris
> >
> >
> > On Mon, Nov 18, 2013 at 11:43:46AM -0800, David Zafman wrote:
> >>
> >> No, you wouldn’t need to re-replicate the whole disk for a single bad
> >> sector.  The way to deal with that if the object is on the primary is to
> >> remove the file manually from the OSD’s filesystem and perform a repair of
> >> the PG that holds that object.  This will copy the object back from one of
> >> the replicas.
> >>
> >> David
> >>
> >> On Nov 17, 2013, at 10:46 PM, Chris Dunlop <ch...@onthe.net.au> wrote:
> >>
> >>> Hi David,
> >>>
> >>> On Fri, Nov 15, 2013 at 10:00:37AM -0800, David Zafman wrote:
> >>>>
> >>>> Replication does not occur until the OSD is “out.”  This creates a
> >>>> new mapping in the cluster of where the PGs should be, and thus data
> >>>> begins to move and/or sufficient copies are created.  This scheme lets
> >>>> you control how and when you want the replication to occur.  If you
> >>>> have plenty of space and you aren’t going to replace the drive
> >>>> immediately, just mark the OSD “down” AND “out.”  If you are going to
> >>>> replace the drive immediately, set the “noout” flag.  Take the OSD
> >>>> “down” and replace the drive.  Assuming the new drive is mounted in the
> >>>> same place as the bad drive, bring the OSD back up.  This will
> >>>> replicate exactly the same PGs the bad drive held back to the
> >>>> replacement drive.  As was stated before, don’t forget to run
> >>>> “ceph osd unset noout”.
> >>>>
> >>>> Keep in mind that in the case of a machine that has a hardware
> >>>> failure and takes OSD(s) down, there is an automatic timeout which
> >>>> will mark them “out” for unattended operation.  Unless you are
> >>>> monitoring the cluster 24/7, you should have enough disk space
> >>>> available to handle failures.
> >>>>
> >>>> Related info in:
> >>>>
> >>>> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
> >>>>
> >>>> David Zafman
> >>>> Senior Developer
> >>>> http://www.inktank.com
> >>>
> >>>
> >>> Are you saying, if a disk suffers from a bad sector in an object
> >>> for which it's primary, and for which good data exists on other
> >>> replica PGs, there's no way for ceph to recover other than by
> >>> (re-)replicating the whole disk?
> >>>
> >>> I.e., even if the disk is able to remap the bad sector using a
> >>> spare, so the disk is ok (albeit missing a sector's worth of
> >>> object data), the only way to recover is to basically blow away
> >>> all the data on that disk and start again, replicating
> >>> everything back to the disk (or to other disks)?
> >>>
> >>> Cheers,
> >>>
> >>> Chris.
>
>
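Since, per David's message quoted above, a repair has to be started by
an administrator, a small helper can list the inconsistent PGs and
prepare the matching "ceph pg repair <pgid>" commands without running
them. This is a hypothetical sketch: the function names are made up, and
the line format assumed for "ceph health detail" output ("pg 2.37 is
active+clean+inconsistent, acting [2,0,1]") should be checked against
your Ceph version.

```python
import re

# Assumed `ceph health detail` line format (verify on your version), e.g.
#   "pg 2.37 is active+clean+inconsistent, acting [2,0,1]"
PG_LINE = re.compile(r"^pg (\S+) is (\S+), acting \[([\d,]+)\]")


def inconsistent_pgs(health_detail):
    """Return (pgid, acting_osds) for every PG whose state includes 'inconsistent'."""
    found = []
    for line in health_detail.splitlines():
        m = PG_LINE.match(line.strip())
        if m and "inconsistent" in m.group(2).split("+"):
            acting = [int(x) for x in m.group(3).split(",")]
            found.append((m.group(1), acting))
    return found


def repair_commands(health_detail):
    """Build (but do not run) one `ceph pg repair <pgid>` argv per inconsistent PG.

    Running them stays a manual decision: when replicas differ only by
    checksum, the thread notes there is no automatic way to tell which
    copy is corrupt.
    """
    return [["ceph", "pg", "repair", pgid] for pgid, _ in inconsistent_pgs(health_detail)]
```

The first OSD in the acting list is the primary, which helps when
deciding whether the bad copy is on the primary disk.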

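The drive-replacement procedure quoted above (set "noout", take the OSD
down, replace the drive, bring the OSD back up, unset "noout") is easy
to leave half-finished. This Python sketch, with a hypothetical
`noout_window` helper, guarantees the flag is unset even if a step
fails. It only wraps the real "ceph osd set noout" and "ceph osd unset
noout" commands; stopping and starting the OSD daemon depends on the
init system and is left to the operator.

```python
import contextlib
import subprocess


def run_ceph(argv):
    """Run a ceph CLI command, raising CalledProcessError if it fails."""
    subprocess.run(argv, check=True)


@contextlib.contextmanager
def noout_window(run=run_ceph):
    """Stop the cluster from marking down OSDs 'out' while the body runs.

    Sets the noout flag on entry and always unsets it on exit, even if
    the body raises, so the flag cannot be forgotten.
    """
    run(["ceph", "osd", "set", "noout"])
    try:
        yield
    finally:
        run(["ceph", "osd", "unset", "noout"])
```

Put the manual steps (stop the OSD, swap and remount the drive at the
same path, start the OSD) inside "with noout_window():". The injectable
`run` argument makes it possible to dry-run the sequence without a
cluster.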

-- 

Best regards,

Mihály Árva-Tóth

System Engineer



Virtual Call Center GmbH

Address: 23-33  Csalogány Street, Budapest 1027, Hungary

Tel: +36 1 999 7400

Mobile: +36 30 473 9256

Fax: +36 1 999 7401

E-mail: mihaly.arva-t...@virtual-call-center.eu

Web: www.virtual-call-center.eu <http://www.virtual-call-center.hu/>

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
