IIRC, the EIO we had also correlated with a SMART status that showed the
disk was bad enough for a warranty replacement -- so yes, I replaced the
disk in these cases.
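(For reference, a quick way to do that check -- device path and attribute
names are examples and vary by vendor/controller:)

```shell
# Overall SMART health self-assessment for the suspect disk
smartctl -H /dev/sdb

# Look at the error-related attributes; growing reallocated/pending
# sector counts usually mean the drive is on its way out.
smartctl -A /dev/sdb | egrep 'Reallocated|Pending|Uncorrect'
```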

Cheers, Dan

On Thu Nov 06 2014 at 2:44:08 PM GuangYang <yguan...@outlook.com> wrote:

> Thanks Dan. By "killed/formatted/replaced the OSD", did you replace the
> disk? I'm not a filesystem expert, but I would like to understand what
> happened behind the EIO and whether it reveals something (e.g. a
> hardware issue).
>
> In our case we are using 6TB drives, so there is a lot of data to
> migrate, and since backfilling/recovering increases latency, we hope to
> avoid that as much as we can.
>
> Thanks,
> Guang
>
> ________________________________
> > From: daniel.vanders...@cern.ch
> > Date: Thu, 6 Nov 2014 13:36:46 +0000
> > Subject: Re: PG inconsistency
> > To: yguan...@outlook.com; ceph-users@lists.ceph.com
> >
> > Hi,
> > I've only ever seen (1), EIO to read a file. In this case I've always
> > just killed / formatted / replaced that OSD completely -- that moves
> > the PG to a new master and the new replication "fixes" the
> > inconsistency. This way, I've never had to pg repair. I don't know if
> > this is a best or even good practice, but it works for us.
> > Cheers, Dan
> >
> > On Thu Nov 06 2014 at 2:24:32 PM GuangYang
> > <yguan...@outlook.com<mailto:yguan...@outlook.com>> wrote:
> > Hello Cephers,
> > Recently we observed a couple of inconsistencies in our Ceph cluster;
> > there were two major patterns leading to inconsistency as I observed:
> > 1) EIO when reading the file, 2) the digest is inconsistent (for EC)
> > even though there is no read error.
> >
> > While Ceph has built-in tool sets to repair the inconsistencies, I
> > would also like to check with the community about the best way to
> > handle such issues (e.g. should we run fsck / xfs_repair when such an
> > issue happens).
> >
> > In more details, I have the following questions:
> > 1. When an inconsistency is detected, what is the chance that there is
> > a hardware issue which needs to be repaired physically, or should I
> > run some disk/filesystem tools to check further?
> > 2. Should we use fsck / xfs_repair to fix the inconsistencies, or
> > should we rely solely on Ceph's repair tool sets?
> >
> > It would be great to hear your experiences and suggestions.
> >
> > BTW, we are using XFS in the cluster.
> >
> > Thanks,
> > Guang
>
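For completeness, the kill/replace sequence I described above is roughly
the standard manual OSD removal procedure (osd.12 is just an example id;
adjust for your cluster and init system):

```shell
# Mark the OSD out so Ceph starts re-replicating its PGs elsewhere.
ceph osd out 12

# Wait until "ceph -s" shows all PGs active+clean again, then remove
# the OSD from the cluster and swap/reformat the disk.
service ceph stop osd.12
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12
```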
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com