On Thu Nov 06 2014 at 16:44:09, GuangYang <yguan...@outlook.com> wrote:
> Thanks Dan. By "killed/formatted/replaced the OSD", did you replace the
> disk? I am not a filesystem expert, but I would like to understand what
> happened behind the EIO and whether it reveals something (e.g. a
> hardware issue).
>
> In our case we are using 6TB drives, so there is a lot of data to
> migrate, and since backfilling/recovering increases latency, we hope to
> avoid that as much as we can.
>
For example, use the following parameters:

  osd_recovery_delay_start = 10
  osd recovery op priority = 2
  osd max backfills = 1
  osd recovery max active = 1
  osd recovery threads = 1

> Thanks,
> Guang
>
> ________________________________
> > From: daniel.vanders...@cern.ch
> > Date: Thu, 6 Nov 2014 13:36:46 +0000
> > Subject: Re: PG inconsistency
> > To: yguan...@outlook.com; ceph-users@lists.ceph.com
> >
> > Hi,
> > I've only ever seen (1), EIO when reading a file. In that case I've
> > always just killed/formatted/replaced that OSD completely -- that moves
> > the PG to a new master, and the new replication "fixes" the
> > inconsistency. That way, I've never had to run pg repair. I don't know
> > whether this is a best or even a good practice, but it works for us.
> > Cheers, Dan
> >
> > On Thu Nov 06 2014 at 2:24:32 PM GuangYang
> > <yguan...@outlook.com<mailto:yguan...@outlook.com>> wrote:
> > Hello Cephers,
> > Recently we observed a couple of inconsistencies in our Ceph cluster.
> > There were two major patterns leading to inconsistency: 1) EIO when
> > reading a file; 2) an inconsistent digest (for EC pools) even though
> > there is no read error.
> >
> > While Ceph has built-in tool sets to repair inconsistencies, I would
> > also like to check with the community about the best way to handle
> > such issues (e.g. should we run fsck / xfs_repair when such an issue
> > happens?).
> >
> > In more detail, I have the following questions:
> > 1. When an inconsistency is detected, what is the chance that there is
> > a hardware issue which needs to be repaired physically, or should I
> > run some disk/filesystem tools to check further?
> > 2. Should we use fsck / xfs_repair to fix the inconsistencies, or
> > should we rely solely on Ceph's repair tool sets?
> >
> > It would be great to hear your experience and suggestions.
> >
> > BTW, we are using XFS in the cluster.
> >
> > Thanks,
> > Guang
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
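For reference, the recovery-throttling values suggested at the top of this reply can also be made persistent by placing them in ceph.conf. This is only a sketch: the option values are the ones from the suggestion above (underscore and space forms of the option names are interchangeable in Ceph's config parser), and the comments describe the general intent of each option rather than exact defaults:

```ini
[osd]
# Throttle recovery/backfill so client I/O latency suffers less
# during rebalancing after an OSD is replaced.
osd_recovery_delay_start = 10   ; wait before starting recovery after peering
osd_recovery_op_priority = 2    ; deprioritize recovery ops vs. client ops
osd_max_backfills = 1           ; at most one concurrent backfill per OSD
osd_recovery_max_active = 1     ; at most one active recovery op per OSD
osd_recovery_threads = 1        ; single recovery thread per OSD
```

These can also be injected into running OSDs with `ceph tell osd.* injectargs`, though injected values are lost on restart unless they are also written to ceph.conf.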
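For the `pg repair` route discussed in the thread, the inconsistent PGs first have to be identified from `ceph health detail`. Below is a minimal sketch of extracting their IDs; the `health_detail` function is a stand-in that emits illustrative sample output in the format that era of Ceph used, and on a live cluster you would pipe the real `ceph health detail` instead:

```shell
# Stand-in for `ceph health detail` on a live cluster; the sample lines
# are illustrative, not taken from a real cluster.
health_detail() {
  cat <<'EOF'
HEALTH_ERR 2 pgs inconsistent; 2 scrub errors
pg 3.a is active+clean+inconsistent, acting [5,2,7]
pg 3.1f is active+clean+inconsistent, acting [1,4,6]
2 scrub errors
EOF
}

# Pull out just the PG IDs of inconsistent PGs.
# On a real cluster: ceph health detail | awk '...'
health_detail | awk '/inconsistent/ && $1 == "pg" { print $2 }'
# prints:
# 3.a
# 3.1f
```

Each printed ID could then be handed to `ceph pg repair <pgid>` one at a time, after checking (as the thread suggests) whether the underlying disk is actually failing.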