Does the "repair" function use the same rules as a deep scrub? I couldn't
get one to kick off, until I temporarily increased the max_scrubs and
lowered the scrub_min_interval on all 3 OSDs for that placement group. This
ended up fixing the issue, so I'll leave this here in case somebody else
runs into it.

# Allow each OSD to run more than one scrub at a time (the default is 1)
sudo ceph tell 'osd.208' injectargs '--osd_max_scrubs 3'
sudo ceph tell 'osd.120' injectargs '--osd_max_scrubs 3'
sudo ceph tell 'osd.235' injectargs '--osd_max_scrubs 3'
# Drop the minimum interval between scrubs to one second so the repair
# can be scheduled immediately
sudo ceph tell 'osd.208' injectargs '--osd_scrub_min_interval 1.0'
sudo ceph tell 'osd.120' injectargs '--osd_scrub_min_interval 1.0'
sudo ceph tell 'osd.235' injectargs '--osd_scrub_min_interval 1.0'
# Then kick off the repair on the inconsistent PG
sudo ceph pg repair 75.302
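
Don't forget to put the settings back once the repair finishes. A minimal
sketch, assuming your cluster still uses the stock defaults
(osd_max_scrubs = 1, osd_scrub_min_interval = 86400 seconds, i.e. one
day); check the actual values on the OSD host first with
"ceph daemon osd.<id> config get osd_max_scrubs":

sudo ceph tell 'osd.208' injectargs '--osd_max_scrubs 1 --osd_scrub_min_interval 86400'
sudo ceph tell 'osd.120' injectargs '--osd_max_scrubs 1 --osd_scrub_min_interval 86400'
sudo ceph tell 'osd.235' injectargs '--osd_max_scrubs 1 --osd_scrub_min_interval 86400'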

-Brett


On Thu, Oct 11, 2018 at 8:42 AM Maks Kowalik <maks_kowa...@poczta.fm> wrote:

> IMHO moving was not the best idea (a copy attempt would have shown
> whether a read error was the cause here).
> Scrubs might not want to start if there are many other scrubs ongoing.
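> A quick way to check for in-flight scrubs, assuming a reasonably recent
> release, is to grep the brief PG dump for a "scrubbing" state:
>
> ceph pg dump pgs_brief | grep -i scrub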
>
> On Thu, 11 Oct 2018 at 14:27 Brett Chancellor <bchancel...@salesforce.com>
> wrote:
>
>> I moved the file, but the cluster won't actually start any scrub or
>> repair that I manually initiate.
>>
>> On Thu, Oct 11, 2018, 7:51 AM Maks Kowalik <maks_kowa...@poczta.fm>
>> wrote:
>>
>>> Based on the log output, it looks like you have a damaged file on
>>> OSD.235 where the shard is stored.
>>> To confirm that's the case, find the file (using 81d5654895863d as
>>> part of its name) and try to copy it to another directory.
>>> If you get an I/O error while copying, the next steps would be to
>>> delete the file, run a scrub on 75.302, and take a deep look at
>>> OSD.235 for any other errors.
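>>>
>>> For example, on a FileStore OSD (assuming the data lives under
>>> /var/lib/ceph/osd/ceph-235/current; on BlueStore the object is not a
>>> plain file and you would go through ceph-objectstore-tool instead):
>>>
>>> sudo find /var/lib/ceph/osd/ceph-235/current -name '*81d5654895863d*'
>>> sudo cp -v <path_from_find> /tmp/   # hypothetical path; use the find result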
>>>
>>> Kind regards,
>>> Maks
>>>
>>
