It won't kick off right away if other deep scrubs are already running on those OSDs. You can set nodeep-scrub on the cluster, wait until the other deep scrubs have finished, then turn deep scrubs back on and immediately run the repair. You should see that pg do a deep scrub, then repair.
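Something along these lines (the pg id is taken from your health detail output, so double-check it; the list-inconsistent-obj query is just a suggestion to see which shard/OSD is reporting the errors):

  # pause new deep scrubs cluster-wide
  ceph osd set nodeep-scrub

  # wait for the in-flight deep scrubs to finish (watch ceph -s)

  # re-enable deep scrubs and kick off the repair right away
  ceph osd unset nodeep-scrub
  ceph pg repair 2.798

  # see which objects/shards the scrub flagged on that pg
  rados list-inconsistent-obj 2.798 --format=json-pretty
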
On Sat, May 18, 2019, 6:41 PM Jorge Garcia <jgar...@soe.ucsc.edu> wrote:

> I have tried ceph pg repair several times. It claims "instructing pg
> 2.798s0 on osd.41 to repair" but then nothing happens as far as I can
> tell. Any way of knowing if it's doing more?
>
> On Sat, May 18, 2019 at 3:33 PM Brett Chancellor
> <bchancel...@salesforce.com> wrote:
>
>> I would try the ceph pg repair. If you see the pg go into deep
>> scrubbing, then back to inconsistent you probably have a bad drive.
>> Find which of the drives in the pg are bad (pg query or go to the host
>> and look through dmesg). Take that osd offline and mark it out. Once
>> backfill is complete, it should clear up.
>>
>> On Sat, May 18, 2019, 6:05 PM Jorge Garcia <jgar...@soe.ucsc.edu> wrote:
>>
>>> We are testing a ceph cluster mostly using cephfs. We are using an
>>> erasure-code pool, and have been loading it up with data. Recently, we
>>> got a HEALTH_ERR response when we were querying the ceph status. We
>>> stopped all activity to the filesystem, and waited to see if the error
>>> would go away. It didn't. Then we tried a couple of suggestions from
>>> the internet (ceph pg repair, ceph pg scrub, ceph pg deep-scrub) to no
>>> avail. I'm not sure how to find out more information about what the
>>> problem is, and how to repair the filesystem to bring it back to
>>> normal health. Any suggestions?
>>>
>>> Current status:
>>>
>>> # ceph -s
>>>   cluster:
>>>     id:     28ef32f1-4350-491b-9003-b19b9c3a2076
>>>     health: HEALTH_ERR
>>>             5 scrub errors
>>>             Possible data damage: 1 pg inconsistent
>>>
>>>   services:
>>>     mon: 3 daemons, quorum gi-cba-01,gi-cba-02,gi-cba-03
>>>     mgr: gi-cba-01(active), standbys: gi-cba-02, gi-cba-03
>>>     mds: backups-1/1/1 up {0=gi-cbmd=up:active}
>>>     osd: 87 osds: 87 up, 87 in
>>>
>>>   data:
>>>     pools:   2 pools, 4096 pgs
>>>     objects: 90.98 M objects, 134 TiB
>>>     usage:   210 TiB used, 845 TiB / 1.0 PiB avail
>>>     pgs:     4088 active+clean
>>>              5    active+clean+scrubbing+deep
>>>              2    active+clean+scrubbing
>>>              1    active+clean+inconsistent
>>>
>>> # ceph health detail
>>> HEALTH_ERR 5 scrub errors; Possible data damage: 1 pg inconsistent
>>> OSD_SCRUB_ERRORS 5 scrub errors
>>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>>>     pg 2.798 is active+clean+inconsistent, acting [41,50,17,2,86,70,61]
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com