Konstantin,

Thanks for your answer, I will run a ceph pg repair. Could you maybe elaborate a bit on how this repair process works? Does it just try to re-read the object from the OSD that reported the read_error? IIRC there was a time when ceph pg repair wasn't considered 'safe' because it simply copied the primary OSD's shard contents to the other OSDs. When did this change?
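For the archives, this is roughly the sequence I intend to use to inspect and repair the inconsistent PGs; note that the PG id 2.1ab below is only a placeholder for whatever "ceph health detail" reports on your own cluster:

    # Find which PGs are inconsistent and which OSDs they map to
    ceph health detail
    ceph pg ls inconsistent

    # Show per-object details of the inconsistency (e.g. the read_error on the primary)
    rados list-inconsistent-obj 2.1ab --format=json-pretty

    # Ask the primary OSD to repair the PG, then watch the cluster log
    ceph pg repair 2.1ab
    ceph -w

    # Afterwards, re-run a deep scrub to confirm the PG is clean again
    ceph pg deep-scrub 2.1ab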
Btw, I woke up this morning with only one active+clean+inconsistent PG left, so one of them already triggered a new (deep) scrub, re-read the primary OSD and found it good. I noticed these read_errors start to occur on this installation when available RAM gets low (we still have to reboot the cluster nodes once in a while to free up RAM). Furthermore, we will upgrade to 12.2.12 soon.

Caspar Smit
Systemengineer
SuperNAS
Dorsvlegelstraat 13
1445 PA Purmerend
t: (+31) 299 410 414
e: caspars...@supernas.eu
w: www.supernas.eu

On Thu 5 Dec 2019 at 07:26, Konstantin Shalygin <k0...@k0ste.ru> wrote:

> I tried to dig in the mailing list archives but couldn't find a clear
> answer to the following situation:
>
> Ceph encountered a scrub error resulting in HEALTH_ERR.
> Two PGs are active+clean+inconsistent. When investigating the PG I see a
> "read_error" on the primary OSD. Both PGs are replicated PGs with 3
> copies.
>
> I'm on Luminous 12.2.5 on this installation. Is it safe to just run "ceph
> pg repair" on those PGs, or will it then overwrite the two good copies
> with the bad one from the primary?
> If the latter is true, what is the correct way to resolve this?
>
> Yes, you should call pg repair. Also it's better to upgrade to 12.2.12.
>
>
> k