Konstantin,

Thanks for your answer, I will run a ceph pg repair. Could you maybe elaborate a bit on how this repair process works? Does it just try to re-read the object from the OSD that reported the read_error? IIRC there was a time when ceph pg repair wasn't considered 'safe' because it simply copied the primary OSD's shard contents to the other OSDs. When did this change?
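For the archives, this is roughly the sequence I intend to use to inspect and repair the inconsistent PGs; note that the PG id 2.1ab below is only a placeholder for whatever "ceph health detail" reports on your own cluster:

    # Find which PGs are inconsistent and which OSDs they map to
    ceph health detail
    ceph pg ls inconsistent

    # Show per-object details of the inconsistency (e.g. the read_error on the primary)
    rados list-inconsistent-obj 2.1ab --format=json-pretty

    # Ask the primary OSD to repair the PG, then watch the cluster log
    ceph pg repair 2.1ab
    ceph -w

    # Afterwards, re-run a deep scrub to confirm the PG is clean again
    ceph pg deep-scrub 2.1ab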
Btw, I woke up this morning with only one active+clean+inconsistent PG left, so one of them already triggered a new (deep) scrub, re-read the primary OSD and found it good. I noticed these read_errors start to occur on this installation when available RAM gets low (we still have to reboot the cluster nodes once in a while to free up RAM). Furthermore, we will upgrade to 12.2.12 soon.

Caspar Smit
Systemengineer
SuperNAS
Dorsvlegelstraat 13
1445 PA Purmerend
t: (+31) 299 410 414
e: caspars...@supernas.eu
w: www.supernas.eu

On Thu 5 Dec 2019 at 07:26, Konstantin Shalygin <k0...@k0ste.ru> wrote:

> I tried to dig in the mailing list archives but couldn't find a clear
> answer to the following situation:
>
> Ceph encountered a scrub error resulting in HEALTH_ERR.
> Two PGs are active+clean+inconsistent. When investigating the PG I see a
> "read_error" on the primary OSD. Both PGs are replicated PGs with 3
> copies.
>
> I'm on Luminous 12.2.5 on this installation. Is it safe to just run "ceph
> pg repair" on those PGs, or will it then overwrite the two good copies
> with the bad one from the primary?
> If the latter is true, what is the correct way to resolve this?
>
> Yes, you should call pg repair. Also it's better to upgrade to 12.2.12.
>
>
> k