Hi,

what exactly is your question? You seem to have made progress bringing OSDs back up and reducing the number of inactive PGs. What surprises me is that a single host failure caused inactive PGs at all. Can you share more details about your osd tree and the crush rules of the affected pools? A properly set up Ceph cluster should be resilient to a host failure, so after your host failed I would have expected Ceph to recover the degraded PGs onto different hosts. Did that recovery not happen?
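For reference, the topology and rule information mentioned above can be collected with a few read-only commands; this is only a sketch, and "2.1a" below is a placeholder PG id, not one from your cluster:

```shell
ceph osd tree                 # host/OSD layout, weights, up/down state
ceph osd crush rule dump      # failure domain of each crush rule
ceph pg dump_stuck inactive   # list the PGs that are stuck inactive
ceph pg 2.1a query            # detailed peering state of one stuck PG
```

Comparing the crush rule's failure domain against the osd tree usually shows whether a single host can take a PG below min_size.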

Regards,
Eugen

Quoting Alfredo Rezinovsky <[email protected]>:

I had a problem with a server, hardware completely broken.

"ceph orch host rm" hung, even with the --force and --offline options

I reinstalled another server with the same IP address and then removed the
OSDs with:

ceph osd purge osd.10 --yes-i-really-mean-it
ceph osd purge osd.11 --yes-i-really-mean-it

Now I have 0.342% of PGs not active.

with

ceph pg <pg.id> query

I can see the PG is blocked by the non-existent osd.10 (or osd.11 in the
other problematic PG).
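The blocking OSDs can be pulled out of the `ceph pg <pgid> query` JSON programmatically. A minimal sketch, assuming the usual layout where a "recovery_state" entry carries a "peering_blocked_by" list; the embedded JSON is an abbreviated, hypothetical excerpt, not real output from this cluster:

```python
import json

# Abbreviated, hypothetical excerpt of `ceph pg <pgid> query` output.
pg_query_output = """
{
  "state": "down",
  "recovery_state": [
    {
      "name": "Started/Primary/Peering/Down",
      "blocked": "peering is blocked due to down osds",
      "down_osds_we_would_probe": [10],
      "peering_blocked_by": [
        {"osd": 10, "current_lost_at": 0,
         "comment": "starting or marking this osd lost may let us proceed"}
      ]
    }
  ]
}
"""

data = json.loads(pg_query_output)

# Collect every OSD id that any recovery_state entry reports as blocking.
blockers = [
    b["osd"]
    for state in data["recovery_state"]
    for b in state.get("peering_blocked_by", [])
]
print(blockers)  # → [10]
```

In practice you would feed it `ceph pg <pgid> query | python3 this_script.py` style input instead of the embedded string.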

I already tried setting

osd_find_best_info_ignore_history_les = false

on the affected OSDs and restarted them, with some luck (I had 3 inactive
PGs, now I have 2).

Also, after that, another OSD kept restarting. I fixed that by setting its
reweight to 0 and am still waiting for the OSD to drain before destroying it.
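The drain-and-remove sequence described above could look roughly like this; a sketch only, with osd.12 as a placeholder id:

```shell
ceph osd reweight 12 0                # stop mapping new data onto the OSD
ceph osd df tree                      # watch its PGS/DATA drop toward zero
ceph osd safe-to-destroy osd.12      # confirm no PG still depends on it
ceph osd purge 12 --yes-i-really-mean-it
```

Waiting for `safe-to-destroy` to report OK before purging avoids recreating the blocked-by-a-gone-OSD situation described earlier in the thread.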


--
Alfrenovsky
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]

