Hi,

what exactly is your question? You seem to have made progress bringing OSDs back up and reducing the number of inactive PGs. What surprises me is that a single host failure caused inactive PGs at all. Can you share more details about your osd tree and the crush rules of the affected pools? A properly set up Ceph cluster should be resilient to a host failure, so after your host failed I would have expected Ceph to recover the degraded PGs onto different hosts. Did that recovery not happen?
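For reference, the topology and rule information mentioned above can be collected with a few read-only commands; this is only a sketch, and "2.1a" below is a placeholder PG id, not one from your cluster:

```shell
ceph osd tree                 # host/OSD layout, weights, up/down state
ceph osd crush rule dump      # failure domain of each crush rule
ceph pg dump_stuck inactive   # list the PGs that are stuck inactive
ceph pg 2.1a query            # detailed peering state of one stuck PG
```

Comparing the crush rule's failure domain against the osd tree usually shows whether a single host can take a PG below min_size.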

Regards,
Eugen

Quoting Alfredo Rezinovsky <[email protected]>:

I had a problem with a server, hardware completely broken.

"ceph orch host rm" hung, even with the --force and --offline options

I reinstalled another server with the same IP address and then removed the
OSDs with:

ceph osd purge osd.10 --yes-i-really-mean-it
ceph osd purge osd.11 --yes-i-really-mean-it

Now I have 0.342% of PGs not active.

with

ceph pg <pg.id> query

I can see the PG is blocked by the non-existent osd.10 (or osd.11 in the
other problematic PG).
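The blocking OSDs can be pulled out of the `ceph pg <pgid> query` JSON programmatically. A minimal sketch, assuming the usual layout where a "recovery_state" entry carries a "peering_blocked_by" list; the embedded JSON is an abbreviated, hypothetical excerpt, not real output from this cluster:

```python
import json

# Abbreviated, hypothetical excerpt of `ceph pg <pgid> query` output.
pg_query_output = """
{
  "state": "down",
  "recovery_state": [
    {
      "name": "Started/Primary/Peering/Down",
      "blocked": "peering is blocked due to down osds",
      "down_osds_we_would_probe": [10],
      "peering_blocked_by": [
        {"osd": 10, "current_lost_at": 0,
         "comment": "starting or marking this osd lost may let us proceed"}
      ]
    }
  ]
}
"""

data = json.loads(pg_query_output)

# Collect every OSD id that any recovery_state entry reports as blocking.
blockers = [
    b["osd"]
    for state in data["recovery_state"]
    for b in state.get("peering_blocked_by", [])
]
print(blockers)  # → [10]
```

In practice you would feed it `ceph pg <pgid> query | python3 this_script.py` style input instead of the embedded string.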

I already tried setting

osd_find_best_info_ignore_history_les = false

on the affected OSDs and restarted them, with some luck (I had 3 inactive
PGs, now I have 2).

Also, after that, another OSD kept restarting. I fixed that by setting its
reweight to 0 and am still waiting for the OSD to drain before destroying it.
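The drain-and-remove sequence described above could look roughly like this; a sketch only, with osd.12 as a placeholder id:

```shell
ceph osd reweight 12 0                # stop mapping new data onto the OSD
ceph osd df tree                      # watch its PGS/DATA drop toward zero
ceph osd safe-to-destroy osd.12      # confirm no PG still depends on it
ceph osd purge 12 --yes-i-really-mean-it
```

Waiting for `safe-to-destroy` to report OK before purging avoids recreating the blocked-by-a-gone-OSD situation described earlier in the thread.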


--
Alfrenovsky
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]

