Frédéric,

When I was writing the last email, my colleague launched a re-peering of the PG stuck in activating state: the PG became active immediately, but it triggered some rebalancing of other PGs, not necessarily in the same pool. After this success we decided to go for your approach, selected a not-too-critical pool and ran a repeer on all of its PGs. This resulted in a huge rebalancing (5M objects, still in progress, affecting many pools), basically a rebalancing similar in size to the unexpected one we saw after the incident two days ago. Could this mean that the state of some OSDs was set or used improperly after the OSD servers were restarted following the incident, leading to an inappropriate placement of PGs that is now being fixed because the repeer command triggers a re-evaluation of the CRUSH mapping?
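In case it matters, the pool-wide repeer can be scripted roughly like this (a sketch only, not the exact commands we ran; 'mypool' is a placeholder, and the awk/grep just pulls the pgid column out of the plain 'ceph pg ls-by-pool' output, so adjust it to your release's output format):

    # repeer every PG of one pool, one by one
    for pgid in $(ceph pg ls-by-pool mypool | awk '{print $1}' | grep -E '^[0-9]+\.[0-9a-f]+$'); do
        ceph pg repeer "$pgid"
    done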

Cheers,

Michel

On 27/03/2025 at 12:16, Michel Jouvin wrote:
Frédéric,

Thanks for your answer. I checked the number of PGs on osd.17: it is 164, very far from the hard limit (750, the default I think). So that doesn't seem to be the problem, and maybe the peering is just a victim of the more general problem that leaves many pools more or less inaccessible. What "inaccessible" means here is not entirely clear:

- We tested the ability to access the pool content with 'rados ls' as I said, and we considered a pool inaccessible when the command timed out after 10s with no explicit error (a rough sketch of these checks is after this list). This also happens on empty pools.

- At the same time, on at least one such pool, we were able to successfully upload and download a large file with an S3 client (this pool is one of the data pools of a Swift RGW).
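For what it's worth, the checks above boil down to commands of this kind (placeholder names, not the exact lines we typed):

    # per-OSD PG count: the PGS column of 'ceph osd df' (17 is osd.17's id)
    ceph osd df | awk '$1 == 17'

    # consider a pool unreachable when listing hangs: give up after 10 seconds
    timeout 10 rados -p mypool ls || echo "mypool: no answer within 10s"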

To be honest, we have not checked all the logs yet! We concentrated mainly on the mon logs, but we'll have a look at some OSD logs.

As for restarting daemons, I am not so reluctant to do it. My feeling is that, in the absence of any message about inconsistencies, there is no real risk if we restart them one by one and check with ok-to-stop before each restart. What's your feeling? Is it worth restarting the 3 mons first (one by one)?
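For the OSDs, I would do something along these lines on each host (a sketch that assumes a systemd-managed, non-cephadm deployment, so adjust the unit name otherwise; 17 is just an example OSD id):

    # only restart the OSD if Ceph confirms it is currently safe to stop it
    if ceph osd ok-to-stop 17; then
        systemctl restart ceph-osd@17
    fi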

You mention re-peering all PGs of one pool as an alternative. I was not aware we could do that, but I see that there is a 'ceph pg repeer' command. Is there anything else we should do before running the command? Does it make sense to try it on the PG stuck in activating+remapped state?
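Before and after trying it, I suppose we can keep an eye on that PG with something like this (1.2f is a made-up pgid, to be replaced with the real one):

    # list PGs currently in the activating state
    ceph pg ls activating
    # detailed peering information for one specific PG
    ceph pg 1.2f query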

Best regards,

Michel

On 27/03/2025 at 11:40, Frédéric Nass wrote:
echo "`ceph config get osd.0 mon_max_pg_per_osd`*`ceph config get osd.0 osd_max_pg_per_osd_hard_ratio`" | bc
