Hi Holger,

In addition to Eugen's sound advice, I would try restarting the OSD in question.
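For reference, a restart of that one OSD daemon could be done through the orchestrator; the OSD id 406 and host acn07 are the ones from the thread below, and the systemd unit name is illustrative (the actual unit embeds your cluster FSID):

```shell
# Restart the single OSD daemon via the orchestrator:
ceph orch daemon restart osd.406

# If the orchestrator itself is unresponsive, the systemd unit can be
# restarted directly on the host (replace <fsid> with your cluster FSID):
# ssh acn07 systemctl restart ceph-<fsid>@osd.406.service
```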
If that doesn't help, I would stop the current deletion process ('ceph orch osd rm stop 406') and restart it ('ceph orch osd rm 406 --force').

As for the orchestrator logs, you can use the command 'ceph log last 1000 debug cephadm' and also check for errors in the Ceph log files on host 'acn07'.

Regards,
Frédéric

----- On 28 Jun 25, at 19:53, Eugen Block ebl...@nde.ag wrote:

> Can you show the overall cluster status (ceph -s)? If there's
> something else going on, it might block (some?) operations. And I'd
> scan the mgr logs, maybe in debug mode, to see why it fails to operate
> properly.
>
> Quoting Holger Naundorf <naund...@rz.uni-kiel.de>:
>
>> On 27.06.25 14:16, Eugen Block wrote:
>>>
>>> Quoting Holger Naundorf <naund...@rz.uni-kiel.de>:
>>>
>>>> Hello,
>>>> the title should of course be
>>>> "orchestrator behaving strangely".
>>>>
>>>> I gave a mgr restart another try (for the last OSD removal -
>>>> which also did not work - I had already restarted the mgr, without
>>>> effect).
>>>>
>>>> There is no (immediate, i.e. after ~10 min) effect now either - or
>>>> should I reissue the OSD rm command as well?
>>>
>>> Is there something in the queue (ceph orch osd rm status)?
>>> Sometimes the queue clears after a mgr restart, so it might be
>>> necessary to restart the rm command as well.
>>>
>> There is just the one 'waiting for purge' OSD in the queue:
>>
>> root@aadm01:~# ceph orch osd rm status
>> OSD  HOST   STATE                    PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
>> 406  acn07  done, waiting for purge  0    True     False  True  2025-06-25 09:18:07.650734+00:00
>>
>> One more datapoint:
>>
>> This is an OSD on a large, rotational disk. The orchestrator is still
>> working fine for a subset of OSDs on SSDs. We are just moving our SSD
>> pool around, and for this it was no problem using 'ceph orch osd rm
>> ...' - with the difference that there we used --zap and not
>> --replace (as we do not want to replace the disk, but to move the
>> OSD away from this host).
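Putting together the stop/restart and log commands suggested at the top of this reply (OSD 406 and host acn07 as in the thread; adjust to your cluster):

```shell
# Abort the stuck removal that is hanging in 'waiting for purge':
ceph orch osd rm stop 406

# Re-queue the removal, forcing it through:
ceph orch osd rm 406 --force

# Inspect recent cephadm/orchestrator entries in the cluster log:
ceph log last 1000 debug cephadm
```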
>>
>> Regards,
>> Holger
>>
>>>> Regards,
>>>> Holger
>>>>
>>>> On 27.06.25 12:26, Eugen Block wrote:
>>>>> Hi,
>>>>>
>>>>> have you retried it after restarting/failing the mgr?
>>>>>
>>>>> ceph mgr fail
>>>>>
>>>>> Quite often this (still) helps.
>>>>>
>>>>> Quoting Holger Naundorf <naund...@rz.uni-kiel.de>:
>>>>>
>>>>>> Hello,
>>>>>> we are running a Ceph cluster at version:
>>>>>>
>>>>>> ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)
>>>>>>
>>>>>> and for a few weeks now the orchestrator has been misbehaving - up
>>>>>> to now we could not identify any root cause, so I am fishing in
>>>>>> the community to see if there are any hints.
>>>>>>
>>>>>> Problems:
>>>>>>
>>>>>> An OSD removal (for disk replacement) gets stuck in the 'purge' step:
>>>>>>
>>>>>> ceph orch osd rm 406 --replace
>>>>>>
>>>>>> root@aadm01:~# ceph orch osd rm status
>>>>>> OSD  HOST   STATE                    PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
>>>>>> 406  acn07  done, waiting for purge  0    True     False  True  2025-06-25 09:18:07.650734+00:00
>>>>>>
>>>>>> (now for more than 24h in this state)
>>>>>>
>>>>>> At the same time the orchestrator is not restarting OSD daemons
>>>>>> - i.e. a 'ceph orch daemon restart osd.xxx' claims it is queueing
>>>>>> up the restart, but it never happens. Other services continue to
>>>>>> be controlled correctly via 'ceph orch ...'.
>>>>>>
>>>>>> If anyone has an idea where to poke around or can match this to
>>>>>> some known problem, I would appreciate any pointers.
>>>>>>
>>>>>> Regards,
>>>>>> Holger
>>>>>>
>>>>>> --
>>>>>> Dr. Holger Naundorf
>>>>>> Christian-Albrechts-Universität zu Kiel
>>>>>> Rechenzentrum / HPC / Server und Storage
>>>>>> Tel: +49 431 880-1990
>>>>>> Fax: +49 431 880-1523
>>>>>> naund...@rz.uni-kiel.de

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
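On Eugen's suggestion to scan the mgr logs in debug mode: a sketch of how cephadm log verbosity can be raised temporarily, based on the upstream cephadm troubleshooting documentation (the config key and flag names are from those docs, not from this thread):

```shell
# Fail over the active mgr; a standby takes over:
ceph mgr fail

# Raise cephadm logging to the cluster log and follow it live:
ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph -W cephadm --watch-debug

# Reset the log level when done, to avoid log noise:
ceph config set mgr mgr/cephadm/log_to_cluster_level info
```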