Can you show the overall cluster status (ceph -s)? If there's something else going on, it might block (some?) operations. And I'd scan the mgr logs, maybe in debug mode to see why it fails to operate properly.
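For cephadm, the log level can be raised at runtime and followed live, e.g. (a sketch, assuming the stock cephadm module):

ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph -W cephadm --watch-debug

(and set it back to 'info' once you have what you need).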

Quote from Holger Naundorf <naund...@rz.uni-kiel.de>:

On 27.06.25 14:16, Eugen Block wrote:


Quote from Holger Naundorf <naund...@rz.uni-kiel.de>:

Hello,
the subject should of course be
 "orchestrator behaving strangely"

I gave a mgr restart another try (for the last OSD removal, which also did not work, I had already restarted the mgr without effect).

There is no (immediate, i.e. after ~10 min) effect this time either - or should I reissue the OSD rm command as well?

Is there something in the queue (ceph orch osd rm status)? Sometimes the queue clears after a mgr restart, so it might be necessary to reissue the rm command as well.
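If the entry is gone after the failover, the same command from the original post would need to be reissued to queue it again:

ceph orch osd rm 406 --replace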

There is just the one 'waiting for purge' osd in the queue:

root@aadm01:~# ceph orch osd rm status
OSD  HOST   STATE                    PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
406  acn07  done, waiting for purge  0    True     False  True  2025-06-25 09:18:07.650734+00:00

One more datapoint:

This is an OSD on a large, rotational disk. The orchestrator is still working OK for a subset of OSDs on SSDs. We are just moving our SSD pool around, and for that it was no problem to use 'ceph orch osd rm ...' - with the difference that there we used --zap and not --replace (as we do not want to replace the disk, but to move the OSD away from this host).
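For comparison, the two variants (just a sketch; the second OSD id is a placeholder):

# --replace: the OSD is marked 'destroyed', its id stays reserved for the new disk
ceph orch osd rm 406 --replace

# --zap: the OSD is removed completely and the device is wiped for reuse
ceph orch osd rm <id> --zap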

Regards,
Holger




Regards,
Holger


On 27.06.25 12:26, Eugen Block wrote:
Hi,

have you retried it after restarting/failing the mgr?

ceph mgr fail

Quite often this (still) helps.

Quote from Holger Naundorf <naund...@rz.uni-kiel.de>:

Hello,
we are running a ceph cluster at version:

ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)

and for a few weeks now the orchestrator has been misbehaving - so far we have not been able to identify a root cause, so I am fishing in the community to see if there are any hints.

Problems:

An OSD removal (for disk replacement) gets stuck in the 'purge' step:

ceph orch osd rm 406 --replace

root@aadm01:~# ceph orch osd rm status
OSD  HOST   STATE                    PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
406  acn07  done, waiting for purge  0    True     False  True  2025-06-25 09:18:07.650734+00:00

(it has now been in this state for more than 24h)
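For reference, whether osd.406 is still in the OSD map (and already marked 'destroyed', as --replace should leave it) can be checked with something like:

ceph osd dump | grep '^osd.406 '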

At the same time the orchestrator is not restarting OSD daemons, i.e. a 'ceph orch daemon restart osd.xxx' claims it is queuing up the restart, but it never happens. Other services continue to be controlled correctly via 'ceph orch ...'.
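Whether a queued restart ever ran can be seen from the daemon uptimes, e.g. (a sketch, using the stock orchestrator CLI):

ceph orch ps --daemon_type osd --refresh

The STATUS column shows how long each OSD daemon has been up.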

If anyone has an idea where to poke around or can match this to some known problem - I would appreciate any pointers.


Regards,
Holger

--
Dr. Holger Naundorf
Christian-Albrechts-Universität zu Kiel
Rechenzentrum / HPC / Server und Storage
Tel: +49 431 880-1990
Fax:  +49 431 880-1523
naund...@rz.uni-kiel.de



--
Dr. Holger Naundorf
Christian-Albrechts-Universität zu Kiel
Rechenzentrum / HPC / Server und Storage
Tel: +49 431 880-1990
Fax:  +49 431 880-1523
naund...@rz.uni-kiel.de



--
Dr. Holger Naundorf
Christian-Albrechts-Universität zu Kiel
Rechenzentrum / HPC / Server und Storage
Tel: +49 431 880-1990
Fax:  +49 431 880-1523
naund...@rz.uni-kiel.de


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
