Sounds great that the upgrade went through!
Maybe it should be better documented that you should not zap a
device intended for permanent removal unless the
osd.all-available-devices service placement is set to unmanaged...
I'd rather vote for not considering osd.all-available-devices as
something for production. ;-) I only use that in small test clusters
to quickly set up OSDs without having to deal with specs. But in
production clusters, I like to have full control over OSD creation.
Just my opinion. :-)
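For reference, here is a minimal sketch of how to switch that service to unmanaged with the cephadm orchestrator CLI (double-check the flags against your Ceph release before running this on a production cluster):

```shell
# Tell cephadm to stop automatically consuming free (e.g. freshly
# zapped) devices for new OSDs:
ceph orch apply osd --all-available-devices --unmanaged=true

# Verify the service state; the osd.all-available-devices entry
# should now show as unmanaged:
ceph orch ls osd
```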
Quoting Michel Jouvin <michel.jou...@ijclab.in2p3.fr>:
Hi,
Thanks for all the feedback and suggestions. Summary of the summary:
after cancelling the removal of the OSD waiting to be zapped (its
disk was no longer available), the upgrade started immediately
and ran well. The cluster is now running 18.2.6! And as Eugen said
previously, I confirm that in 18.2.6, removed OSDs are no
longer considered stray daemons. I still have the feeling that Ceph
could give more useful information if:
- a cephadm message at INFO level (also visible with 'ceph orch
upgrade status') reported that the upgrade cannot proceed for the
described reason. This information could be given once, for example
a few minutes after entering the upgrade command if no daemon has
been upgraded yet.
- a message at INFO level reported that the zap operation
failed (suggesting to use DEBUG level for more information).
About Anthony's last question: yes, the 2 OSDs were destroyed, as shown by:
# ceph osd tree|grep destroyed
253 hdd 16.37108 osd.253 destroyed 0 1.00000
381 hdd 16.37108 osd.381 destroyed 0 1.00000
@Eugen, regarding what I said about osd.381's device being picked up
by Ceph to replace the failed osd.381: I think it was the conjunction
of the fact that the osd.all-available-devices service placement was
not set to unmanaged (something we normally do, but as we added a
few servers recently we changed it and forgot to set it back to
unmanaged) and the fact that, during the initial removal, I zapped
the device. Because of this, the device appeared to be free for
use... Maybe it should be better documented that you should not zap
a device intended for permanent removal unless the
osd.all-available-devices service placement is set to unmanaged...
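To make the sequence concrete, here is a hedged sketch of the safer removal flow implied above, using the cephadm orchestrator CLI (the OSD id 381 is just this thread's example; verify the flags against your release):

```shell
# 1. First make sure cephadm will not auto-redeploy OSDs onto
#    devices that become free:
ceph orch apply osd --all-available-devices --unmanaged=true

# 2. Remove the OSD. Use --replace if the disk will be swapped
#    (marks the OSD "destroyed" so its id can be reused); omit
#    --zap for a disk that is being permanently removed:
ceph orch osd rm 381 --replace

# 3. Track the draining/removal progress:
ceph orch osd rm status
```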
Thanks again. Best regards,
Michel
On 30/04/2025 at 15:41, Eugen Block wrote:
Hm, I thought there was an excerpt from the osd tree, but
apparently not? Could you then please confirm that the OSDs are in
fact marked as destroyed in the osd tree?
Quoting Anthony D'Atri <anthony.da...@gmail.com>:
I'm not entirely sure what the orchestrator will do except for
clearing the pending state, and since the OSDs are already marked
as destroyed in the crush tree,
Do we know that they are? The thread shows some log messages but,
unless I'm missing it, no evidence that they were marked. When I
ran into a similar issue recently, they were not marked destroyed
in the CRUSH tree.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io