Hi,
I did a quick search in the tracker but couldn't find anything related. A
customer reported this, and I can confirm the behaviour on a lab
cluster. I usually perform staggered upgrades with --daemon-types and
--limit, but not for all daemon types, so I hadn't stumbled across
this myself; our customer did (I can reproduce it with --daemon-types
as well). They upgraded from the latest Reef to the latest Squid and
reported that despite providing the --limit parameter, all mons were
upgraded. So I tried to reproduce it, and the behaviour is not really
clear to me; I'll try to describe it step by step.
# Start with MGRs
reef1:~ # ceph orch upgrade start --image quay.io/ceph/ceph:v19.2.3 --services mgr --limit 1
Upgrading the MGRs with --limit works, but it isn't reflected in the
MGR log. Usually I would expect a line like this:
...[cephadm INFO root] Hit upgrade limit of 1. Stopping upgrade
But there is no such line in the logs. Then I upgrade the rest of the MGRs.
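For reference, this is roughly how I check which MGRs actually got the new version. On a live cluster I'd run "ceph orch ps --daemon-type mgr"; the excerpt below is a hypothetical, trimmed sample of that output (name, host, status, version only), and the awk one-liner just counts daemons per version:

```shell
# On a live cluster: ceph orch ps --daemon-type mgr
# Hypothetical, trimmed excerpt of that output (name, host, status, version):
cat > /tmp/orch_ps_mgr_sample.txt <<'EOF'
mgr.reef1.aaaaaa  reef1  running  19.2.3
mgr.reef2.bbbbbb  reef2  running  18.2.7
mgr.reef3.cccccc  reef3  running  18.2.7
EOF
# Count daemons per version (4th column)
awk '{count[$4]++} END {for (v in count) print v, count[v]}' /tmp/orch_ps_mgr_sample.txt | sort
```

With --limit 1 honoured, exactly one mgr should show the new version, as in this sample.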
# Continue with MONs
reef1:~ # ceph orch upgrade start --image quay.io/ceph/ceph:v19.2.3 --services mon --limit 1
This gets even weirder: the orchestrator upgrades 2 out of 3 MONs, and
again there is no such "Hit upgrade limit" line in the log. What I
noticed was a MGR respawn after the first MON had been upgraded
successfully. Maybe some state of the upgrade progress gets lost
during the respawn?
I then upgraded the remaining MON, and after that ceph-crash was
upgraded successfully.
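To double-check that the message really never appears, I grep the cephadm log channel for it (on a live cluster via "ceph log last 100 info cephadm" or the active MGR's log file). The log excerpt below is a hypothetical sample just to show the grep:

```shell
# Hypothetical excerpt of the cephadm log channel during the mon step;
# on a live cluster: ceph log last 100 info cephadm
cat > /tmp/cephadm_mon_sample.log <<'EOF'
[cephadm INFO cephadm.upgrade] Upgrade: Updating mon.reef2 (1/1)
[cephadm INFO cephadm.upgrade] Upgrade: Updating mon.reef3 (2/1)
EOF
# grep -c exits non-zero when there are no matches, hence the || true
grep -c 'Hit upgrade limit' /tmp/cephadm_mon_sample.log || true
```

A count of 0 here means the limit message was never logged for the mon step, which matches what I saw.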
# Upgrade OSD
reef1:~ # ceph orch upgrade start --image quay.io/ceph/ceph:v19.2.3 --services osd.osd.standalone --limit 1
And this is the first service that actually reports the upgrade limit:
2026-03-05T15:47:17.108+0000 7f5bfdc73640 0 [cephadm INFO
cephadm.upgrade] Upgrade: Updating osd.0 (1/1)
2026-03-05T15:47:17.108+0000 7f5bfdc73640 0 log_channel(cephadm) log
[INF] : Upgrade: Updating osd.0 (1/1)
2026-03-05T15:47:32.518+0000 7f5bfdc73640 0 [cephadm INFO root] Hit
upgrade limit of 1. Stopping upgrade
2026-03-05T15:47:32.518+0000 7f5bfdc73640 0 log_channel(cephadm) log
[INF] : Hit upgrade limit of 1. Stopping upgrade
2026-03-05T15:47:47.395+0000 7f5bfdc73640 0 [cephadm INFO
cephadm.upgrade] Upgrade: Setting container_image for all nvmeof
2026-03-05T15:47:47.395+0000 7f5bfdc73640 0 log_channel(cephadm) log
[INF] : Upgrade: Setting container_image for all nvmeof
2026-03-05T15:47:47.492+0000 7f5bfdc73640 0 [cephadm INFO
cephadm.upgrade] Upgrade: Finalizing container_image settings
2026-03-05T15:47:47.493+0000 7f5bfdc73640 0 log_channel(cephadm) log
[INF] : Upgrade: Finalizing container_image settings
2026-03-05T15:47:47.667+0000 7f5bfdc73640 0 [cephadm INFO
cephadm.upgrade] Upgrade: Complete!
2026-03-05T15:47:47.667+0000 7f5bfdc73640 0 log_channel(cephadm) log
[INF] : Upgrade: Complete!
This is really irritating and inconsistent: if the orchestrator does
honor --limit for services other than OSDs, why isn't that visible in
the logs? And what about the MONs? Why 2 out of 3?
Any pointers appreciated! I'm not sure which Ceph versions might be
affected by this; I'll try out a couple more upgrade paths.
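When trying other upgrade paths I'll compare daemon counts per release with "ceph versions", which prints one count per release string. The JSON below is a hypothetical, trimmed excerpt (hashes elided) matching what I saw on the lab cluster (2 of 3 mons upgraded):

```shell
# On a live cluster: ceph versions
# Hypothetical, trimmed excerpt (hashes elided), matching 2 of 3 mons upgraded:
cat > /tmp/ceph_versions_sample.json <<'EOF'
{
    "mon": {
        "ceph version 18.2.7 (...) reef (stable)": 1,
        "ceph version 19.2.3 (...) squid (stable)": 2
    }
}
EOF
# Number of distinct releases among the mons; with --limit 1 honoured
# I'd still expect two entries here, but with counts 2 and 1.
grep -c 'ceph version' /tmp/ceph_versions_sample.json
```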
Thanks,
Eugen
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]