Hi,

I tried to restart all the mgrs (we have 3: 1 active, 2 standby) by running `ceph mgr fail` three times, with no effect. I don't really understand why I get these stray daemons after a `ceph orch osd rm --replace`, but I think I have always seen this. I tried muting rather than disabling the stray daemon check, but that doesn't help either. I also find it strange that every 10s there is a message about one (and only one) of the destroyed OSDs, reporting that it is down and already destroyed and saying it will zap it (I don't think I added --zap when I removed it, as the underlying disk is dead).
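Concretely, what I did amounts to roughly the following (a sketch from memory; CEPHADM_STRAY_DAEMON is my assumption of the health-check code that was being raised):

```shell
# Fail over the active mgr; repeated 3 times so each of the
# 3 mgrs (1 active, 2 standby) ends up restarted
ceph mgr fail

# Mute the stray-daemon health check instead of disabling it
ceph health mute CEPHADM_STRAY_DAEMON
```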

I'm completely stuck with this upgrade, and I don't remember having this kind of problem in previous upgrades with cephadm... Any idea where to look for the cause and/or how to fix it?

Best regards,

Michel

Le 24/04/2025 à 23:34, Michel Jouvin a écrit :
Hi,

I'm trying to upgrade a (cephadm) cluster from 18.2.2 to 18.2.6, using 'ceph orch upgrade'. When I enter the command 'ceph orch upgrade start --ceph-version 18.2.6', I receive a message saying that the upgrade has been initiated, with a similar message in the logs but nothing happens after this. 'ceph orch upgrade status' says:

-------

[root@ijc-mon1 ~]# ceph orch upgrade status
{
    "target_image": "quay.io/ceph/ceph:v18.2.6",
    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [],
    "progress": "",
    "message": "",
    "is_paused": false
}
-------

The first time I entered the command, the cluster status was HEALTH_WARN because of 2 stray daemons (caused by OSDs destroyed with `ceph orch osd rm --replace`). I set mgr/cephadm/warn_on_stray_daemons to false to ignore these 2 daemons; the cluster is now HEALTH_OK, but it doesn't help. Following a Red Hat KB entry, I tried to fail over the mgr, then stopped and restarted the upgrade, but without any improvement. I have not seen anything in the logs, except an INF entry every 10s about the destroyed OSD saying:

------

2025-04-24T21:30:54.161988+0000 mgr.ijc-mon1.yyfnhz (mgr.55376028) 14079 : cephadm [INF] osd.253 now down
2025-04-24T21:30:54.162601+0000 mgr.ijc-mon1.yyfnhz (mgr.55376028) 14080 : cephadm [INF] Daemon osd.253 on dig-osd4 was already removed
2025-04-24T21:30:54.164440+0000 mgr.ijc-mon1.yyfnhz (mgr.55376028) 14081 : cephadm [INF] Successfully destroyed old osd.253 on dig-osd4; ready for replacement
2025-04-24T21:30:54.164536+0000 mgr.ijc-mon1.yyfnhz (mgr.55376028) 14082 : cephadm [INF] Zapping devices for osd.253 on dig-osd4
-----
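For completeness, the steps I took before seeing these log entries were roughly (a sketch from memory, not exact command history):

```shell
# Hide the stray-daemon warning so the cluster reports HEALTH_OK
ceph config set mgr mgr/cephadm/warn_on_stray_daemons false

# Fail over to a standby mgr, as suggested by the Red Hat KB entry
ceph mgr fail

# Stop and restart the stuck upgrade
ceph orch upgrade stop
ceph orch upgrade start --ceph-version 18.2.6
```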

The message has concerned only one of the 2 destroyed OSDs since I restarted the mgr. Could this be the cause of the stuck upgrade? What can I do to fix it?

Thanks in advance for any hint. Best regards,

Michel

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io