[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-22 Thread Eugen Block
Awesome, glad to hear it worked! Regarding your question whether you should upgrade further: it's not a simple "yes" or "no" question. Do you need features or bug fixes from Squid that are missing in Reef? Reef is still supported, but it was just announced yesterday that it will be EOL in August:

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-21 Thread Jeremy Hansen
Just to follow this through, 18.2.6 fixed my issues and I was able to complete the upgrade. Is it advisable to go to 19 or should I stay on reef? -jeremy > On Monday, Apr 14, 2025 at 12:14 AM, Jeremy Hansen (mailto:jer...@skidrow.la)> wrote: > Thanks. I’ll wait. I need this to go smoothly on an

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-14 Thread Eugen Block
Ah, this looks like the encryption issue which seems new in 18.2.5, brought up here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/UJ4DREAWNBBVVUJXYVZO25AYVQ5RLT42/ In that case it's questionable if you really want to upgrade to 18.2.5. Maybe 18.2.4 would be more suitable,

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-14 Thread Jeremy Hansen
Thanks. I’ll wait. I need this to go smoothly on another cluster that has to go through the same process. -jeremy > On Monday, Apr 14, 2025 at 12:10 AM, Eugen Block (mailto:ebl...@nde.ag)> wrote: > Ah, this looks like the encryption issue which seems new in 18.2.5, > brought up here: > > https:

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-14 Thread Eugen Block
Are you using Rook? Usually, I see this warning when a host is not reachable, for example during a reboot. But it also clears when the host comes back. Do you see this permanently or from time to time? It might have to do with the different Ceph versions, I'm not sure. But it shouldn't be a

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-14 Thread Jeremy Hansen
I haven’t attempted the remaining upgrade just yet. I wanted to check on this before proceeding. Things seem “stable” in the sense that I’m running VMs and all volumes and images are still functioning. I’m using whatever would have been the default from 16.2.14. It seems to be from time to time

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-13 Thread Jeremy Hansen
This looks relevant. https://github.com/rook/rook/issues/13600#issuecomment-1905860331 > On Sunday, Apr 13, 2025 at 10:08 AM, Jeremy Hansen (mailto:jer...@skidrow.la)> wrote: > I’m now seeing this: > > cluster: > id: 95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1 > health: HEALTH_WARN > Failed to apply 1

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-13 Thread Jeremy Hansen
I’m now seeing this: cluster: id: 95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1 health: HEALTH_WARN Failed to apply 1 service(s): osd.cost_capacity I’m assuming this is due to the fact that I’ve only upgraded mgr but I wanted to double check before proceeding with the rest of the components. Thanks -jer
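
A minimal sketch of how to inspect a failing service spec like osd.cost_capacity (the service name comes from the warning above; these are standard cephadm/orchestrator commands, not steps quoted from the thread): # Show the full health message for the failed service $ ceph health detail # Dump the OSD service specs to see what cephadm is trying to apply $ ceph orch ls osd --export # Check the cephadm log channel for the apply error $ ceph log last cephadm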

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-13 Thread Jeremy Hansen
Updating mgrs to 18.2.5 seemed to work just fine. I will go for the remaining services after the weekend. Thanks. -jeremy > On Thursday, Apr 10, 2025 at 6:37 AM, Eugen Block (mailto:ebl...@nde.ag)> wrote: > Glad I could help! I'm also waiting for 18.2.5 to upgrade our own > cluster from Pacifi

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-10 Thread Eugen Block
Glad I could help! I'm also waiting for 18.2.5 to upgrade our own cluster from Pacific after getting rid of our cache tier. :-D Quoting Jeremy Hansen: This seems to have worked to get the orch back up and put me back to 16.2.15. Thank you. Debating on waiting for 18.2.5 to move forward.

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-08 Thread Jeremy Hansen
This seems to have worked to get the orch back up and put me back to 16.2.15. Thank you. Debating on waiting for 18.2.5 to move forward. -jeremy > On Monday, Apr 07, 2025 at 1:26 AM, Eugen Block (mailto:ebl...@nde.ag)> wrote: > Still no, just edit the unit.run file for the MGRs to use a differe

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-07 Thread Jeremy Hansen
Got it. Thank you. I forgot that you mentioned the run files. I’ll hold off a bit to see if there’s more comments but I feel like I at least have things to try. Thanks again. -jeremy > On Monday, Apr 07, 2025 at 1:26 AM, Eugen Block (mailto:ebl...@nde.ag)> wrote: > Still no, just edit the unit

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-07 Thread Eugen Block
Still no, just edit the unit.run file for the MGRs to use a different image. See Frédéric's instructions (now that I'm re-reading them, there's a little mistake with dots and hyphens): # Backup the unit.run file $ cp /var/lib/ceph/$(ceph fsid)/mgr.ceph01.eydqvm/unit.run{,.bak} # Change contain
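
The quoted instructions are cut off above; a rough sketch of the rest of that procedure, assuming the example daemon name mgr.ceph01.eydqvm and a 16.2.15 target image (the exact sed pattern depends on how the image is referenced in unit.run):
# Backup the unit.run file
$ cp /var/lib/ceph/$(ceph fsid)/mgr.ceph01.eydqvm/unit.run{,.bak}
# Change the container image to the old release (the pattern below is an
# example; the file may reference the image by tag or by sha256 digest)
$ sed -i 's#quay.io/ceph/ceph:v17.2.0#quay.io/ceph/ceph:v16.2.15#' \
    /var/lib/ceph/$(ceph fsid)/mgr.ceph01.eydqvm/unit.run
# Restart the daemon so it comes back up on the old image
$ systemctl restart ceph-$(ceph fsid)@mgr.ceph01.eydqvm.service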

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-07 Thread Jeremy Hansen
Thank you. The only thing I’m unclear on is the rollback to pacific. Are you referring to > > > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-manager-daemon Thank you. I appreciate all the help. Should I wait for Adam to comment? At the moment, the cluster is fun

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-07 Thread Eugen Block
I haven't tried it this way yet, and I had hoped that Adam would chime in, but my approach would be to remove this key (it's not present when no upgrade is in progress): ceph config-key rm mgr/cephadm/upgrade_state Then roll back the two newer MGRs to Pacific as described before. If they co
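
Condensed into commands, the suggested sequence might look like this (the unit.run edit is the one sketched in the Apr 07 message above; treat this as a sketch, not a verified procedure):
# Remove the stuck upgrade state (the key only exists while an upgrade is in progress)
$ ceph config-key rm mgr/cephadm/upgrade_state
# Roll the two newer MGRs back to Pacific by pointing their unit.run files
# at the old image and restarting them (see the unit.run sketch above),
# then fail over so a Pacific mgr becomes active again
$ ceph mgr fail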

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-06 Thread Jeremy Hansen
Snipped some of the irrelevant logs to keep message size down. ceph config-key get mgr/cephadm/upgrade_state {"target_name": "quay.io/ceph/ceph:v17.2.0", "progress_id": "e7e1a809-558d-43a7-842a-c6229fdc57af", "target_id": "e1d6a67b021eb077ee22bf650f1a9fb1980a2cf5c36bdb9cba9eac6de8f702d9", "tar

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-05 Thread Eugen Block
The mgr logs should contain a stack trace; can you check again? Quoting Jeremy Hansen: No. Same issue unfortunately. On Saturday, Apr 05, 2025 at 1:27 PM, Eugen Block (mailto:ebl...@nde.ag)> wrote: I don't see the module error in the logs you provided. Has it cleared? A mgr failover is of

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-05 Thread Jeremy Hansen
[ceph: root@cn01 /]# ceph orch host ls Error ENOENT: Module not found -jeremy > On Saturday, Apr 05, 2025 at 1:27 PM, Eugen Block (mailto:ebl...@nde.ag)> wrote: > I don't see the module error in the logs you provided. Has it cleared? > A mgr failover is often helpful in such cases, basically it'

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-05 Thread Jeremy Hansen
No. Same issue unfortunately. > On Saturday, Apr 05, 2025 at 1:27 PM, Eugen Block (mailto:ebl...@nde.ag)> wrote: > I don't see the module error in the logs you provided. Has it cleared? > A mgr failover is often helpful in such cases, basically it's the > first thing I suggest since two or three

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-05 Thread Eugen Block
I don't see the module error in the logs you provided. Has it cleared? A mgr failover is often helpful in such cases; it's basically the first thing I've been suggesting for the past two or three years. Quoting Jeremy Hansen: Thank you so much for the detailed instructions. Here’s logs from the failover

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-05 Thread Jeremy Hansen
Thank you so much for the detailed instructions. Here’s logs from the failover to a new node. Apr 05 20:06:08 cn02.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn02-ceph-xyz-corp-ggixgj[2357414]: :::192.168.47.72 - - [05/Apr/2025:20:06:08] "GET /metrics HTTP/1.1" 200 - "" "P

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-05 Thread Eugen Block
Please don't drop the list from your response. The orchestrator might be down, but you should have an active mgr (ceph -s). On the host running the mgr, you can just run 'cephadm logs --name mgr.<name>', or 'journalctl -u ceph-<fsid>@mgr.<name>'. To get fresh logs, you can just run 'ceph mgr fail', look on w
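
Spelled out, with <fsid> and <name> as placeholders for your cluster ID and mgr daemon name, that boils down to something like:
# Find the active mgr
$ ceph -s | grep mgr
# On the host running that mgr, read its logs
$ cephadm logs --name mgr.<name>
$ journalctl -u ceph-<fsid>@mgr.<name>
# Force a failover to generate fresh logs on the newly active mgr
$ ceph mgr fail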

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-05 Thread Eugen Block
It might be a different issue; can you paste the entire stack trace from the mgr when it's failing? Also, you could go directly from Pacific to Reef; there's no need to upgrade to Quincy (which is EOL). And I would also recommend upgrading to the latest point release of a major version, e.g. 17.2.8, not .0, ot
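
For reference, a direct Pacific-to-Reef upgrade with cephadm is typically started like this; 18.2.6 (the release that eventually resolved the issue, per the top of this thread) is used here only as an example target:
# Verify the hosts can pull and run the target image
$ ceph orch upgrade check --ceph-version 18.2.6
# Start the upgrade straight to the Reef point release
$ ceph orch upgrade start --ceph-version 18.2.6
# Monitor progress
$ ceph orch upgrade status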