Thanks. I’ll wait. I need this to go smoothly on another cluster that has to go through the same process.
-jeremy

> On Monday, Apr 14, 2025 at 12:10 AM, Eugen Block <ebl...@nde.ag> wrote:
>
> Ah, this looks like the encryption issue that seems to be new in 18.2.5,
> brought up here:
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/UJ4DREAWNBBVVUJXYVZO25AYVQ5RLT42/
>
> In that case it's questionable whether you really want to upgrade to
> 18.2.5. Maybe 18.2.4 would be more suitable, although it's missing bug
> fixes from .5 (like the RGW memory leak). If you really need to upgrade,
> I guess I would go with .4; otherwise, stay on Pacific until this issue
> has been addressed. It's not an easy decision. ;-)
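>
> For context, the traceback quoted below boils down to ceph-volume failing
> to parse the `cryptsetup --version` output before it runs
> `lvm batch --dmcrypt`. A minimal sketch of the failure mode, not the
> actual ceph-volume code:
>
> # Newer cryptsetup builds append a "flags:" suffix to the version line:
> $ cryptsetup --version
> cryptsetup 2.7.2 flags: UDEV BLKID KEYRING FIPS KERNEL_CAPI PWQUALITY
>
> # A check that expects the output to be exactly "cryptsetup X.Y.Z" trips
> # over the extra tokens, while the version itself is still parseable:
> $ cryptsetup --version | awk '{print $2}'
> 2.7.2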
>
> Zitat von Jeremy Hansen <jer...@skidrow.la>:
>
> > I haven't attempted the remaining upgrade just yet. I wanted to check
> > on this before proceeding. Things seem "stable" in the sense that I'm
> > running VMs, and all volumes and images are still functioning. I'm
> > using whatever would have been the default from 16.2.14. The warning
> > seems to come and go: I receive nagios alerts, which eventually clear
> > and then reappear.
> >
> > HEALTH_WARN Failed to apply 1 service(s): osd.cost_capacity
> > [WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): osd.cost_capacity
> > osd.cost_capacity: cephadm exited with an error code: 1,
> > stderr: Inferring config /var/lib/ceph/95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1/mon.cn02/config
> > Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host
> > --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
> > --privileged --group-add=disk --init
> > -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:47de8754d1f72fadb61523247c897fdf673f9a9689503c64ca8384472d232c5c
> > -e NODE_NAME=cn02.ceph.xyz.corp
> > -e CEPH_VOLUME_OSDSPEC_AFFINITY=cost_capacity
> > -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1
> > -v /var/run/ceph/95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1:/var/run/ceph:z
> > -v /var/log/ceph/95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1:/var/log/ceph:z
> > -v /var/lib/ceph/95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1/crash:/var/lib/ceph/crash:z
> > -v /run/systemd/journal:/run/systemd/journal -v /dev:/dev
> > -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm
> > -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /etc/hosts:/etc/hosts:ro
> > -v /tmp/ceph-tmp49jj8zoh:/etc/ceph/ceph.conf:z
> > -v /tmp/ceph-tmp_9k8v5uj:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
> > quay.io/ceph/ceph@sha256:47de8754d1f72fadb61523247c897fdf673f9a9689503c64ca8384472d232c5c
> > lvm batch --no-auto /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
> > --dmcrypt --yes --no-systemd
> >
> > /usr/bin/podman: stderr Traceback (most recent call last):
> > /usr/bin/podman: stderr File "/usr/sbin/ceph-volume", line 33, in <module>
> > /usr/bin/podman: stderr sys.exit(load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')())
> > /usr/bin/podman: stderr File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 54, in __init__
> > /usr/bin/podman: stderr self.main(self.argv)
> > /usr/bin/podman: stderr File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 59, in newfunc
> > /usr/bin/podman: stderr return f(*a, **kw)
> > /usr/bin/podman: stderr File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 166, in main
> > /usr/bin/podman: stderr terminal.dispatch(self.mapper, subcommand_args)
> > /usr/bin/podman: stderr File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
> > /usr/bin/podman: stderr instance.main()
> > /usr/bin/podman: stderr File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
> > /usr/bin/podman: stderr terminal.dispatch(self.mapper, self.argv)
> > /usr/bin/podman: stderr File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 192, in dispatch
> > /usr/bin/podman: stderr instance = mapper.get(arg)(argv[count:])
> > /usr/bin/podman: stderr File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/batch.py", line 325, in __init__
> > /usr/bin/podman: stderr self.args = parser.parse_args(argv)
> > /usr/bin/podman: stderr File "/usr/lib64/python3.9/argparse.py", line 1825, in parse_args
> > /usr/bin/podman: stderr args, argv = self.parse_known_args(args, namespace)
> > /usr/bin/podman: stderr File "/usr/lib64/python3.9/argparse.py", line 1858, in parse_known_args
> > /usr/bin/podman: stderr namespace, args = self._parse_known_args(args, namespace)
> > /usr/bin/podman: stderr File "/usr/lib64/python3.9/argparse.py", line 2067, in _parse_known_args
> > /usr/bin/podman: stderr start_index = consume_optional(start_index)
> > /usr/bin/podman: stderr File "/usr/lib64/python3.9/argparse.py", line 2007, in consume_optional
> > /usr/bin/podman: stderr take_action(action, args, option_string)
> > /usr/bin/podman: stderr File "/usr/lib64/python3.9/argparse.py", line 1935, in take_action
> > /usr/bin/podman: stderr action(self, namespace, argument_values, option_string)
> > /usr/bin/podman: stderr File "/usr/lib/python3.9/site-packages/ceph_volume/util/arg_validators.py", line 17, in __call__
> > /usr/bin/podman: stderr set_dmcrypt_no_workqueue()
> > /usr/bin/podman: stderr File "/usr/lib/python3.9/site-packages/ceph_volume/util/encryption.py", line 54, in set_dmcrypt_no_workqueue
> > /usr/bin/podman: stderr raise RuntimeError('Error while checking cryptsetup version.\n',
> > /usr/bin/podman: stderr RuntimeError: ('Error while checking cryptsetup version.\n', '`cryptsetup --version` output:\n', 'cryptsetup 2.7.2 flags: UDEV BLKID KEYRING FIPS KERNEL_CAPI PWQUALITY ')
> >
> > Traceback (most recent call last):
> > File "/usr/lib64/python3.9/runpy.py", line 197, in _run_module_as_main
> > return _run_code(code, main_globals, None,
> > File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
> > exec(code, run_globals)
> > File "/tmp/tmpedb1_faj.cephadm.build/__main__.py", line 11009, in <module>
> > File "/tmp/tmpedb1_faj.cephadm.build/__main__.py", line 10997, in main
> > File "/tmp/tmpedb1_faj.cephadm.build/__main__.py", line 2593, in _infer_config
> > File "/tmp/tmpedb1_faj.cephadm.build/__main__.py", line 2509, in _infer_fsid
> > File "/tmp/tmpedb1_faj.cephadm.build/__main__.py", line 2621, in _infer_image
> > File "/tmp/tmpedb1_faj.cephadm.build/__main__.py", line 2496, in _validate_fsid
> > File "/tmp/tmpedb1_faj.cephadm.build/__main__.py", line 7226, in command_ceph_volume
> > File "/tmp/tmpedb1_faj.cephadm.build/__main__.py", line 2284, in call_throws
> > RuntimeError: Failed command: /usr/bin/podman run ... [the same podman
> > run command as above, ending in: lvm batch --no-auto /dev/sdb /dev/sdc
> > /dev/sdd /dev/sde /dev/sdf --dmcrypt --yes --no-systemd]
> >
> > —
> >
> > ceph orch ls osd --export
> >
> > service_type: osd
> > service_id: all-available-devices
> > service_name: osd.all-available-devices
> > placement:
> >   host_pattern: '*'
> > spec:
> >   data_devices:
> >     all: true
> >   filter_logic: AND
> >   objectstore: bluestore
> > ---
> > service_type: osd
> > service_id: cost_capacity
> > service_name: osd.cost_capacity
> > placement:
> >   host_pattern: '*'
> > spec:
> >   data_devices:
> >     rotational: 1
> >   encrypted: true
> >   filter_logic: AND
> >   objectstore: bluestore
> >
> > Thank you
> > -jeremy
> >
> > > On Sunday, Apr 13, 2025 at 11:48 PM, Eugen Block <ebl...@nde.ag> wrote:
> > >
> > > Are you using Rook? Usually, I see this warning when a host is not
> > > reachable, for example during a reboot, but it also clears when the
> > > host comes back. Do you see this permanently, or just from time to
> > > time? It might have to do with the different Ceph versions, I'm not
> > > sure, but it shouldn't be a showstopper for the remaining upgrade.
> > > Or are you trying to deploy OSDs and the deployment fails? You can
> > > paste the output of:
> > >
> > > ceph health detail
> > > ceph orch ls osd --export
> > >
> > > You can also scan the cephadm.log for any hints.
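> > >
> > > For example, something like this on the affected host (a sketch; the
> > > log path is the cephadm default and may differ on your setup):
> > >
> > > # cephadm writes a per-host log; filter it for the failing service
> > > grep -iE 'cost_capacity|ceph-volume|Non-zero exit' /var/log/ceph/cephadm.log | tail -n 50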
> > >
> > > Zitat von Jeremy Hansen <jer...@skidrow.la>:
> > >
> > > > This looks relevant:
> > > >
> > > > https://github.com/rook/rook/issues/13600#issuecomment-1905860331
> > > >
> > > > > On Sunday, Apr 13, 2025 at 10:08 AM, Jeremy Hansen <jer...@skidrow.la> wrote:
> > > > >
> > > > > I'm now seeing this:
> > > > >
> > > > > cluster:
> > > > >   id: 95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1
> > > > >   health: HEALTH_WARN
> > > > >     Failed to apply 1 service(s): osd.cost_capacity
> > > > >
> > > > > I'm assuming this is because I've only upgraded the MGRs, but I
> > > > > wanted to double-check before proceeding with the rest of the
> > > > > components.
> > > > >
> > > > > Thanks
> > > > > -jeremy
> > > > >
> > > > > > On Sunday, Apr 13, 2025 at 12:59 AM, Jeremy Hansen <jer...@skidrow.la> wrote:
> > > > > >
> > > > > > Updating the MGRs to 18.2.5 seemed to work just fine. I will go
> > > > > > for the remaining services after the weekend. Thanks.
> > > > > >
> > > > > > -jeremy
> > > > > >
> > > > > > > On Thursday, Apr 10, 2025 at 6:37 AM, Eugen Block <ebl...@nde.ag> wrote:
> > > > > > >
> > > > > > > Glad I could help! I'm also waiting for 18.2.5 to upgrade our
> > > > > > > own cluster from Pacific after getting rid of our cache tier. :-D
> > > > > > >
> > > > > > > Zitat von Jeremy Hansen <jer...@skidrow.la>:
> > > > > > >
> > > > > > > > This seems to have worked to get the orch back up and put me
> > > > > > > > back on 16.2.15. Thank you. Debating on waiting for 18.2.5 to
> > > > > > > > move forward.
> > > > > > > >
> > > > > > > > -jeremy
> > > > > > > >
> > > > > > > > > On Monday, Apr 07, 2025 at 1:26 AM, Eugen Block <ebl...@nde.ag> wrote:
> > > > > > > > >
> > > > > > > > > Still no, just edit the unit.run file for the MGRs to use a
> > > > > > > > > different image. See Frédéric's instructions (now that I'm
> > > > > > > > > re-reading them, there's a little mistake with dots and hyphens):
> > > > > > > > >
> > > > > > > > > # Back up the unit.run file
> > > > > > > > > $ cp /var/lib/ceph/$(ceph fsid)/mgr.ceph01.eydqvm/unit.run{,.bak}
> > > > > > > > >
> > > > > > > > > # Change the container image's signature. You can get the
> > > > > > > > > # signature of the version you want to reach from
> > > > > > > > > # https://quay.io/repository/ceph/ceph?tab=tags (it's in the
> > > > > > > > > # URL of a version).
> > > > > > > > > $ sed -i 's/ceph@sha256:e40c19cd70e047d14d70f5ec3cf501da081395a670cd59ca881ff56119660c8f/ceph@sha256:d26c11e20773704382946e34f0d3d2c0b8bb0b7b37d9017faa9dc11a0196c7d9/g' /var/lib/ceph/$(ceph fsid)/mgr.ceph01.eydqvm/unit.run
> > > > > > > > >
> > > > > > > > > # Restart the container (systemctl daemon-reload is not needed)
> > > > > > > > > $ systemctl restart ceph-$(ceph fsid)@mgr.ceph01.eydqvm.service
> > > > > > > > >
> > > > > > > > > # Run this command a few times and it should show the new version
> > > > > > > > > $ ceph orch ps --refresh --hostname ceph01 | grep mgr
> > > > > > > > >
> > > > > > > > > To get the image signature, you can also look into the other
> > > > > > > > > unit.run files; a version tag would also work.
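> > > > > > > > >
> > > > > > > > > For example, to check which digest a unit.run currently
> > > > > > > > > references (a sketch, assuming the same mgr daemon name as above):
> > > > > > > > >
> > > > > > > > > $ grep -o 'ceph@sha256:[a-f0-9]*' /var/lib/ceph/$(ceph fsid)/mgr.ceph01.eydqvm/unit.run | sort -u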
> > > > > > > > >
> > > > > > > > > It depends on how often you need the orchestrator to
> > > > > > > > > maintain the cluster. If you have the time, you could wait a
> > > > > > > > > bit longer for other responses. If you need the orchestrator
> > > > > > > > > in the meantime, you can roll back the MGRs.
> > > > > > > > >
> > > > > > > > > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/32APKOXKRAIZ7IDCNI25KVYFCCCF6RJG/
> > > > > > > > >
> > > > > > > > > Zitat von Jeremy Hansen <jer...@skidrow.la>:
> > > > > > > > >
> > > > > > > > > > Thank you. The only thing I'm unclear on is the rollback to Pacific.
> > > > > > > > > >
> > > > > > > > > > Are you referring to
> > > > > > > > > > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-manager-daemon ?
> > > > > > > > > >
> > > > > > > > > > Thank you. I appreciate all the help. Should I wait for
> > > > > > > > > > Adam to comment? At the moment, the cluster is functioning
> > > > > > > > > > well enough to keep the VMs running, so if it's wise to
> > > > > > > > > > wait, I can do that.
> > > > > > > > > >
> > > > > > > > > > -jeremy
> > > > > > > > > >
> > > > > > > > > > > On Monday, Apr 07, 2025 at 12:23 AM, Eugen Block <ebl...@nde.ag> wrote:
> > > > > > > > > > >
> > > > > > > > > > > I haven't tried it this way yet, and I had hoped that
> > > > > > > > > > > Adam would chime in, but my approach would be to remove
> > > > > > > > > > > this key (it's not present when no upgrade is in progress):
> > > > > > > > > > >
> > > > > > > > > > > ceph config-key rm mgr/cephadm/upgrade_state
> > > > > > > > > > >
> > > > > > > > > > > Then roll back the two newer MGRs to Pacific as described
> > > > > > > > > > > before. If they come up healthy, test whether the
> > > > > > > > > > > orchestrator works properly first: for example, remove a
> > > > > > > > > > > node-exporter, a crash daemon or anything else uncritical
> > > > > > > > > > > and let it redeploy.
> > > > > > > > > > > If that works, try a staggered upgrade, starting with the MGRs only:
> > > > > > > > > > >
> > > > > > > > > > > ceph orch upgrade start --image <image-name> --daemon-types mgr
> > > > > > > > > > >
> > > > > > > > > > > Since there's no need to go to Quincy, I suggest upgrading
> > > > > > > > > > > to Reef 18.2.4 (or waiting until 18.2.5 is released, which
> > > > > > > > > > > should be very soon), so set the respective <image-name>
> > > > > > > > > > > in the above command.
> > > > > > > > > > >
> > > > > > > > > > > If all three MGRs upgrade successfully, you can continue
> > > > > > > > > > > with the MONs, or with the entire rest.
> > > > > > > > > > >
> > > > > > > > > > > In production clusters, I usually do staggered upgrades;
> > > > > > > > > > > e.g., I limit the number of OSD daemons first, just to see
> > > > > > > > > > > if they come up healthy, then I let it upgrade all other
> > > > > > > > > > > OSDs automatically (roughly as sketched after the link below).
> > > > > > > > > > >
> > > > > > > > > > > https://docs.ceph.com/en/latest/cephadm/upgrade/#staggered-upgrade
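> > > > > > > > > > >
> > > > > > > > > > > Something like this (a sketch; adjust the image and the
> > > > > > > > > > > limit to your needs, and keep a copy of the key before
> > > > > > > > > > > removing it):
> > > > > > > > > > >
> > > > > > > > > > > # save the stored upgrade state somewhere before clearing it
> > > > > > > > > > > ceph config-key get mgr/cephadm/upgrade_state > /root/upgrade_state.json
> > > > > > > > > > > ceph config-key rm mgr/cephadm/upgrade_state
> > > > > > > > > > >
> > > > > > > > > > > # later, staggered upgrade: MGRs first, then a few OSDs at a time
> > > > > > > > > > > # (each staggered run finishes before you start the next batch)
> > > > > > > > > > > ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4 --daemon-types mgr
> > > > > > > > > > > ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4 --daemon-types osd --limit 3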
> > > > > > > > > > >
> > > > > > > > > > > Zitat von Jeremy Hansen <jer...@skidrow.la>:
> > > > > > > > > > >
> > > > > > > > > > > > Snipped some of the irrelevant logs to keep the message size down.
> > > > > > > > > > > >
> > > > > > > > > > > > ceph config-key get mgr/cephadm/upgrade_state
> > > > > > > > > > > >
> > > > > > > > > > > > {"target_name": "quay.io/ceph/ceph:v17.2.0", "progress_id":
> > > > > > > > > > > > "e7e1a809-558d-43a7-842a-c6229fdc57af", "target_id":
> > > > > > > > > > > > "e1d6a67b021eb077ee22bf650f1a9fb1980a2cf5c36bdb9cba9eac6de8f702d9",
> > > > > > > > > > > > "target_digests":
> > > > > > > > > > > > ["quay.io/ceph/ceph@sha256:12a0a4f43413fd97a14a3d47a3451b2d2df50020835bb93db666209f3f77617a",
> > > > > > > > > > > > "quay.io/ceph/ceph@sha256:cb4d698cb769b6aba05bf6ef04f41a7fe694160140347576e13bd9348514b667"],
> > > > > > > > > > > > "target_version": "17.2.0", "fs_original_max_mds": null,
> > > > > > > > > > > > "fs_original_allow_standby_replay": null, "error": null,
> > > > > > > > > > > > "paused": false, "daemon_types": null, "hosts": null,
> > > > > > > > > > > > "services": null, "total_count": null, "remaining_count": null}
> > > > > > > > > > > >
> > > > > > > > > > > > What should I do next?
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you!
> > > > > > > > > > > > -jeremy
> > > > > > > > > > > >
> > > > > > > > > > > > > On Sunday, Apr 06, 2025 at 1:38 AM, Eugen Block <ebl...@nde.ag> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can you check if you have this config-key?
> > > > > > > > > > > > >
> > > > > > > > > > > > > ceph config-key get mgr/cephadm/upgrade_state
> > > > > > > > > > > > >
> > > > > > > > > > > > > If you reset the MGRs, it might be necessary to clear
> > > > > > > > > > > > > this key, otherwise you might end up with some
> > > > > > > > > > > > > inconsistency. Just to be sure.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Zitat von Jeremy Hansen <jer...@skidrow.la>:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks. I'm trying to be extra careful since this
> > > > > > > > > > > > > > cluster is actually in use. I'll wait for your feedback.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -jeremy
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Saturday, Apr 05, 2025 at 3:39 PM, Eugen Block <ebl...@nde.ag> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > No, that's not necessary, just edit the unit.run
> > > > > > > > > > > > > > > file for the MGRs to use a different image. See
> > > > > > > > > > > > > > > Frédéric's instructions:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/32APKOXKRAIZ7IDCNI25KVYFCCCF6RJG/
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > But I'm not entirely sure whether you need to clear
> > > > > > > > > > > > > > > some config-keys first in order to reset the
> > > > > > > > > > > > > > > upgrade state. If I have time, I'll try to check
> > > > > > > > > > > > > > > tomorrow, or on Monday.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Zitat von Jeremy Hansen <jer...@skidrow.la>:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Would I follow this process to downgrade?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-manager-daemon
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Saturday, Apr 05, 2025 at 2:04 PM, Jeremy Hansen <jer...@skidrow.la> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ceph -s claims things are healthy:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ceph -s
> > > > > > > > > > > > > > > > > cluster:
> > > > > > > > > > > > > > > > >   id: 95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1
> > > > > > > > > > > > > > > > >   health: HEALTH_OK
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > services:
> > > > > > > > > > > > > > > > >   mon: 3 daemons, quorum cn01,cn03,cn02 (age 20h)
> > > > > > > > > > > > > > > > >   mgr: cn03.negzvb(active, since 26m), standbys: cn01.tjmtph, cn02.ceph.xyz.corp.ggixgj
> > > > > > > > > > > > > > > > >   mds: 1/1 daemons up, 2 standby
> > > > > > > > > > > > > > > > >   osd: 15 osds: 15 up (since 19h), 15 in (since 14M)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > data:
> > > > > > > > > > > > > > > > >   volumes: 1/1 healthy
> > > > > > > > > > > > > > > > >   pools: 6 pools, 610 pgs
> > > > > > > > > > > > > > > > >   objects: 284.59k objects, 1.1 TiB
> > > > > > > > > > > > > > > > >   usage: 3.3 TiB used, 106 TiB / 109 TiB avail
> > > > > > > > > > > > > > > > >   pgs: 610 active+clean
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > io:
> > > > > > > > > > > > > > > > >   client: 255 B/s rd, 1.2 MiB/s wr, 10 op/s rd, 16 op/s wr
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > —
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > How do I downgrade if the orch is down?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thank you
> > > > > > > > > > > > > > > > > -jeremy
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Saturday, Apr 05, 2025 at 1:56 PM, Eugen Block <ebl...@nde.ag> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > It would help if you only pasted the relevant parts.
> > > > > > > > > > > > > > > > > > Anyway, these two sections stand out:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > ---snip---
> > > > > > > > > > > > > > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: debug 2025-04-05T20:33:48.909+0000 7f26f0200700 0 [balancer INFO root] Some PGs (1.000000) are unknown; try again later
> > > > > > > > > > > > > > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: debug 2025-04-05T20:33:48.917+0000 7f2663400700 -1 mgr load Failed to construct class in 'cephadm'
> > > > > > > > > > > > > > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: debug 2025-04-05T20:33:48.917+0000 7f2663400700 -1 mgr load Traceback (most recent call last):
> > > > > > > > > > > > > > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: File "/usr/share/ceph/mgr/cephadm/module.py", line 470, in __init__
> > > > > > > > > > > > > > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: self.upgrade = CephadmUpgrade(self)
> > > > > > > > > > > > > > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 112, in __init__
> > > > > > > > > > > > > > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: self.upgrade_state: Optional[UpgradeState] = UpgradeState.from_json(json.loads(t))
> > > > > > > > > > > > > > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 93, in from_json
> > > > > > > > > > > > > > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: return cls(**c)
> > > > > > > > > > > > > > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: TypeError: __init__() got an unexpected keyword argument 'daemon_types'
> > > > > > > > > > > > > > > > > > Apr 05 20:33:48 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: debug 2025-04-05T20:33:48.918+0000 7f2663400700 -1 mgr operator() Failed to run module in active mode ('cephadm')
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Apr 05 20:33:49 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: debug 2025-04-05T20:33:49.273+0000 7f2663400700 -1 mgr load Failed to construct class in 'snap_schedule'
> > > > > > > > > > > > > > > > > > Apr 05 20:33:49 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: debug 2025-04-05T20:33:49.273+0000 7f2663400700 -1 mgr load Traceback (most recent call last):
> > > > > > > > > > > > > > > > > > Apr 05 20:33:49 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: File "/usr/share/ceph/mgr/snap_schedule/module.py", line 38, in __init__
> > > > > > > > > > > > > > > > > > Apr 05 20:33:49 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: self.client = SnapSchedClient(self)
> > > > > > > > > > > > > > > > > > Apr 05 20:33:49 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: File "/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py", line 158, in __init__
> > > > > > > > > > > > > > > > > > Apr 05 20:33:49 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: with self.get_schedule_db(fs_name) as conn_mgr:
> > > > > > > > > > > > > > > > > > Apr 05 20:33:49 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: File "/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py", line 192, in get_schedule_db
> > > > > > > > > > > > > > > > > > Apr 05 20:33:49 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: db.executescript(dump)
> > > > > > > > > > > > > > > > > > Apr 05 20:33:49 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: sqlite3.OperationalError: table schedules already exists
> > > > > > > > > > > > > > > > > > Apr 05 20:33:49 cn03.ceph.xyz.corp ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn03-negzvb[307291]: debug 2025-04-05T20:33:49.274+0000 7f2663400700 -1 mgr operator() Failed to run module in active mode ('snap_schedule')
> > > > > > > > > > > > > > > > > > ---snip---
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Your cluster seems to be in an error state (ceph -s) because of an
> > > > > > > > > > > > > > > > > > unknown PG. It's recommended to have a healthy cluster before
> > > > > > > > > > > > > > > > > > attempting an upgrade. It's possible that these errors come from the
> > > > > > > > > > > > > > > > > > not-yet-upgraded MGR, I'm not sure.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Since the upgrade was only successful for two MGRs, I am thinking
> > > > > > > > > > > > > > > > > > about downgrading both MGRs back to 16.2.15, then retrying an
> > > > > > > > > > > > > > > > > > upgrade to a newer version, either 17.2.8 or 18.2.4. I haven't
> > > > > > > > > > > > > > > > > > checked the snap_schedule error yet, though. Maybe someone else
> > > > > > > > > > > > > > > > > > knows that already.
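> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The TypeError would also fit that picture: 'daemon_types' is one of
> > > > > > > > > > > > > > > > > > the staggered-upgrade fields that newer Ceph versions store in
> > > > > > > > > > > > > > > > > > mgr/cephadm/upgrade_state, which a Pacific MGR's UpgradeState
> > > > > > > > > > > > > > > > > > doesn't know about. To inspect what's stored there (a sketch):
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > ceph config-key get mgr/cephadm/upgrade_state | python3 -m json.tool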
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io