On Thu, Jul 3, 2025 at 7:50 PM Gary Molenkamp <molen...@uwo.ca> wrote:
>
> Thanks to everyone that replied so far.
>
> During my debugging, I discovered that either an 'object-map rebuild' or
> an 'object-map check' is sufficient to clear the conditions that are
> preventing the volume from being used properly.
> Is there something the two commands have in common that could be
> affecting the volume, i.e. clearing locks, state, flushing caches, etc.?

Breaking leftover locks would be my first guess.  When you run any
of these commands, do you use the same user entity as Proxmox uses for
QEMU or the default (client.admin)?

What is the output of "ceph auth get client.<what is used by Proxmox>"
(edit out the base64-encoded key)?  It could be that this user entity
is missing the permission to blocklist the pre-crash lock owner.
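A minimal sketch for checking that, assuming the usual keyring-style output of "ceph auth get" (a "[client.NAME]" header followed by "caps mon = "..."" lines). The list of "blocklist-capable" mon caps below is an assumption and a heuristic, not a full Ceph cap parser. It covers "profile rbd", "allow *", and an explicit "allow command "osd blocklist"" (or the pre-Pacific "osd blacklist"). A False result means "check by hand", not "definitely denied".

```python
import re

# Assumption: mon-cap fragments that, to our understanding, allow a client to
# blocklist a dead exclusive-lock owner. Heuristic only; may miss other grants.
_BLOCKLIST_PATTERNS = (
    re.compile(r"profile\s+rbd(?![-\w])"),    # "profile rbd", but not "profile rbd-read-only"
    re.compile(r"allow\s+\*"),                # full mon access
    re.compile(r"osd\s+bl(?:ock|ack)list"),   # allow command "osd blocklist" / older "osd blacklist"
)


def mon_caps(keyring_text):
    """Return the mon cap string from `ceph auth get` output, or None if absent."""
    match = re.search(r'^\s*caps\s+mon\s*=\s*"(.*)"\s*$', keyring_text, re.MULTILINE)
    if match is None:
        return None
    return match.group(1).replace('\\"', '"')  # unescape quotes embedded in the cap


def looks_able_to_blocklist(keyring_text):
    """True if the mon caps match one of the known blocklist-capable forms."""
    caps = mon_caps(keyring_text)
    return caps is not None and any(p.search(caps) for p in _BLOCKLIST_PATTERNS)


if __name__ == "__main__":
    # Hypothetical example of an old-style user created with read-only mon caps.
    sample = '[client.proxmox]\n\tkey = <redacted>\n\tcaps mon = "allow r"\n\tcaps osd = "allow rwx pool=vms"\n'
    print(mon_caps(sample), looks_able_to_blocklist(sample))
```

The client name "client.proxmox" and pool "vms" are made up for illustration. A user created back in the Hammer era with only "allow r" on the mon would be flagged here.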

>
> I checked for locks, watchers, etc. when the volume was not usable, but
> found nothing evident.

Are you saying that "rbd lock ls" on the image immediately after
powering on the hypervisor produces no output?
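A minimal sketch for capturing that state right after the hypervisor comes back. It runs "rbd lock ls" as the same cephx user Proxmox hands to QEMU (via the global "--id" option) instead of client.admin. It assumes that "rbd lock ls --format json" emits a list of objects with "id", "locker" and "address" keys, as in recent releases. As a further assumption, some older releases may emit an object keyed by lock id, so both shapes are accepted. Verify the format against your own version.

```python
import json
import subprocess


def rbd_lock_ls_argv(image_spec, client_id=None):
    """Build argv for `rbd lock ls`, optionally as a specific cephx user."""
    argv = ["rbd"]
    if client_id:
        argv += ["--id", client_id]  # e.g. the user entity Proxmox uses for QEMU
    return argv + ["lock", "ls", "--format", "json", image_spec]


def parse_locks(json_text):
    """Normalize lock-list JSON into (lock_id, locker, address) tuples."""
    data = json.loads(json_text) if json_text.strip() else []
    if isinstance(data, dict):  # assumed older shape: {"<lock id>": {...}}
        return [(lock_id, info.get("locker"), info.get("address"))
                for lock_id, info in data.items()]
    return [(lock.get("id"), lock.get("locker"), lock.get("address")) for lock in data]


def list_locks(image_spec, client_id=None):
    """Run `rbd lock ls` and return the parsed locks (needs the rbd CLI installed)."""
    result = subprocess.run(rbd_lock_ls_argv(image_spec, client_id),
                            capture_output=True, text=True, check=True)
    return parse_locks(result.stdout)
```

For example, "list_locks("rbd/vm-100-disk-0", "proxmox")" uses a hypothetical pool/image spec and user name. If it returns a non-empty list naming the crashed hypervisor's address, a leftover lock is still present.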

Thanks,

                Ilya

>
> Cheers,
> Gary
>
>
>
> On 2025-06-26 9:33 a.m., Ilya Dryomov wrote:
> > On Tue, Jun 24, 2025 at 11:19 PM Gary Molenkamp <molen...@uwo.ca> wrote:
> >> We use ceph rbd as a volume service for both an Openstack deployment and
> >> a series of Proxmox servers. This ceph deployment started as a Hammer
> >> release and has been upgraded over the years to where it is now running
> >> Quincy.  It has been fairly solid over that time, even
> >> through upgrades from filestore to bluestore, and many transparent
> >> hardware replacements/improvements.
> >>
> >> One concern we have is that when we have a hypervisor that unexpectedly
> >> dies/crashes, the volumes must always have the object maps rebuilt.  If
> >> we don't rebuild the object maps, the VMs will either not boot, or we
> >> will have other side-effects that render the volume unusable (i.e. cannot
> >> mount root).  Is this to be expected during this type of event, or have
> >> I missed a setting during one of the many upgrades of our deployment?
> > Hi Gary,
> >
> > It's definitely not expected.  Have you ever run "rbd object-map check"
> > command and captured its output before rebuilding the object map?  Some
> > object map inconsistencies following a hard crash are expected, but they
> > shouldn't be leading to the VM not booting/rootfs not mounting.
> >
> > Thanks,
> >
> >                  Ilya
>
> --
> Gary Molenkamp                  Science Technology Services
> Systems Engineer                University of Western Ontario
> molen...@uwo.ca                 http://sts.sci.uwo.ca
> (519) 661-2111 x86882           (519) 661-3566
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io