On Thu, Jul 3, 2025 at 7:50 PM Gary Molenkamp <molen...@uwo.ca> wrote:
>
> Thanks to everyone that replied so far.
>
> During my debugging, I discovered that either an 'object-map rebuild' or
> an 'object-map check' is sufficient to clear the conditions that are
> preventing the volume from being used properly.
> Is there something in common that the two commands both do that could be
> affecting the volume?  i.e. clearing locks, state, flush cache, etc.

Breaking leftover locks would be my first guess.  When you run any of
these commands, do you use the same user entity as Proxmox uses for QEMU
or the default (client.admin)?  What is the output of "ceph auth get
client.<what is used by Proxmox>" (edit out the base64-encoded key)?  It
could be that this user entity is missing the permission to blocklist
the pre-crash lock owner.

>
> I checked for locks, watchers, etc. when the volume was not usable, but
> nothing evident.

Are you saying that "rbd lock ls" on the image immediately after
powering on the hypervisor produces no output?

Thanks,

                Ilya

>
> Cheers,
> Gary
>
>
>
> On 2025-06-26 9:33 a.m., Ilya Dryomov wrote:
> > On Tue, Jun 24, 2025 at 11:19 PM Gary Molenkamp <molen...@uwo.ca> wrote:
> >> We use ceph rbd as a volume service for both an Openstack deployment and
> >> a series of Proxmox servers.  This ceph deployment started as a Hammer
> >> release and has been upgraded over the years to where it is now running
> >> Quincy.  It has been fairly solid over that time, even
> >> through upgrades from filestore to bluestore, and many transparent
> >> hardware replacements/improvements.
> >>
> >> One concern we have is that when we have a hypervisor that unexpectedly
> >> dies/crashes, the volumes must always have the object maps rebuilt.  If
> >> we don't rebuild the object maps, the VMs will either not boot, or we
> >> will have other side-effects that render the volume unusable (i.e. cannot
> >> mount root).  Is this to be expected during this type of event or have
> >> I missed a setting during one of the many upgrades on our deployment?
> > Hi Gary,
> >
> > It's definitely not expected.  Have you ever run the "rbd object-map check"
> > command and captured its output before rebuilding the object map?  Some
> > object map inconsistencies following a hard crash are expected, but they
> > shouldn't be leading to the VM not booting/rootfs not mounting.
> >
> > Thanks,
> >
> >                  Ilya
>
> --
> Gary Molenkamp                  Science Technology Services
> Systems Engineer                University of Western Ontario
> molen...@uwo.ca                 http://sts.sci.uwo.ca
> (519) 661-2111 x86882           (519) 661-3566
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
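
The checks asked about in this thread (leftover locks, watchers, and whether
the Proxmox user entity may blocklist a dead lock owner) could be sketched
roughly as below. This is only an illustration under stated assumptions: the
user name "proxmox" and the image spec "vms/vm-100-disk-0" are hypothetical
placeholders, and the caps test is a heuristic that assumes a mon cap of
"profile rbd", "allow *", or an explicit "osd blocklist"/"osd blacklist"
command grant is what permits blocklisting. "rbd object-map check" is left out
on purpose, since the thread reports that running it changes the behaviour
being investigated.

```python
import re


def diagnostic_commands(image_spec: str, user_id: str) -> list[list[str]]:
    """Build (but do not run) the commands mentioned in the thread.

    The rbd commands run as the same user entity QEMU would use; the
    auth lookup is left to the default (admin) identity.
    """
    return [
        ["rbd", "--id", user_id, "lock", "ls", image_spec],  # leftover locks
        ["rbd", "--id", user_id, "status", image_spec],      # watchers
        ["ceph", "auth", "get", f"client.{user_id}"],        # caps of the user
    ]


def mon_caps(auth_output: str) -> str:
    """Extract the mon caps string from 'ceph auth get' style output."""
    match = re.search(r'^\s*caps mon = "(.*)"\s*$', auth_output, re.MULTILINE)
    return match.group(1) if match else ""


def may_blocklist(mon: str) -> bool:
    """Heuristic only: do these mon caps look like they permit blocklisting?"""
    mon = mon.replace('\\"', '"')  # undo escaped quotes inside the caps value
    if "allow *" in mon or re.search(r"profile rbd(?![-\w])", mon):
        return True
    return bool(re.search(r'allow command "?osd (blocklist|blacklist)', mon))


if __name__ == "__main__":
    # Hypothetical names; substitute the real pool/image and user entity.
    for cmd in diagnostic_commands("vms/vm-100-disk-0", "proxmox"):
        print(" ".join(cmd))
```

Feeding the output of "ceph auth get" (with the key edited out) through
mon_caps() and may_blocklist() gives a quick first hint; a False result is a
prompt to look at the caps by hand, not a diagnosis.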