I have confirmed and can consistently reproduce the failure that forces the object-map rebuild.

If a VM is terminated cleanly, for example during an orderly hypervisor reboot, the VMs and their rbd volumes are fine. If the hypervisor goes down hard, such as in a hard power cycle, any VMs on that hypervisor will hit enough I/O errors to prevent either booting or mounting the root volume. This usually manifests as
  "I/O error, dev sda, sector ......"

When this happens, the object-map check appears clean:
[root@eda84984a767 /]# rbd object-map check proxmox/vm-179-disk-0
Object Map Check: 100% complete...done.

To confirm: rebuilding the above object map then allows the VM to function
correctly, with no further I/O errors reported by the guest kernel.
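
In case it helps the next time this happens, one thing worth capturing before rebuilding is whether Ceph itself has flagged the map: "rbd info" prints a "flags:" line, and "object map invalid" there means the map was deliberately invalidated (e.g. on a lock/watch timeout) rather than silently corrupted. A rough sketch — the needs_rebuild helper name is my own invention:

```shell
# Rough sketch: decide from the "flags:" line of "rbd info" whether the
# object map was marked invalid. The helper name is illustrative only.
needs_rebuild() {
    # $1 is the "flags:" line from "rbd info <pool>/<image>"
    echo "$1" | grep -q "object map invalid"
}

# Against the live cluster (not run here), something like:
#   flags=$(rbd info proxmox/vm-179-disk-0 | grep 'flags:')
#   needs_rebuild "$flags" && rbd object-map rebuild proxmox/vm-179-disk-0
```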

Is there something else I should be checking? Could this be related to the rbd_invalidate_object_map_on_timeout setting on the pool?

[root@eda84984a767 /]# rbd config pool list proxmox |grep object
rbd_cache_max_dirty_object                           0 config
rbd_invalidate_object_map_on_timeout                 true config
rbd_journal_max_concurrent_object_sets               0 config
rbd_journal_object_flush_age                         0.000000 config
rbd_journal_object_flush_bytes                       1048576 config
rbd_journal_object_flush_interval                    0 config
rbd_journal_object_max_in_flight_appends             0 config
rbd_journal_object_writethrough_until_flush          true config
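
If the timeout path is the culprit, it should be possible to rule it in or out by flipping that setting per pool with the stock "rbd config pool set" command. A sketch — the wrapper function name is mine, and since turning the option off can hide genuinely stale maps, this would only be for testing:

```shell
# Illustrative wrapper around "rbd config pool set". With
# rbd_invalidate_object_map_on_timeout=false, a lock/watch timeout no
# longer marks the object map invalid. The function name is made up.
set_invalidate_on_timeout() {
    local pool=$1 value=$2   # value: "true" or "false"
    rbd config pool set "$pool" rbd_invalidate_object_map_on_timeout "$value"
}

# e.g. set_invalidate_on_timeout proxmox false
```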

Cheers,
Gary


On 2025-06-26 09:33, Ilya Dryomov wrote:
On Tue, Jun 24, 2025 at 11:19 PM Gary Molenkamp <molen...@uwo.ca> wrote:
We use ceph rbd as a volume service for both an Openstack deployment and
a series of Proxmox servers. This ceph deployment started as a Hammer
release and has been upgraded over the years to where it is now running
Quincy.  It has been fairly solid over that time, even
through upgrades from filestore to bluestore, and many transparent
hardware replacements/improvements.

One concern we have is that when we have a hypervisor that unexpectedly
dies/crashes, the volumes must always have the object maps rebuilt.  If
we don't rebuild the object maps, the VMs will either not boot, or we
will have other side-effects that render the volume unusable (i.e. we cannot
mount the root filesystem).  Is this to be expected during this type of event,
or have I missed a setting during one of the many upgrades of our deployment?

Hi Gary,

It's definitely not expected.  Have you ever run the "rbd object-map check"
command and captured its output before rebuilding the object map?  Some
object map inconsistencies following a hard crash are expected, but they
shouldn't be leading to the VM not booting/rootfs not mounting.
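
For example, something along these lines would preserve the check output before the rebuild (the helper name and log-file naming are only a sketch):

```shell
# Sketch: run the check, keep its output in a log file, then rebuild.
# The helper name and log-file naming are arbitrary.
capture_then_rebuild() {
    local img=$1                             # e.g. proxmox/vm-179-disk-0
    local log="$(basename "$img").check.log"
    rbd object-map check "$img" 2>&1 | tee "$log"
    rbd object-map rebuild "$img"
}
```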

Thanks,

                 Ilya

--
Gary Molenkamp                  Science Technology Services
Systems Engineer                University of Western Ontario
molen...@uwo.ca                 http://sts.sci.uwo.ca
(519) 661-2111 x86882           (519) 661-3566
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
