If you have thousands of RBD volumes, the mgr might get bogged down collecting per-image stats. Turn it on only for your pool of interest while you watch and see what happens.
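As a sketch of scoping it to one pool (assuming the `prometheus` mgr module is already enabled; the pool name `rbd-prod` and the mgr hostname are placeholders):

```shell
# Collect per-image I/O stats only for the named pool
# (the option accepts a comma-separated list of pools)
ceph config set mgr mgr/prometheus/rbd_stats_pools rbd-prod

# Optionally refresh the image list less often (value in seconds)
ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval 600

# Watch mgr CPU while the new per-image metrics show up on the exporter
curl -s http://active-mgr-host:9283/metrics | grep '^ceph_rbd_'
```

Unsetting `mgr/prometheus/rbd_stats_pools` turns the collection back off if the mgr starts struggling.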
Re the nightly latency issue: does it correspond with a spike in read/write throughput? When I hear of a nightly perf issue, the first thing I have to ask is whether you have [m]locate disabled on your clients. Cron jobs that update its database can have this effect, especially if the guest OS doesn't add a random slew to the invocation - and of course if the backing pool is on spinners. Other possibilities: deep scrubs limited to certain hours, or very verbose logs on the clients being rotated and compressed at the same time.

> On Sep 16, 2025, at 4:48 AM, Mika <msch...@cyberfusion.io> wrote:
>
> Is there more information on "performance reasons" [0]?
>
> Elsewhere in the documentation [1] it even states:
>
>> Monitoring of RBD images is disabled by default, as it can significantly
>> impact performance.
>
> We want to use this in production to hopefully help with debugging a rare
> nightly latency issue. Obviously, we don't want to introduce new performance
> issues.
>
> Does anyone have experience with enabling RBD per-image I/O statistics [2] in
> production? Did it cause you any performance (or other) issues?
>
> Mika
>
> [0] https://docs.ceph.com/en/latest/cephadm/services/monitoring/#setting-up-rbd-image-monitoring
> [1] https://docs.ceph.com/en/latest/mgr/dashboard/#enabling-rbd-image-monitoring
> [2] https://docs.ceph.com/en/latest/mgr/prometheus/#rbd-io-statistics
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
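P.S. A quick sketch of checking the scrub-window and updatedb hypotheses (assuming a recent Ceph CLI and systemd clients; timer/cron names vary by distro):

```shell
# Is deep scrubbing restricted to certain hours? (0/0 means no restriction)
ceph config get osd osd_scrub_begin_hour
ceph config get osd osd_scrub_end_hour

# Are deep scrubs actually running during the latency window?
ceph pg dump pgs 2>/dev/null | grep -c 'deep'

# On a client guest: is an updatedb job scheduled nightly?
systemctl list-timers 2>/dev/null | grep -i locate
ls /etc/cron.daily/ 2>/dev/null | grep -i locate
```

If the scrub window overlaps the latency window, shifting the hours (or throttling with the osd scrub sleep/load options) is an easy A/B test.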