Hi *,

since the question came up on this list yesterday, I decided to share our workaround. I created a tracker issue [0] and wrote a blog post [1] with more details. The current Prometheus module assumes uniform OSD sizes, which are not very common, at least in our experience and judging by the reports on this list.

So we added a new metric to the mgr prometheus module and modified the alert expression so that it only compares OSDs of the same size. Since the crush weight is calculated similarly, we simply called the metric osd_crush_weight. This is a bit hacky and does not persist across updates, but it has worked great so far. It would be great if the mgr module could be improved. I'm sure there are more elegant ways to do this, but with this approach we didn't need to introduce anything new and only used what was already there.
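
To illustrate the idea only: below is a small Python sketch of "compare each OSD against the average of OSDs with the same crush weight" versus "compare against the average of all OSDs". This is not our actual PromQL expression or the module patch (those are in [0] and [1]); the function name, the sample numbers and the 30% threshold are assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean


def imbalanced_osds(osds, threshold=0.30, group_by_weight=True):
    """Return sorted ids of OSDs whose PG count deviates from the average
    of their comparison group by more than `threshold` (relative).

    osds: dict mapping osd id -> (crush_weight, num_pgs).
    With group_by_weight=False all OSDs form one group, which mimics an
    alert that assumes uniform OSD sizes.
    """
    groups = defaultdict(list)
    for osd_id, (crush_weight, num_pgs) in osds.items():
        # Hypothetical grouping key: the crush weight rounded to 2 digits.
        key = round(crush_weight, 2) if group_by_weight else "all"
        groups[key].append((osd_id, num_pgs))

    flagged = []
    for members in groups.values():
        avg = mean(pgs for _, pgs in members)
        if avg == 0:
            continue  # nothing to compare against
        for osd_id, pgs in members:
            if abs(pgs - avg) / avg > threshold:
                flagged.append(osd_id)
    return sorted(flagged)


# Made-up cluster: three small OSDs and two large ones.
sample = {
    0: (1.82, 100),
    1: (1.82, 104),
    2: (1.82, 60),   # noticeably fewer PGs than its same-sized peers
    3: (7.28, 400),
    4: (7.28, 410),
}

print(imbalanced_osds(sample))                         # per-size comparison
print(imbalanced_osds(sample, group_by_weight=False))  # uniform-size assumption
```

With these sample numbers the per-size comparison flags only osd.2, while the comparison across all OSDs flags every OSD, which is the kind of noise we wanted to get rid of.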

Best regards,
Eugen

[0] https://tracker.ceph.com/issues/71310
[1] https://heiterbiswolkig.blogs.nde.ag/2025/05/13/cephadm-pg-imbalance/
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
