Hi *,
Since the question came up yesterday on this list, I decided to share
our workaround. I created a tracker issue [0] and a blog post [1] with
more details.
The PG imbalance alert shipped with the current Prometheus module
assumes uniform OSD sizes, which is not very common, at least not in
our experience or according to the reports on this list.
So we added a new metric to the mgr prometheus module and modified the
alert expression so that it only compares OSDs of the same size. Since
the crush weight is calculated in a similar way, we simply called the
metric osd_crush_weight. This is a bit hacky and not persistent across
updates, but it has worked well so far. It would be great if the mgr
module itself could be improved. I'm sure there are more elegant ways
to do this, but with our approach we didn't need to introduce anything
new; we just used what was already there.
Best regards,
Eugen
[0] https://tracker.ceph.com/issues/71310
[1] https://heiterbiswolkig.blogs.nde.ag/2025/05/13/cephadm-pg-imbalance/
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io