We would need the output of `ceph osd crush rule dump` and `ceph osd pool ls detail` to determine which pools are using which CRUSH rule.
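A quick way to correlate the two (no pool names assumed here; this just filters the stock output):

```shell
# Each pool line in `ls detail` includes the crush_rule it references
ceph osd pool ls detail | grep crush_rule

# Dump all CRUSH rules to see what each referenced rule actually does
ceph osd crush rule dump
```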

Since there are two device classes, *every* pool should specify a CRUSH rule 
that constrains to one or the other device class.

If this is not done, e.g. with the stock `replicated_rule`:

root@cmigsdsc-m18-33:~# ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": “default” <——————————————==<<<<
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },

then pools whose rule specifies an item_name of "default" will be placed on both 
the `nvme` and `ssd` device classes.  This sort of works, but is usually not 
the best approach, and may result in the balancer and PG autoscaler not working 
properly, if at all.  Segregating the NVMe and SAS/SATA SSDs into separate pools 
is usually the better option, since the former are usually faster.

Below is an example CRUSH rule that will constrain pools specifying it to only 
the `nvme` device class.

    {
        "rule_id": 6,
        "rule_name": "ssd_nvme_replicated",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -33,
                "item_name": “default~nvme"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
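Rather than editing the CRUSH map by hand, a rule like the above can be created with the CLI.  The rule name and pool name below are placeholders, not anything from this cluster:

```shell
# Create a replicated rule rooted at "default" with failure domain "host",
# restricted to OSDs in the "nvme" device class
ceph osd crush rule create-replicated nvme_replicated default host nvme

# Point an existing pool at the new rule; data will backfill accordingly
ceph osd pool set mypool crush_rule nvme_replicated
```

Note that switching an existing pool's rule triggers data movement, so this is best done during a quiet period or with backfill throttled.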

If each pool is constrained to one device class or the other, weighting to favor 
the NVMe SSDs isn't in scope.  Messing with CRUSH weights or legacy override 
reweights is more likely to just result in OSDs filling up prematurely.

I've counseled the OP re PGs, but I think the underlying concern here is that 
"utilization" is being interpreted as two different things, one of which is not 
meaningfully represented for SSDs by the graph that was shared.





> On Jul 2, 2025, at 11:59 AM, Peter Eisch <pe...@boku.net> wrote:
> 
> Would you consider increasing the pgs on your larger pool(s)?  You might find 
> things balance better.  Then you could look at weighting to favor NVMe if 
> needed.

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
