First time I recall anyone trying this. Thoughts:

* Manually edit the CRUSH map and bump retries from 50 to 100.
* Better yet, give those OSDs a custom device class and change the CRUSH rule to use that class and the default root.
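A rough sketch of the first option, assuming the retry limit in question is the choose_total_tries tunable (default 50); the file names are just placeholders:

    # Export and decompile the current CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # In crushmap.txt, change "tunable choose_total_tries 50" to 100,
    # then recompile and inject the edited map
    crushtool -c crushmap.txt -o crushmap.new.bin
    ceph osd setcrushmap -i crushmap.new.bin

And a sketch of the device-class option; the class name (hdd-ups), the rule name, and the pool name are made up, and you would repeat the set-device-class step for every OSD under row-01 (osd.1 and osd.2 here are just the first two from the tree below):

    # Remove the current class and assign a custom one to the UPS-backed OSDs
    ceph osd crush rm-device-class osd.1 osd.2
    ceph osd crush set-device-class hdd-ups osd.1 osd.2
    # Replicated rule: root "default", failure domain "host", restricted to the new class
    ceph osd crush rule create-replicated ha-replicated-ups default host hdd-ups
    # Point the affected pool(s) at the new rule
    ceph osd pool set <pool> crush_rule ha-replicated-ups

The device-class route keeps placement on the UPS-backed OSDs via the class's shadow hierarchy, so you don't need a row-rooted rule at all.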
Do you also constrain mons to those systems?

> On Apr 16, 2025, at 6:41 AM, Michel Jouvin <michel.jou...@ijclab.in2p3.fr> wrote:
>
> Hi,
>
> We have a use case where we would like to restrict some pools to a subset of
> the OSDs located in a particular section of the CRUSH map hierarchy (OSDs
> backed by UPS/diesel). We tried to define for these (replica 3) pools a
> specific CRUSH rule with the root parameter set to a specific row (which
> contains 3 OSD servers with about 10 OSDs each). At the beginning it worked,
> but after some time (probably after doing a reweight on the OSDs in this row
> to reduce the number of PGs from other pools), a few PGs are
> active+clean+remapped and 1 is undersized.
>
> 'ceph osd pg dump|grep remapped' gives output similar to the following for
> each remapped PG:
>
> 20.1ae 648 0 0 648 0 374416846 0 0 438 1404 438 active+clean+remapped
> 2025-04-16T07:19:40.507778+0000 43117'1018433 48443:1131738 [70,58] 70
> [70,58,45] 70 43117'1018433 2025-04-16T07:19:40.507443+0000 43117'1018433
> 2025-04-16T07:19:40.507443+0000 0 15 periodic scrub scheduled @
> 2025-04-17T19:18:23.470846+0000 648 0
>
> We can see that we currently have 3 replicas but that Ceph would like to move
> to 2... (the undersized PG currently has only 2 replicas, for an unknown
> reason, probably the same one).
>
> Is it wrong to do what we did, i.e. to use a row as the CRUSH rule root
> parameter? If not, where could we find more information about the cause?
>
> Thanks in advance for any help. Best regards,
>
> Michel
>
> --------------------- Crush rule used -----------------
>
> {
>     "rule_id": 2,
>     "rule_name": "ha-replicated_ruleset",
>     "type": 1,
>     "steps": [
>         {
>             "op": "take",
>             "item": -22,
>             "item_name": "row-01~hdd"
>         },
>         {
>             "op": "chooseleaf_firstn",
>             "num": 0,
>             "type": "host"
>         },
>         {
>             "op": "emit"
>         }
>     ]
> }
>
> ------------------- Beginning of the CRUSH tree -------------------
>
> ID   CLASS  WEIGHT     TYPE NAME                      STATUS  REWEIGHT  PRI-AFF
>  -1         843.57141  root default
> -19         843.57141      datacenter bat.206
> -21         283.81818          row row-01
> -15          87.32867              host cephdevel-76079
>   1    hdd    7.27739                  osd.1              up   0.50000  1.00000
>   2    hdd    7.27739                  osd.2              up   0.50000  1.00000
>  14    hdd    7.27739                  osd.14             up   0.50000  1.00000
>  39    hdd    7.27739                  osd.39             up   0.50000  1.00000
>  40    hdd    7.27739                  osd.40             up   0.50000  1.00000
>  41    hdd    7.27739                  osd.41             up   0.50000  1.00000
>  42    hdd    7.27739                  osd.42             up   0.50000  1.00000
>  43    hdd    7.27739                  osd.43             up   0.50000  1.00000
>  44    hdd    7.27739                  osd.44             up   0.50000  1.00000
>  45    hdd    7.27739                  osd.45             up   0.50000  1.00000
>  46    hdd    7.27739                  osd.46             up   0.50000  1.00000
>  47    hdd    7.27739                  osd.47             up   0.50000  1.00000
>  -3          94.60606              host cephdevel-76154
>  49    hdd    7.27739                  osd.49             up   0.50000  1.00000
>  50    hdd    7.27739                  osd.50             up   0.50000  1.00000
>  51    hdd    7.27739                  osd.51             up   0.50000  1.00000
>  66    hdd    7.27739                  osd.66             up   0.50000  1.00000
>  67    hdd    7.27739                  osd.67             up   0.50000  1.00000
>  68    hdd    7.27739                  osd.68             up   0.50000  1.00000
>  69    hdd    7.27739                  osd.69             up   0.50000  1.00000
>  70    hdd    7.27739                  osd.70             up   0.50000  1.00000
>  71    hdd    7.27739                  osd.71             up   0.50000  1.00000
>  72    hdd    7.27739                  osd.72             up   0.50000  1.00000
>  73    hdd    7.27739                  osd.73             up   0.50000  1.00000
>  74    hdd    7.27739                  osd.74             up   0.50000  1.00000
>  75    hdd    7.27739                  osd.75             up   0.50000  1.00000
>  -4         101.88345              host cephdevel-76204
>  48    hdd    7.27739                  osd.48             up   0.50000  1.00000
>  52    hdd    7.27739                  osd.52             up   0.50000  1.00000
>  53    hdd    7.27739                  osd.53             up   0.50000  1.00000
>  54    hdd    7.27739                  osd.54             up   0.50000  1.00000
>  56    hdd    7.27739                  osd.56             up   0.50000  1.00000
>  57    hdd    7.27739                  osd.57             up   0.50000  1.00000
>  58    hdd    7.27739                  osd.58             up   0.50000  1.00000
>  59    hdd    7.27739                  osd.59             up   0.50000  1.00000
>  60    hdd    7.27739                  osd.60             up   0.50000  1.00000
>  61    hdd    7.27739                  osd.61             up   0.50000  1.00000
>  62    hdd    7.27739                  osd.62             up   0.50000  1.00000
>  63    hdd    7.27739                  osd.63             up   0.50000  1.00000
>  64    hdd    7.27739                  osd.64             up   0.50000  1.00000
>  65    hdd    7.27739                  osd.65             up   0.50000  1.00000
> -23         203.16110          row row-02
> -13          87.32867              host cephdevel-76213
>  27    hdd    7.27739                  osd.27             up   1.00000  1.00000
>  28    hdd    7.27739                  osd.28             up   1.00000  1.00000
>  29    hdd    7.27739                  osd.29             up   1.00000  1.00000
>  30    hdd    7.27739                  osd.30             up   1.00000  1.00000
>  31    hdd    7.27739                  osd.31             up   1.00000  1.00000
>  32    hdd    7.27739                  osd.32             up   1.00000  1.00000
>  33    hdd    7.27739                  osd.33             up   1.00000  1.00000
>  34    hdd    7.27739                  osd.34             up   1.00000  1.00000
>  35    hdd    7.27739                  osd.35             up   1.00000  1.00000
>  36    hdd    7.27739                  osd.36             up   1.00000  1.00000
>  37    hdd    7.27739                  osd.37             up   1.00000  1.00000
>  38    hdd    7.27739                  osd.38             up   1.00000  1.00000
> ......
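Regarding where to find more information about the cause: it may be worth checking whether rule 2 can still map three replicas inside row-01 at all, given the 0.5 reweights, by testing the compiled map offline. A minimal sketch (the file name is a placeholder; rule id 2 and three replicas come from the rule and the replica-3 pools quoted above):

    # Dump the current CRUSH map and test rule 2 for 3-way placements;
    # --show-bad-mappings lists any input that maps to fewer than 3 OSDs
    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin --test --rule 2 --num-rep 3 --show-bad-mappings

If that reports bad mappings, raising the retry limit or the reweights is the likely fix; if it maps cleanly, the cause is probably elsewhere.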