> No, we use the same mons (they are also backed by UPS/Diesel). The idea
> behind doing this was to allow sharing the OSDs between the pool using this
> "critical area" (thus OSDs located only in this row) and the other, normal
> pools, to avoid dedicating a potentially large storage volume to this
> critical area, which doesn't require much.
Groovy. Since you were clearly targeting higher availability for this subset
of data, I wanted to be sure that your efforts weren't confounded by the
potential for the mons to not reach quorum, which would make the CRUSH hoops
moot.

> Thus also the choice to reweight the OSDs in this row so that they are less
> used than other OSDs by normal pools, to avoid exploding the number of PGs
> on these OSDs.
>
> I am not sure that we can use a custom device class to achieve what we had
> in mind, as this will not allow sharing an OSD between critical and
> non-critical pools.

The above two statements seem a bit at odds with each other. In the first
you're discouraging sharing and may fill up as your critical dataset grows; in
the second you want to share.

> But it may be a better way in fact: dedicating only a fraction of the OSDs
> on each server in the "critical row" to these pools and using the other OSDs
> on these servers for normal pools, without any reweighting. Thanks for the
> idea.

You bet. It seems like a cleaner approach. You might consider a reclassify
operation
https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes
to update the CRUSH map and rules at the same time.
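A rough sketch of what the device-class variant could look like, in case it
helps. The class name "critical", the rule name "critical-rule", and the pool
name below are placeholders (nothing from your cluster), so adjust to taste:

    # Tag the OSDs you want to dedicate to the protected pools
    # (repeat for each such OSD on the three hosts in row-01).
    ceph osd crush rm-device-class osd.1
    ceph osd crush set-device-class critical osd.1

    # Replicated rule rooted at the default root, restricted to that class,
    # one replica per host.
    ceph osd crush rule create-replicated critical-rule default host critical

    # Point the critical pool(s) at the new rule.
    ceph osd pool set <critical-pool> crush_rule critical-rule

One caveat worth stating explicitly: changing an OSD's device class changes
which rules can select it, so expect some data movement on pools whose rules
select by the hdd class.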
> We may also try to bump the number of retries to see if it has an effect.
>
> Best regards,
>
> Michel
>
> On 16/04/2025 at 13:16, Anthony D'Atri wrote:
>> First time I recall anyone trying this. Thoughts:
>>
>> * Manually edit the CRUSH map and bump retries from 50 to 100
>> * Better yet, give those OSDs a custom device class and change the CRUSH
>>   rule to use that and the default root.
>>
>> Do you also constrain mons to those systems?
>>
>>> On Apr 16, 2025, at 6:41 AM, Michel Jouvin <michel.jou...@ijclab.in2p3.fr>
>>> wrote:
>>>
>>> Hi,
>>>
>>> We have a use case where we would like to restrict some pools to a subset
>>> of the OSDs located in a particular section of the CRUSH map hierarchy
>>> (OSDs backed by UPS/Diesel). We tried to define for these (replica 3)
>>> pools a specific CRUSH rule with the root parameter set to a specific row
>>> (which contains 3 OSD servers with ~10 OSDs each). At the beginning it
>>> worked, but after some time (probably after doing a reweight on the OSDs
>>> in this row to reduce the number of PGs from other pools), a few PGs are
>>> active+clean+remapped and 1 is undersized.
>>>
>>> 'ceph pg dump | grep remapped' gives an output similar to the following
>>> one for each remapped PG:
>>>
>>> 20.1ae  648  0  0  648  0  374416846  0  0  438  1404  438
>>> active+clean+remapped  2025-04-16T07:19:40.507778+0000
>>> 43117'1018433  48443:1131738  [70,58]  70  [70,58,45]  70
>>> 43117'1018433  2025-04-16T07:19:40.507443+0000
>>> 43117'1018433  2025-04-16T07:19:40.507443+0000  0  15
>>> periodic scrub scheduled @ 2025-04-17T19:18:23.470846+0000  648  0
>>>
>>> We can see that we currently have 3 replicas but that Ceph would like to
>>> move to 2... (the undersized PG currently has only 2 replicas, for an
>>> unknown reason, probably the same one).
>>>
>>> Is it wrong to try to do what we did, i.e. to use a row as the CRUSH rule
>>> root parameter? If not, where could we find more information about the
>>> cause?
>>>
>>> Thanks in advance for any help.
>>>
>>> Best regards,
>>>
>>> Michel
>>>
>>> --------------------- Crush rule used -----------------
>>>
>>> {
>>>     "rule_id": 2,
>>>     "rule_name": "ha-replicated_ruleset",
>>>     "type": 1,
>>>     "steps": [
>>>         {
>>>             "op": "take",
>>>             "item": -22,
>>>             "item_name": "row-01~hdd"
>>>         },
>>>         {
>>>             "op": "chooseleaf_firstn",
>>>             "num": 0,
>>>             "type": "host"
>>>         },
>>>         {
>>>             "op": "emit"
>>>         }
>>>     ]
>>> }
>>>
>>> ------------------- Beginning of the CRUSH tree -------------------
>>>
>>> ID   CLASS  WEIGHT     TYPE NAME                     STATUS  REWEIGHT  PRI-AFF
>>>  -1         843.57141  root default
>>> -19         843.57141      datacenter bat.206
>>> -21         283.81818          row row-01
>>> -15          87.32867              host cephdevel-76079
>>>   1    hdd    7.27739                  osd.1             up   0.50000  1.00000
>>>   2    hdd    7.27739                  osd.2             up   0.50000  1.00000
>>>  14    hdd    7.27739                  osd.14            up   0.50000  1.00000
>>>  39    hdd    7.27739                  osd.39            up   0.50000  1.00000
>>>  40    hdd    7.27739                  osd.40            up   0.50000  1.00000
>>>  41    hdd    7.27739                  osd.41            up   0.50000  1.00000
>>>  42    hdd    7.27739                  osd.42            up   0.50000  1.00000
>>>  43    hdd    7.27739                  osd.43            up   0.50000  1.00000
>>>  44    hdd    7.27739                  osd.44            up   0.50000  1.00000
>>>  45    hdd    7.27739                  osd.45            up   0.50000  1.00000
>>>  46    hdd    7.27739                  osd.46            up   0.50000  1.00000
>>>  47    hdd    7.27739                  osd.47            up   0.50000  1.00000
>>>  -3          94.60606              host cephdevel-76154
>>>  49    hdd    7.27739                  osd.49            up   0.50000  1.00000
>>>  50    hdd    7.27739                  osd.50            up   0.50000  1.00000
>>>  51    hdd    7.27739                  osd.51            up   0.50000  1.00000
>>>  66    hdd    7.27739                  osd.66            up   0.50000  1.00000
>>>  67    hdd    7.27739                  osd.67            up   0.50000  1.00000
>>>  68    hdd    7.27739                  osd.68            up   0.50000  1.00000
>>>  69    hdd    7.27739                  osd.69            up   0.50000  1.00000
>>>  70    hdd    7.27739                  osd.70            up   0.50000  1.00000
>>>  71    hdd    7.27739                  osd.71            up   0.50000  1.00000
>>>  72    hdd    7.27739                  osd.72            up   0.50000  1.00000
>>>  73    hdd    7.27739                  osd.73            up   0.50000  1.00000
>>>  74    hdd    7.27739                  osd.74            up   0.50000  1.00000
>>>  75    hdd    7.27739                  osd.75            up   0.50000  1.00000
>>>  -4         101.88345              host cephdevel-76204
>>>  48    hdd    7.27739                  osd.48            up   0.50000  1.00000
>>>  52    hdd    7.27739                  osd.52            up   0.50000  1.00000
>>>  53    hdd    7.27739                  osd.53            up   0.50000  1.00000
>>>  54    hdd    7.27739                  osd.54            up   0.50000  1.00000
>>>  56    hdd    7.27739                  osd.56            up   0.50000  1.00000
>>>  57    hdd    7.27739                  osd.57            up   0.50000  1.00000
>>>  58    hdd    7.27739                  osd.58            up   0.50000  1.00000
>>>  59    hdd    7.27739                  osd.59            up   0.50000  1.00000
>>>  60    hdd    7.27739                  osd.60            up   0.50000  1.00000
>>>  61    hdd    7.27739                  osd.61            up   0.50000  1.00000
>>>  62    hdd    7.27739                  osd.62            up   0.50000  1.00000
>>>  63    hdd    7.27739                  osd.63            up   0.50000  1.00000
>>>  64    hdd    7.27739                  osd.64            up   0.50000  1.00000
>>>  65    hdd    7.27739                  osd.65            up   0.50000  1.00000
>>> -23         203.16110          row row-02
>>> -13          87.32867              host cephdevel-76213
>>>  27    hdd    7.27739                  osd.27            up   1.00000  1.00000
>>>  28    hdd    7.27739                  osd.28            up   1.00000  1.00000
>>>  29    hdd    7.27739                  osd.29            up   1.00000  1.00000
>>>  30    hdd    7.27739                  osd.30            up   1.00000  1.00000
>>>  31    hdd    7.27739                  osd.31            up   1.00000  1.00000
>>>  32    hdd    7.27739                  osd.32            up   1.00000  1.00000
>>>  33    hdd    7.27739                  osd.33            up   1.00000  1.00000
>>>  34    hdd    7.27739                  osd.34            up   1.00000  1.00000
>>>  35    hdd    7.27739                  osd.35            up   1.00000  1.00000
>>>  36    hdd    7.27739                  osd.36            up   1.00000  1.00000
>>>  37    hdd    7.27739                  osd.37            up   1.00000  1.00000
>>>  38    hdd    7.27739                  osd.38            up   1.00000  1.00000
>>> ......
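On the other suggestion in the thread, bumping the CRUSH retries from 50 to
100: a rough sketch of the usual round trip, with arbitrary file names. The 50
being raised is the choose_total_tries tunable; a per-rule "step
set_choose_tries 100" is an alternative:

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # In crush.txt, either raise the global tunable:
    #     tunable choose_total_tries 100
    # or add, as the first step of rule ha-replicated_ruleset:
    #     step set_choose_tries 100

    crushtool -c crush.txt -o crush.new

    # Optional: check the rule's mappings before injecting
    # (rule_id 2, size 3, per the rule quoted above).
    crushtool -i crush.new --test --rule 2 --num-rep 3 --show-bad-mappings

    ceph osd setcrushmap -i crush.new

If --show-bad-mappings prints nothing, the rule can place 3 replicas for every
PG with the current weights; if it does print mappings, more tries alone may
not be enough.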
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io