Hi Anthony,
No, we use the same mons (which are also backed up by UPS/Diesel). The
idea behind this was to share the OSDs between the pools using this
"critical area" (whose OSDs are located only in this row) and the other,
normal pools, to avoid dedicating a potentially large storage volume to
a critical area that doesn't require much. Hence also the choice to
reweight the OSDs in this row so that normal pools use them less than
other OSDs, to avoid exploding the number of PGs on these OSDs.
I am not sure that we can use a custom device class to achieve what we
had in mind, as this would not allow sharing an OSD between critical and
non-critical pools. But it may in fact be a better way: dedicating only
a fraction of the OSDs on each server in the "critical row" to these
pools and using the other OSDs on these servers for normal pools, without
any reweighting. Thanks for the idea.
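The device-class variant described above could be sketched as follows. This is only an illustration under assumptions: the class name `critical`, the rule name `ha-critical` and the choice of two OSDs per host are hypothetical, and the OSD IDs are those listed under row-01 in the CRUSH tree quoted further down. The script only builds and prints `ceph osd crush` command lines; it does not run anything.

```python
# OSD IDs per host, copied from the row-01 part of the CRUSH tree in this thread.
HOSTS = {
    "cephdevel-76079": [1, 2, 14, 39, 40, 41, 42, 43, 44, 45, 46, 47],
    "cephdevel-76154": [49, 50, 51, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75],
    "cephdevel-76204": [48, 52, 53, 54, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65],
}


def device_class_commands(hosts, per_host, device_class, rule_name):
    """Build the commands that dedicate `per_host` OSDs on each host to a class."""
    cmds = []
    for osds in hosts.values():
        for osd in osds[:per_host]:
            # An existing device class has to be removed before a new one is set.
            cmds.append(f"ceph osd crush rm-device-class osd.{osd}")
            cmds.append(f"ceph osd crush set-device-class {device_class} osd.{osd}")
    # Replicated rule on the default root, host failure domain, restricted to the class.
    cmds.append(
        f"ceph osd crush rule create-replicated {rule_name} default host {device_class}"
    )
    return cmds


# "critical" and "ha-critical" are hypothetical names chosen for this sketch.
commands = device_class_commands(HOSTS, 2, "critical", "ha-critical")
for line in commands:
    print(line)
```

The critical pools would then have to be switched to the new rule (`ceph osd pool set <pool> crush_rule <rule>`). Note that any pool whose rule selects the `hdd` class would stop using the reclassified OSDs, so data would move.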
We may also try to bump the number of retries to see if it has an effect.
Best regards,
Michel
On 16/04/2025 at 13:16, Anthony D'Atri wrote:
First time I recall anyone trying this. Thoughts:
* Manually edit the crush map and bump retries from 50 to 100
* Better yet, give those OSDs a custom device class and change the CRUSH rule
to use that and the default root.
Do you also constrain mons to those systems?
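For the first suggestion, a small sketch, assuming that "retries" refers to the `choose_total_tries` tunable as it appears in a decompiled CRUSH map (`tunable choose_total_tries 50`). The function name and the sample text are illustrative only; the export and import steps around it are given as comments.

```python
import re

# Typical workflow around this edit:
#   ceph osd getcrushmap -o crush.bin
#   crushtool -d crush.bin -o crush.txt
#   (edit crush.txt, e.g. with the function below)
#   crushtool -c crush.txt -o crush.new
#   ceph osd setcrushmap -i crush.new

TRIES_LINE = re.compile(r"^(tunable[ \t]+choose_total_tries[ \t]+)\d+[ \t]*$", re.MULTILINE)


def bump_total_tries(crush_text: str, new_value: int = 100) -> str:
    """Return the decompiled CRUSH map text with choose_total_tries set to new_value."""
    updated, count = TRIES_LINE.subn(lambda m: f"{m.group(1)}{new_value}", crush_text)
    if count != 1:
        raise ValueError(f"expected exactly one choose_total_tries line, found {count}")
    return updated


# Illustrative excerpt of a decompiled map, not taken from this cluster.
sample = (
    "# begin crush map\n"
    "tunable choose_local_tries 0\n"
    "tunable choose_total_tries 50\n"
    "tunable chooseleaf_descend_once 1\n"
)
print(bump_total_tries(sample))
```

Raising an error when the line is missing or duplicated is a deliberate choice here, so that an unexpected map layout is noticed before the map is recompiled.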
On Apr 16, 2025, at 6:41 AM, Michel Jouvin <michel.jou...@ijclab.in2p3.fr>
wrote:
Hi,
We have a use case where we would like to restrict some pools to a subset of the
OSDs located in a particular section of the CRUSH map hierarchy (OSDs backed up
by UPS/Diesel). For these (replica 3) pools we defined a specific CRUSH rule
with the root parameter set to a specific row (which contains 3 OSD servers
with about 10 OSDs each). At the beginning it worked, but after some time
(probably after doing a reweight on the OSDs in this row to reduce the number
of PGs from other pools), a few PGs are active+clean+remapped and 1 is
undersized.
'ceph osd pg dump|grep remapped' gives an output similar to the following one
for each remapped PG:
20.1ae 648 0 0 648 0 374416846 0 0 438 1404 438 active+clean+remapped 2025-04-16T07:19:40.507778+0000 43117'1018433 48443:1131738 [70,58] 70 [70,58,45] 70 43117'1018433 2025-04-16T07:19:40.507443+0000 43117'1018433 2025-04-16T07:19:40.507443+0000 0 15 periodic scrub scheduled @ 2025-04-17T19:18:23.470846+0000 648 0
We can see that we currently have 3 replicas (acting set [70,58,45]) but that
Ceph would like to move to 2 (up set [70,58]). The undersized PG currently has
only 2 replicas for an unknown reason, probably the same one.
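To compare the two sets for every remapped PG, something like the following could be used. It is a rough sketch that assumes the first two bracketed lists in a row are the up set and the acting set, in that order, as in the row quoted above; the function name is made up for this example and the sample row is truncated after the acting primary.

```python
import re


def up_and_acting(pg_dump_row: str):
    """Return (up, acting) OSD lists from one row of pg dump output.

    Assumption: the first two "[...]" fields are the up and acting sets.
    """
    fields = re.findall(r"\[([\d,]*)\]", pg_dump_row)
    if len(fields) < 2:
        raise ValueError("row does not contain both an up and an acting set")
    up = [int(x) for x in fields[0].split(",") if x]
    acting = [int(x) for x in fields[1].split(",") if x]
    return up, acting


# Row for PG 20.1ae from this message, truncated after the acting primary.
row = (
    "20.1ae 648 0 0 648 0 374416846 0 0 438 1404 438 active+clean+remapped "
    "2025-04-16T07:19:40.507778+0000 43117'1018433 48443:1131738 "
    "[70,58] 70 [70,58,45] 70"
)
up, acting = up_and_acting(row)
only_acting = sorted(set(acting) - set(up))
print(f"up={up} acting={acting} acting-but-not-up={only_acting}")
```

For this row the difference is osd.45, which sits under host cephdevel-76079 in the CRUSH tree below, so for this PG CRUSH currently returns no OSD from that host.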
Is it wrong to do what we did, i.e. to use a row as the root of the CRUSH
rule? If not, where could we find more information about the cause?
Thanks in advance for any help. Best regards,
Michel
--------------------- Crush rule used -----------------
{
    "rule_id": 2,
    "rule_name": "ha-replicated_ruleset",
    "type": 1,
    "steps": [
        {
            "op": "take",
            "item": -22,
            "item_name": "row-01~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
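As a quick sanity check of this rule against the hierarchy, the sketch below reads the rule, extracts its root and failure domain, and compares the number of hosts under row-01 with the pool size of 3. The helper names are invented for this example; the host list is copied from the CRUSH tree in this message.

```python
import json

# The rule as dumped above.
RULE_JSON = """
{"rule_id": 2, "rule_name": "ha-replicated_ruleset", "type": 1,
 "steps": [
   {"op": "take", "item": -22, "item_name": "row-01~hdd"},
   {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
   {"op": "emit"}
 ]}
"""


def describe_rule(rule: dict):
    """Return (root name, failure-domain type) of a simple replicated rule."""
    take = next(s for s in rule["steps"] if s["op"] == "take")
    choose = next(s for s in rule["steps"] if s["op"].startswith("choose"))
    return take["item_name"], choose["type"]


def spare_failure_domains(domain_count: int, pool_size: int) -> int:
    """Failure domains left over once every replica has its own domain."""
    return domain_count - pool_size


rule = json.loads(RULE_JSON)
root, domain = describe_rule(rule)
hosts_in_row = ["cephdevel-76079", "cephdevel-76154", "cephdevel-76204"]
spare = spare_failure_domains(len(hosts_in_row), 3)
print(f"root={root} failure_domain={domain} spare_{domain}s={spare}")
```

With zero spare hosts, every one of the three hosts has to yield an OSD for each PG. A reweight below 1.0 makes CRUSH reject OSDs probabilistically, and the number of attempts is bounded, which may be why a few PGs end up with only two OSDs in their up set; this is a plausible explanation rather than a confirmed diagnosis.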
------------------- Beginning of the CRUSH tree -------------------
ID   CLASS  WEIGHT     TYPE NAME                         STATUS  REWEIGHT  PRI-AFF
 -1         843.57141  root default
-19         843.57141      datacenter bat.206
-21         283.81818          row row-01
-15          87.32867              host cephdevel-76079
  1    hdd    7.27739                  osd.1                 up   0.50000  1.00000
  2    hdd    7.27739                  osd.2                 up   0.50000  1.00000
 14    hdd    7.27739                  osd.14                up   0.50000  1.00000
 39    hdd    7.27739                  osd.39                up   0.50000  1.00000
 40    hdd    7.27739                  osd.40                up   0.50000  1.00000
 41    hdd    7.27739                  osd.41                up   0.50000  1.00000
 42    hdd    7.27739                  osd.42                up   0.50000  1.00000
 43    hdd    7.27739                  osd.43                up   0.50000  1.00000
 44    hdd    7.27739                  osd.44                up   0.50000  1.00000
 45    hdd    7.27739                  osd.45                up   0.50000  1.00000
 46    hdd    7.27739                  osd.46                up   0.50000  1.00000
 47    hdd    7.27739                  osd.47                up   0.50000  1.00000
 -3          94.60606              host cephdevel-76154
 49    hdd    7.27739                  osd.49                up   0.50000  1.00000
 50    hdd    7.27739                  osd.50                up   0.50000  1.00000
 51    hdd    7.27739                  osd.51                up   0.50000  1.00000
 66    hdd    7.27739                  osd.66                up   0.50000  1.00000
 67    hdd    7.27739                  osd.67                up   0.50000  1.00000
 68    hdd    7.27739                  osd.68                up   0.50000  1.00000
 69    hdd    7.27739                  osd.69                up   0.50000  1.00000
 70    hdd    7.27739                  osd.70                up   0.50000  1.00000
 71    hdd    7.27739                  osd.71                up   0.50000  1.00000
 72    hdd    7.27739                  osd.72                up   0.50000  1.00000
 73    hdd    7.27739                  osd.73                up   0.50000  1.00000
 74    hdd    7.27739                  osd.74                up   0.50000  1.00000
 75    hdd    7.27739                  osd.75                up   0.50000  1.00000
 -4         101.88345              host cephdevel-76204
 48    hdd    7.27739                  osd.48                up   0.50000  1.00000
 52    hdd    7.27739                  osd.52                up   0.50000  1.00000
 53    hdd    7.27739                  osd.53                up   0.50000  1.00000
 54    hdd    7.27739                  osd.54                up   0.50000  1.00000
 56    hdd    7.27739                  osd.56                up   0.50000  1.00000
 57    hdd    7.27739                  osd.57                up   0.50000  1.00000
 58    hdd    7.27739                  osd.58                up   0.50000  1.00000
 59    hdd    7.27739                  osd.59                up   0.50000  1.00000
 60    hdd    7.27739                  osd.60                up   0.50000  1.00000
 61    hdd    7.27739                  osd.61                up   0.50000  1.00000
 62    hdd    7.27739                  osd.62                up   0.50000  1.00000
 63    hdd    7.27739                  osd.63                up   0.50000  1.00000
 64    hdd    7.27739                  osd.64                up   0.50000  1.00000
 65    hdd    7.27739                  osd.65                up   0.50000  1.00000
-23         203.16110          row row-02
-13          87.32867              host cephdevel-76213
 27    hdd    7.27739                  osd.27                up   1.00000  1.00000
 28    hdd    7.27739                  osd.28                up   1.00000  1.00000
 29    hdd    7.27739                  osd.29                up   1.00000  1.00000
 30    hdd    7.27739                  osd.30                up   1.00000  1.00000
 31    hdd    7.27739                  osd.31                up   1.00000  1.00000
 32    hdd    7.27739                  osd.32                up   1.00000  1.00000
 33    hdd    7.27739                  osd.33                up   1.00000  1.00000
 34    hdd    7.27739                  osd.34                up   1.00000  1.00000
 35    hdd    7.27739                  osd.35                up   1.00000  1.00000
 36    hdd    7.27739                  osd.36                up   1.00000  1.00000
 37    hdd    7.27739                  osd.37                up   1.00000  1.00000
 38    hdd    7.27739                  osd.38                up   1.00000  1.00000
......
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io