Hello Anthony,

Thank you for your answer. I forgot to mention the version: it's Luminous (12.2.9), and the clients are OpenStack (Queens) VMs.
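
If I understand the balancer/upmap option correctly, the steps on a Luminous cluster would be roughly the following (just a sketch based on the documentation, not something I have tested here yet):

    # check which feature bits the connected clients report
    ceph features

    # only if every client is at least Luminous:
    ceph osd set-require-min-compat-client luminous
    ceph mgr module enable balancer
    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status
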
Kind regards,
Laszlo

On 7/30/20 8:59 PM, Anthony D'Atri wrote:
> This is a natural condition of CRUSH. You don’t mention what release the
> back-end or the clients are running so it’s difficult to give an exact answer.
>
> Don’t mess with the CRUSH weights.
>
> Either adjust the override / reweights with `ceph osd
> test-reweight-by-utilization / reweight-by-utilization`
>
> https://docs.ceph.com/docs/master/rados/operations/control/
>
> or use the balancer module in newer releases *iff* all clients are new enough
> to handle pg-upmap
>
> https://docs.ceph.com/docs/nautilus/rados/operations/balancer/
>
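
If I follow the first suggestion correctly, the override/reweight route would look something like this (a sketch only; 120 is the documented default threshold in percent, not a value recommended in this thread):

    # dry run: reports which OSDs would get a new override reweight,
    # without changing anything
    ceph osd test-reweight-by-utilization 120

    # apply the same adjustment once the dry run looks reasonable
    ceph osd reweight-by-utilization 120
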
>> On Jul 30, 2020, at 9:21 AM, Budai Laszlo <laszlo.bu...@gmail.com> wrote:
>>
>> Dear all,
>>
>> We have a ceph cluster where we have configured two SSD-only pools in
>> order to use them as a cache tier for the spinning discs. Altogether there
>> are 27 SSDs organized on 9 hosts distributed in 3 chassis. The hierarchy
>> looks like this:
>>
>> $ ceph osd df tree | grep -E 'ssd|ID'
>> ID  CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME
>> -40       8.26199        - 8.26TiB 5.78TiB 2.48TiB 70.02 5.77   - root ssd-root
>> -50       2.75400        - 2.75TiB 1.93TiB  845GiB 70.02 5.77   -     chassis c1-ssd
>> -41       0.91800        -  940GiB  651GiB  289GiB 69.23 5.71   -         host c1-h01-ssd
>> 110   ssd 0.30600  1.00000  313GiB  199GiB  115GiB 63.37 5.22  77             osd.110
>> 116   ssd 0.30600  1.00000  313GiB  219GiB 94.3GiB 69.91 5.76  89             osd.116
>> 119   ssd 0.30600  1.00000  313GiB  233GiB 80.2GiB 74.41 6.13  87             osd.119
>> -42       0.91800        -  940GiB  701GiB  239GiB 74.61 6.15   -         host c1-h02-ssd
>> 112   ssd 0.30600  1.00000  313GiB  228GiB 84.9GiB 72.91 6.01  85             osd.112
>> 117   ssd 0.30600  1.00000  313GiB  245GiB 67.9GiB 78.32 6.46  97             osd.117
>> 122   ssd 0.30600  1.00000  313GiB  227GiB 85.8GiB 72.61 5.99  87             osd.122
>> -43       0.91800        -  940GiB  622GiB  318GiB 66.21 5.46   -         host c1-h03-ssd
>> 109   ssd 0.30600  1.00000  313GiB  192GiB  122GiB 61.15 5.04  77             osd.109
>> 115   ssd 0.30600  1.00000  313GiB  206GiB  107GiB 65.79 5.42  79             osd.115
>> 120   ssd 0.30600  1.00000  313GiB  225GiB 88.7GiB 71.70 5.91  90             osd.120
>> -51       2.75400        - 2.75TiB 1.93TiB  845GiB 70.02 5.77   -     chassis c2-ssd
>> -46       0.91800        -  940GiB  651GiB  288GiB 69.31 5.71   -         host c2-h01-ssd
>> 125   ssd 0.30600  1.00000  313GiB  211GiB  103GiB 67.22 5.54  81             osd.125
>> 130   ssd 0.30600  1.00000  313GiB  233GiB 80.4GiB 74.33 6.13  89             osd.130
>> 132   ssd 0.30600  1.00000  313GiB  208GiB  105GiB 66.38 5.47  79             osd.132
>> -45       0.91800        -  940GiB  672GiB  267GiB 71.54 5.90   -         host c2-h02-ssd
>> 126   ssd 0.30600  1.00000  313GiB  216GiB 97.4GiB 68.90 5.68  87             osd.126
>> 129   ssd 0.30600  1.00000  313GiB  207GiB  106GiB 66.12 5.45  80             osd.129
>> 134   ssd 0.30600  1.00000  313GiB  249GiB 63.9GiB 79.61 6.56  99             osd.134
>> -44       0.91800        -  940GiB  650GiB  289GiB 69.20 5.70   -         host c2-h03-ssd
>> 123   ssd 0.30600  1.00000  313GiB  201GiB  112GiB 64.23 5.29  76             osd.123
>> 127   ssd 0.30600  1.00000  313GiB  217GiB 96.1GiB 69.31 5.71  85             osd.127
>> 131   ssd 0.30600  1.00000  313GiB  232GiB 81.2GiB 74.06 6.11  92             osd.131
>> -52       2.75400        - 2.75TiB 1.93TiB  845GiB 70.02 5.77   -     chassis c3-ssd
>> -47       0.91800        -  940GiB  628GiB  311GiB 66.86 5.51   -         host c3-h01-ssd
>> 124   ssd 0.30600  1.00000  313GiB  204GiB  109GiB 65.13 5.37  78             osd.124
>> 128   ssd 0.30600  1.00000  313GiB  202GiB  111GiB 64.59 5.32  76             osd.128
>> 133   ssd 0.30600  1.00000  313GiB  222GiB 91.3GiB 70.86 5.84  86             osd.133
>> -48       0.91800        -  940GiB  628GiB  312GiB 66.80 5.51   -         host c3-h02-ssd
>> 108   ssd 0.30600  1.00000  313GiB  220GiB 92.9GiB 70.35 5.80  86             osd.108
>> 114   ssd 0.30600  1.00000  313GiB  209GiB  105GiB 66.58 5.49  82             osd.114
>> 121   ssd 0.30600  1.00000  313GiB  199GiB  114GiB 63.46 5.23  79             osd.121
>> -49       0.91800        -  940GiB  718GiB  222GiB 76.40 6.30   -         host c3-h03-ssd
>> 111   ssd 0.30600  1.00000  313GiB  219GiB 94.4GiB 69.87 5.76  84             osd.111
>> 113   ssd 0.30600  1.00000  313GiB  241GiB 72.2GiB 76.95 6.34  96             osd.113
>> 118   ssd 0.30600  1.00000  313GiB  258GiB 55.2GiB 82.39 6.79 101             osd.118
>>
>> The rule used for the two pools is the following:
>>
>> {
>>     "rule_id": 1,
>>     "rule_name": "ssd",
>>     "ruleset": 1,
>>     "type": 1,
>>     "min_size": 1,
>>     "max_size": 10,
>>     "steps": [
>>         {
>>             "op": "take",
>>             "item": -40,
>>             "item_name": "ssd-root"
>>         },
>>         {
>>             "op": "chooseleaf_firstn",
>>             "num": 0,
>>             "type": "chassis"
>>         },
>>         {
>>             "op": "emit"
>>         }
>>     ]
>> }
>>
>> Both pools have size 3, and the total number of PGs is 768 (256+512).
>>
>> As you can see from the previous table (the PGS column), there is a
>> significant difference between the OSD with the largest number of PGs
>> (101 PGs on osd.118) and the ones with the smallest number (76 PGs on
>> osd.123). The ratio between the two is 1.32, so osd.118 has a higher
>> chance of receiving data than osd.123, and we can see that indeed
>> osd.118 is the one storing the most data (82.39% full in the table above).
>>
>> I would like to rebalance the PG/OSD allocation. I know that I can play
>> around with the OSD weights (currently 0.306 for all the OSDs), but I
>> wonder whether there is any drawback to this in the long run. Are you
>> aware of any reason why I should NOT modify the weights (and keep those
>> modifications permanently)?
>>
>> Any ideas are welcome :)
>>
>> Kind regards,
>> Laszlo
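
P.S. For completeness, the manual adjustment I was asking about in my original mail would be along these lines (osd.118 and the value 0.90 are purely illustrative):

    # override reweight: a value in the 0..1 range; this is the REWEIGHT
    # column above and the same value that reweight-by-utilization adjusts
    ceph osd reweight 118 0.90

    # CRUSH weight: the device weight stored in the CRUSH map
    # (the WEIGHT column above)
    ceph osd crush reweight osd.118 0.306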