Hello Anthony,

Thank you for your answer. I forgot to mention the version: it's Luminous 
(12.2.9), and the clients are OpenStack (Queens) VMs.

Kind regards,
Laszlo

On 7/30/20 8:59 PM, Anthony D'Atri wrote:
> This is a natural condition of CRUSH.  You don’t mention what release the 
> back-end or the clients are running, so it’s difficult to give an exact answer.
> 
> Don’t mess with the CRUSH weights.
> 
> Either adjust the override reweights with `ceph osd 
> test-reweight-by-utilization` / `ceph osd reweight-by-utilization`
> 
> https://docs.ceph.com/docs/master/rados/operations/control/
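> 
> For example (a sketch; the 110 / 0.05 / 4 values below are illustrative
> defaults-ish numbers, not recommendations):
> 
> $ # dry run: report which OSDs would be reweighted, and by how much
> $ ceph osd test-reweight-by-utilization 110 0.05 4
> $ # apply the same change once the dry run looks sane
> $ ceph osd reweight-by-utilization 110 0.05 4
> 
> Here 110 means "act on OSDs more than 10% above mean utilization",
> 0.05 caps the per-OSD reweight change, and 4 caps how many OSDs are
> touched in one pass.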
> 
> 
> or use the balancer module in newer releases *iff* all clients are new enough 
> to handle pg-upmap
> 
> https://docs.ceph.com/docs/nautilus/rados/operations/balancer/
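> 
> A sketch of turning it on, assuming a Luminous or newer mgr; check the
> client side first, since pre-Luminous clients can't handle pg-upmap:
> 
> $ ceph features                                     # what do connected clients report?
> $ ceph osd set-require-min-compat-client luminous   # refuse older clients
> $ ceph mgr module enable balancer
> $ ceph balancer mode upmap
> $ ceph balancer on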
> 
>> On Jul 30, 2020, at 9:21 AM, Budai Laszlo <laszlo.bu...@gmail.com> wrote:
>>
>> Dear all,
>>
>> We have a Ceph cluster where we have configured two SSD-only pools in 
>> order to use them as a cache tier for the spinning disks. Altogether there 
>> are 27 SSDs organized on 9 hosts distributed across 3 chassis. The 
>> hierarchy looks like this:
>>
>> $ ceph osd df tree | grep -E 'ssd|ID'
>> ID  CLASS WEIGHT    REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME
>> -40         8.26199        - 8.26TiB 5.78TiB 2.48TiB 70.02 5.77   - root ssd-root
>> -50         2.75400        - 2.75TiB 1.93TiB  845GiB 70.02 5.77   -     chassis c1-ssd
>> -41         0.91800        -  940GiB  651GiB  289GiB 69.23 5.71   -         host c1-h01-ssd
>> 110   ssd   0.30600  1.00000  313GiB  199GiB  115GiB 63.37 5.22  77             osd.110
>> 116   ssd   0.30600  1.00000  313GiB  219GiB 94.3GiB 69.91 5.76  89             osd.116
>> 119   ssd   0.30600  1.00000  313GiB  233GiB 80.2GiB 74.41 6.13  87             osd.119
>> -42         0.91800        -  940GiB  701GiB  239GiB 74.61 6.15   -         host c1-h02-ssd
>> 112   ssd   0.30600  1.00000  313GiB  228GiB 84.9GiB 72.91 6.01  85             osd.112
>> 117   ssd   0.30600  1.00000  313GiB  245GiB 67.9GiB 78.32 6.46  97             osd.117
>> 122   ssd   0.30600  1.00000  313GiB  227GiB 85.8GiB 72.61 5.99  87             osd.122
>> -43         0.91800        -  940GiB  622GiB  318GiB 66.21 5.46   -         host c1-h03-ssd
>> 109   ssd   0.30600  1.00000  313GiB  192GiB  122GiB 61.15 5.04  77             osd.109
>> 115   ssd   0.30600  1.00000  313GiB  206GiB  107GiB 65.79 5.42  79             osd.115
>> 120   ssd   0.30600  1.00000  313GiB  225GiB 88.7GiB 71.70 5.91  90             osd.120
>> -51         2.75400        - 2.75TiB 1.93TiB  845GiB 70.02 5.77   -     chassis c2-ssd
>> -46         0.91800        -  940GiB  651GiB  288GiB 69.31 5.71   -         host c2-h01-ssd
>> 125   ssd   0.30600  1.00000  313GiB  211GiB  103GiB 67.22 5.54  81             osd.125
>> 130   ssd   0.30600  1.00000  313GiB  233GiB 80.4GiB 74.33 6.13  89             osd.130
>> 132   ssd   0.30600  1.00000  313GiB  208GiB  105GiB 66.38 5.47  79             osd.132
>> -45         0.91800        -  940GiB  672GiB  267GiB 71.54 5.90   -         host c2-h02-ssd
>> 126   ssd   0.30600  1.00000  313GiB  216GiB 97.4GiB 68.90 5.68  87             osd.126
>> 129   ssd   0.30600  1.00000  313GiB  207GiB  106GiB 66.12 5.45  80             osd.129
>> 134   ssd   0.30600  1.00000  313GiB  249GiB 63.9GiB 79.61 6.56  99             osd.134
>> -44         0.91800        -  940GiB  650GiB  289GiB 69.20 5.70   -         host c2-h03-ssd
>> 123   ssd   0.30600  1.00000  313GiB  201GiB  112GiB 64.23 5.29  76             osd.123
>> 127   ssd   0.30600  1.00000  313GiB  217GiB 96.1GiB 69.31 5.71  85             osd.127
>> 131   ssd   0.30600  1.00000  313GiB  232GiB 81.2GiB 74.06 6.11  92             osd.131
>> -52         2.75400        - 2.75TiB 1.93TiB  845GiB 70.02 5.77   -     chassis c3-ssd
>> -47         0.91800        -  940GiB  628GiB  311GiB 66.86 5.51   -         host c3-h01-ssd
>> 124   ssd   0.30600  1.00000  313GiB  204GiB  109GiB 65.13 5.37  78             osd.124
>> 128   ssd   0.30600  1.00000  313GiB  202GiB  111GiB 64.59 5.32  76             osd.128
>> 133   ssd   0.30600  1.00000  313GiB  222GiB 91.3GiB 70.86 5.84  86             osd.133
>> -48         0.91800        -  940GiB  628GiB  312GiB 66.80 5.51   -         host c3-h02-ssd
>> 108   ssd   0.30600  1.00000  313GiB  220GiB 92.9GiB 70.35 5.80  86             osd.108
>> 114   ssd   0.30600  1.00000  313GiB  209GiB  105GiB 66.58 5.49  82             osd.114
>> 121   ssd   0.30600  1.00000  313GiB  199GiB  114GiB 63.46 5.23  79             osd.121
>> -49         0.91800        -  940GiB  718GiB  222GiB 76.40 6.30   -         host c3-h03-ssd
>> 111   ssd   0.30600  1.00000  313GiB  219GiB 94.4GiB 69.87 5.76  84             osd.111
>> 113   ssd   0.30600  1.00000  313GiB  241GiB 72.2GiB 76.95 6.34  96             osd.113
>> 118   ssd   0.30600  1.00000  313GiB  258GiB 55.2GiB 82.39 6.79 101             osd.118
>>
>>
>> The rule used for the two pools is the following:
>>
>>        {
>>            "rule_id": 1,
>>            "rule_name": "ssd",
>>            "ruleset": 1,
>>            "type": 1,
>>            "min_size": 1,
>>            "max_size": 10,
>>            "steps": [
>>                {
>>                    "op": "take",
>>                    "item": -40,
>>                    "item_name": "ssd-root"
>>                },
>>                {
>>                    "op": "chooseleaf_firstn",
>>                    "num": 0,
>>                    "type": "chassis"
>>                },
>>                {
>>                    "op": "emit"
>>                }
>>            ]
>>        }
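>>
>> (The JSON above is in `ceph osd crush rule dump` format. Reading the
>> steps: start at `ssd-root`, then `chooseleaf_firstn` with num 0 and
>> type `chassis` picks one OSD under each distinct chassis, so each
>> size-3 PG ends up with exactly one replica per chassis.)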
>>
>>
>> Both pools have size 3, and the total number of PGs is 768 (256 + 512).
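>> That works out to 768 x 3 = 2304 PG replicas spread over 27 OSDs, i.e. 
>> about 85 PGs per OSD on average.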
>>
>> As you can see from the table above (the PGS column), there is a 
>> significant difference between the OSD with the largest number of PGs 
>> (101 PGs on osd.118) and the ones with the smallest number (76 PGs on 
>> osd.123). The ratio between the two is roughly 1.33, so osd.118 has a 
>> proportionally higher chance of receiving data than osd.123, and indeed 
>> osd.118 is the one storing the most data (82.39% full in the table above).
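>>
>> For quick checks, a one-liner along these lines lists the SSD OSDs sorted 
>> by PG count (a sketch; the awk field numbers assume the `ceph osd df tree` 
>> column layout shown above):
>>
>> $ ceph osd df tree | awk '$2 == "ssd" {print $10, $11}' | sort -n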
>>
>> I would like to rebalance the PG/OSD allocation. I know that I can play 
>> around with the OSD weights (currently 0.306 for all the OSDs), but I 
>> wonder if there is any drawback to this in the long run. Are you aware of 
>> any reason why I should NOT modify the weights (and leave those 
>> modifications permanent)?
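>>
>> For reference, the two knobs I am thinking of (the 0.29 and 0.9 values 
>> below are just placeholders, not proposals):
>>
>> $ ceph osd crush reweight osd.118 0.29   # permanent CRUSH weight
>> $ ceph osd reweight 118 0.9              # 0-1 override (REWEIGHT column)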
>>
>> Any ideas are welcome :)
>>
>> Kind regards,
>> Laszlo
> 
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
