> 
> Now 5 hours later - since I started this draft - I added:
> -> $ ceph osd reweight-by-utilization

This is a legacy command that is mostly obviated by the upmap balancer. It is 
best not to use it and to leave all REWEIGHT values at 1.00000, as override 
reweights and the upmap balancer don't play well together.
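If override reweights have already been applied, a sketch of undoing them and 
enabling the upmap balancer might look like the below (assuming all clients 
are Luminous or newer; osd.0 is just the example OSD from the pasted output):

```shell
# Ensure the cluster can use upmap (requires all clients >= Luminous):
ceph osd set-require-min-compat-client luminous

# Switch the balancer to upmap mode and turn it on:
ceph balancer mode upmap
ceph balancer on
ceph balancer status

# Clear any leftover override reweight so it does not fight the balancer:
ceph osd reweight osd.0 1.0
```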


> and before that cmd above I also noticed:
> -> $ ceph config get mgr mgr/balancer/begin_weekday
> 0
> -> $ ceph config get mgr mgr/balancer/end_weekday
> 0
> which was done by 'deployment' process - cephadm bootstrap - and made me 
> wonder:
> does that mean that auto-rebalance runs only on Sunday?

By default it runs all the time.
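Rather than setting end_weekday = 6 by hand, one option is to inspect the 
window and simply remove the explicit overrides so the shipped defaults apply 
again (a sketch, assuming the values were set in the mgr section as shown in 
your `ceph config get` output):

```shell
# Show the balancer's weekday window settings:
ceph config get mgr mgr/balancer/begin_weekday
ceph config get mgr mgr/balancer/end_weekday

# Drop the explicit overrides and fall back to the defaults:
ceph config rm mgr mgr/balancer/begin_weekday
ceph config rm mgr mgr/balancer/end_weekday
```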


> I changed: end_weekday = 6
> 
> _reweight-by-utilization_, I notice, changed REWEIGHT for osd.0, and that 
> did something, I think.
> So now _active+remapped+backfill_toofull_ is gone from the 'pgs' part of 
> the health report.
> RAW USE & DATA are down, but still:
> -> $ ceph osd df tree | egrep '(osd.0|ID)'
> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA    OMAP     META     AVAIL   %USE   VAR   PGS  STATUS  TYPE NAME
>  0    ssd  0.09769   0.90002  150 GiB  134 GiB  82 GiB  1.1 MiB  2.0 GiB  16 GiB  89.51  1.31   68      up  osd.0
> and when compared to the other hosts' OSDs, which use "identical" disk-drives:

Do you really have a 10 GiB OSD drive? As noted, your CRUSH weights do not 
match the ostensible device sizes, which is a significant part of your problem. 
 What is underlying these OSDs? Why do you have OSDs as ostensibly small as 150 
GiB?

While there is a bit of work under way to better handle OSD devices whose 
capacity can change, today one has to ensure that CRUSH weights are adjusted to 
correlate with device size.
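As a rough sanity check (a CRUSH weight is conventionally the device capacity 
in TiB), the weights in the output below look as if they were derived from 
100/50/300 GiB devices rather than the reported 150 GiB and 400 GiB sizes:

```shell
# CRUSH weight is conventionally the device capacity in TiB:
awk 'BEGIN { printf "%.5f\n", 150/1024 }'   # 0.14648 -- expected for a 150 GiB OSD
awk 'BEGIN { printf "%.5f\n", 100/1024 }'   # 0.09766 -- close to osd.0's reported 0.09769
awk 'BEGIN { printf "%.5f\n", 300/1024 }'   # 0.29297 -- close to osd.10's reported 0.29300
```

If that holds, something like `ceph osd crush reweight osd.0 0.14648` (and 
likewise for the others) would bring the weights back in line with the actual 
device sizes; the exact figures here are only illustrations read off the 
pasted output.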


> -> $ ceph osd df tree | egrep '(osd\.[0,1,9] |ID)'
> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA    OMAP     META     AVAIL   %USE   VAR   PGS  STATUS  TYPE NAME
>  9    ssd  0.04880   1.00000  150 GiB   52 GiB  51 GiB  624 KiB  1.4 GiB  98 GiB  34.83  0.51   43      up  osd.9
>  0    ssd  0.09769   0.90002  150 GiB  134 GiB  82 GiB  1.1 MiB  2.0 GiB  16 GiB  89.51  1.31   68      up  osd.0
>  1    ssd  0.04880   1.00000  150 GiB   53 GiB  53 GiB  526 KiB  254 MiB  97 GiB  35.40  0.52   44      up  osd.1
> 
> Perhaps the cluster goes only as far as making _backfill_toofull_ go away 
> and then "gives up"?
> The "other" disk-drives:
> -> $ ceph osd df tree | egrep '(osd\.(5|4|10)\  |ID)'
> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
> 10    ssd  0.29300   1.00000  400 GiB  307 GiB  305 GiB  2.0 MiB  2.1 GiB   93 GiB  76.68  1.12  246      up  osd.10
>  4    ssd  0.29300   1.00000  400 GiB  275 GiB  273 GiB  3.2 MiB  2.1 GiB  125 GiB  68.83  1.01  221      up  osd.4
>  5    ssd  0.29300   1.00000  400 GiB  305 GiB  303 GiB  3.0 MiB  2.5 GiB   95 GiB  76.27  1.12  245      up  osd.5
> 
> Seems that _host podster2_ balances its osds 4 & 0 "differently" from what 
> the other two hosts do - if so, then why?
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]