> Now 5 hours later - since I started this draft - I added:
> -> $ ceph osd reweight-by-utilization
This is legacy functionality that is mostly obviated by the upmap balancer. It is best not to use it and to leave all REWEIGHT values at 1.00000, as it and the upmap balancer don't play well together.

> and before that cmd above I also noticed:
> -> $ ceph config get mgr mgr/balancer/begin_weekday
> 0
> -> $ ceph config get mgr mgr/balancer/end_weekday
> 0
> which was done by the 'deployment' process - cephadm bootstrap - and made me
> wonder:
> does that mean that auto-rebalance runs only on Sunday?

By default it runs all the time.

> I changed: end_weekday = 6
>
> _reweight-by-utilization_, I notice, changed REWEIGHT for osd.0 and that did
> something, I think.
> So now _active+remapped+backfill_toofull_ are gone from the 'pgs' part of the
> health report.
> RAW USE & DATA are down, but still:
> -> $ ceph osd df tree | egrep '(osd.0|ID)'
> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA    OMAP     META     AVAIL   %USE   VAR   PGS  STATUS  TYPE NAME
>  0    ssd  0.09769   0.90002  150 GiB  134 GiB  82 GiB  1.1 MiB  2.0 GiB  16 GiB  89.51  1.31   68      up  osd.0
> and when compared to other host OSDs which use "identical" disk-drives:

Do you really have a 10 GiB OSD drive? As noted, your CRUSH weights do not match the ostensible device sizes, which is a significant part of your problem. What is underlying these OSDs? Why do you have OSDs as ostensibly small as 150 GiB? While there is a bit of work under way to better handle OSD devices whose capacity can change, today one has to ensure that CRUSH weights are adjusted to correlate with device size.
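That mismatch can be sanity-checked offline: by convention a CRUSH weight equals the device's capacity in TiB. A minimal sketch (the `tib_weight` helper name is mine; the `ceph osd crush reweight` line at the end is only an illustration of how a wrong weight would be corrected, not something to run blindly, since it triggers data movement):

```shell
# By convention, CRUSH WEIGHT = device capacity in TiB (GiB / 1024).
tib_weight() {
    # $1 = capacity in GiB; prints the expected CRUSH weight
    awk -v gib="$1" 'BEGIN { printf "%.5f\n", gib / 1024 }'
}

tib_weight 150   # prints 0.14648 -- not the 0.09769 shown for osd.0
tib_weight 400   # well above the 0.29300 carried by osd.4/5/10

# If a device really is 150 GiB, the weight would be corrected with
# something like (moves data -- do it deliberately):
#   ceph osd crush reweight osd.0 0.14648
```

So the 150 GiB OSDs carry weights corresponding to roughly 50-100 GiB devices, and the 400 GiB OSDs to roughly 300 GiB, which is exactly the kind of weight/size disagreement described above.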
> -> $ ceph osd df tree | egrep '(osd\.[0,1,9] |ID)'
> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA    OMAP     META     AVAIL   %USE   VAR   PGS  STATUS  TYPE NAME
>  9    ssd  0.04880   1.00000  150 GiB   52 GiB  51 GiB  624 KiB  1.4 GiB  98 GiB  34.83  0.51   43      up  osd.9
>  0    ssd  0.09769   0.90002  150 GiB  134 GiB  82 GiB  1.1 MiB  2.0 GiB  16 GiB  89.51  1.31   68      up  osd.0
>  1    ssd  0.04880   1.00000  150 GiB   53 GiB  53 GiB  526 KiB  254 MiB  97 GiB  35.40  0.52   44      up  osd.1
>
> Perhaps the cluster goes only as far as making _backfill_toofull_ go away and
> then "gives up"?
> The "other" disk-drives:
> -> $ ceph osd df tree | egrep '(osd\.(5|4|10)\ |ID)'
> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
> 10    ssd  0.29300   1.00000  400 GiB  307 GiB  305 GiB  2.0 MiB  2.1 GiB   93 GiB  76.68  1.12  246      up  osd.10
>  4    ssd  0.29300   1.00000  400 GiB  275 GiB  273 GiB  3.2 MiB  2.1 GiB  125 GiB  68.83  1.01  221      up  osd.4
>  5    ssd  0.29300   1.00000  400 GiB  305 GiB  303 GiB  3.0 MiB  2.5 GiB   95 GiB  76.27  1.12  245      up  osd.5
>
> It seems that _host podster2_ balances its OSDs 4 & 0 "differently" from what the
> other two hosts do - if so, then why?
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
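The numbers above hint that this is not per-host behaviour: the VAR column in `ceph osd df` is the OSD's %USE relative to the cluster-wide average utilization, so the average can be recovered from any row as %USE / VAR. A quick sketch (the `mean_use` helper name is mine):

```shell
# VAR = %USE / cluster-average utilization, so average = %USE / VAR.
mean_use() {
    # $1 = %USE, $2 = VAR; prints the implied cluster-average utilization
    awk -v use="$1" -v var="$2" 'BEGIN { printf "%.1f\n", use / var }'
}

mean_use 89.51 1.31   # osd.0  -> 68.3
mean_use 76.68 1.12   # osd.10 -> 68.5
mean_use 34.83 0.51   # osd.9  -> 68.3
```

All rows imply the same cluster average of roughly 68%, so the hosts are being treated uniformly; the per-OSD spread comes from the mismatched CRUSH weights (and the 0.90002 override on osd.0), not from any host balancing its OSDs differently.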
