Data goes to *all* PGs uniformly. Max_avail is limited by the available space on the most full OSD -- you should pay close attention to the fullest OSDs and make sure their usage is moving in the right direction (decreasing!).
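As a rough illustration of "watch the fullest OSDs", here is a minimal Python sketch that ranks OSDs by utilization and labels them against the default nearfull/backfillfull/full thresholds (85/90/95%). It assumes the JSON layout of `ceph osd df --format json` (a top-level "nodes" list whose entries carry "name" and "utilization"); the function names are hypothetical, and the thresholds should be adjusted if your cluster overrides the defaults.

```python
import json
import subprocess

# Default Ceph thresholds in percent; a cluster may override them,
# so treat these values as assumptions.
NEARFULL_PCT = 85.0
BACKFILLFULL_PCT = 90.0
FULL_PCT = 95.0


def classify(use_pct):
    """Map a utilization percentage to the fullness state it would trigger."""
    if use_pct >= FULL_PCT:
        return "full"
    if use_pct >= BACKFILLFULL_PCT:
        return "backfillfull"
    if use_pct >= NEARFULL_PCT:
        return "nearfull"
    return "ok"


def fullest_osds(osd_df, top=10):
    """Return (name, utilization %, state) for the `top` fullest OSDs.

    `osd_df` is the parsed output of `ceph osd df --format json`; the
    "nodes", "name" and "utilization" keys are assumed from that output.
    """
    rows = []
    for node in osd_df.get("nodes", []):
        use = float(node["utilization"])
        rows.append((node["name"], use, classify(use)))
    # Fullest first, since those OSDs are the ones that cap max_avail.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


def load_osd_df():
    """Run the ceph CLI and parse its JSON (needs a host with admin access)."""
    out = subprocess.check_output(["ceph", "osd", "df", "--format", "json"])
    return json.loads(out)
```

On an admin host, `fullest_osds(load_osd_df())` could be run periodically during the backfill; the utilization of the top entries should trend downwards as PGs move to the new disks.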
Another point -- IMHO you should aim to get all PGs active+clean before you add yet another batch of new disks. While PGs are backfilling, osdmaps accumulate on the mons and OSDs -- this itself will start to use a lot of space, and getting to active+clean is the only way to trim the old maps.

-- dan

On Tue, Mar 23, 2021 at 7:05 PM Boris Behrens <b...@kervyn.de> wrote:
>
> So,
> do nothing and wait for Ceph to recover?
>
> In theory there should be enough disk space (more disks arriving tomorrow),
> but I fear there might be an issue when the backups get exported overnight
> to this S3. Currently the max_avail lingers around 13TB, and I hope that the
> data will go to PGs other than the ones that are currently on the filled
> OSDs.
>
> On Tue, Mar 23, 2021 at 6:58 PM Dan van der Ster <d...@vanderster.com> wrote:
>>
>> Hi,
>>
>> backfill_toofull is not a bad thing when the cluster is really full
>> like yours. You should expect some of the most full OSDs to eventually
>> start decreasing in usage, as the PGs are moved to the new OSDs. Those
>> backfill_toofull states should then resolve themselves as the OSD
>> usage flattens out.
>> Keep an eye on the usage of the backfillfull and nearfull OSDs though
>> -- if they do eventually go above the full_ratio (95% by default),
>> then writes to those OSDs would stop.
>>
>> But if on the other hand you're suffering from lots of slow ops or
>> anything else visible to your users, then you could try to take some
>> actions to slow down the rebalancing. Just let us know if that's the
>> case and we can see about changing osd_max_backfills, some weights or
>> maybe using the upmap-remapped tool.
>>
>> -- Dan
>>
>> On Tue, Mar 23, 2021 at 6:07 PM Boris Behrens <b...@kervyn.de> wrote:
>> >
>> > Ok, I should have listened to you :)
>> >
>> > In the last week we added more storage but the issue got worse instead.
>> > Today I realized that the PGs were up to 90GB (the bytes column in ceph pg ls
>> > said 95705749636), and the balancer kept mentioning the 2048 PGs for this
>> > pool. We were at 72% utilization (ceph osd df tree, first line) for our
>> > cluster and I increased the PGs to 2048.
>> >
>> > Now I am in a world of trouble.
>> > The space in the cluster went down, I am at 45% misplaced objects, and we
>> > already added 20x4TB disks just to not go down completely.
>> >
>> > The utilization is still going up and the overall free space in the
>> > cluster seems to go down. This is what my ceph status looks like and now I
>> > really need help to get that thing back to normal:
>> > [root@s3db1 ~]# ceph status
>> >   cluster:
>> >     id:     dca79fff-ffd0-58f4-1cff-82a2feea05f4
>> >     health: HEALTH_WARN
>> >             4 backfillfull osd(s)
>> >             17 nearfull osd(s)
>> >             37 pool(s) backfillfull
>> >             13 large omap objects
>> >             Low space hindering backfill (add storage if this doesn't resolve itself): 570 pgs backfill_toofull
>> >
>> >   services:
>> >     mon: 3 daemons, quorum ceph-s3-mon1,ceph-s3-mon2,ceph-s3-mon3 (age 44m)
>> >     mgr: ceph-mgr2(active, since 15m), standbys: ceph-mgr3, ceph-mgr1
>> >     mds: 3 up:standby
>> >     osd: 110 osds: 110 up (since 28m), 110 in (since 28m); 1535 remapped pgs
>> >     rgw: 3 daemons active (eu-central-1, eu-msg-1, eu-secure-1)
>> >
>> >   task status:
>> >
>> >   data:
>> >     pools:   37 pools, 4032 pgs
>> >     objects: 116.23M objects, 182 TiB
>> >     usage:   589 TiB used, 206 TiB / 795 TiB avail
>> >     pgs:     160918554/348689415 objects misplaced (46.150%)
>> >              2497 active+clean
>> >              779  active+remapped+backfill_wait
>> >              538  active+remapped+backfill_wait+backfill_toofull
>> >              186  active+remapped+backfilling
>> >              32   active+remapped+backfill_toofull
>> >
>> >   io:
>> >     client:   27 MiB/s rd, 69 MiB/s wr, 497 op/s rd, 153 op/s wr
>> >     recovery: 1.5 GiB/s, 922 objects/s
>> >
>> > Am Di., 16.
März 2021 um 09:34 Uhr schrieb Boris Behrens <b...@kervyn.de>: >> >> >> >> Hi Dan, >> >> >> >> my EC profile look very "default" to me. >> >> [root@s3db1 ~]# ceph osd erasure-code-profile ls >> >> default >> >> [root@s3db1 ~]# ceph osd erasure-code-profile get default >> >> k=2 >> >> m=1 >> >> plugin=jerasure >> >> technique=reed_sol_van >> >> >> >> I don't understand the ouput, but the balancing get worse over night: >> >> >> >> [root@s3db1 ~]# ceph-scripts/tools/ceph-pool-pg-distribution 11 >> >> Searching for PGs in pools: ['11'] >> >> Summary: 1024 PGs on 84 osds >> >> >> >> Num OSDs with X PGs: >> >> 15: 8 >> >> 16: 7 >> >> 17: 6 >> >> 18: 10 >> >> 19: 1 >> >> 32: 10 >> >> 33: 4 >> >> 34: 6 >> >> 35: 8 >> >> 65: 5 >> >> 66: 5 >> >> 67: 4 >> >> 68: 10 >> >> [root@s3db1 ~]# ceph-scripts/tools/ceph-pg-histogram --normalize --pool=11 >> >> # NumSamples = 84; Min = 4.12; Max = 5.09 >> >> # Mean = 4.553355; Variance = 0.052415; SD = 0.228942; Median 4.561608 >> >> # each ∎ represents a count of 1 >> >> 4.1244 - 4.2205 [ 8]: ∎∎∎∎∎∎∎∎ >> >> 4.2205 - 4.3166 [ 6]: ∎∎∎∎∎∎ >> >> 4.3166 - 4.4127 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎ >> >> 4.4127 - 4.5087 [ 10]: ∎∎∎∎∎∎∎∎∎∎ >> >> 4.5087 - 4.6048 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎ >> >> 4.6048 - 4.7009 [ 19]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ >> >> 4.7009 - 4.7970 [ 6]: ∎∎∎∎∎∎ >> >> 4.7970 - 4.8931 [ 8]: ∎∎∎∎∎∎∎∎ >> >> 4.8931 - 4.9892 [ 4]: ∎∎∎∎ >> >> 4.9892 - 5.0852 [ 1]: ∎ >> >> [root@s3db1 ~]# ceph osd df tree | sort -nk 17 | tail >> >> 14 hdd 3.63689 1.00000 3.6 TiB 2.9 TiB 724 GiB 19 GiB 0 B 724 >> >> GiB 80.56 1.07 56 up osd.14 >> >> 19 hdd 3.68750 1.00000 3.7 TiB 3.0 TiB 2.9 TiB 466 MiB 7.9 GiB 708 >> >> GiB 81.25 1.08 53 up osd.19 >> >> 4 hdd 3.63689 1.00000 3.6 TiB 3.0 TiB 698 GiB 703 MiB 0 B 698 >> >> GiB 81.27 1.08 48 up osd.4 >> >> 24 hdd 3.63689 1.00000 3.6 TiB 3.0 TiB 695 GiB 640 MiB 0 B 695 >> >> GiB 81.34 1.08 46 up osd.24 >> >> 75 hdd 3.68750 1.00000 3.7 TiB 3.0 TiB 2.9 TiB 440 MiB 8.1 GiB 704 >> >> GiB 81.35 1.08 48 up osd.75 >> >> 71 hdd 3.68750 1.00000 
3.7 TiB 3.0 TiB 3.0 TiB 7.5 MiB 8.0 GiB 663 >> >> GiB 82.44 1.09 47 up osd.71 >> >> 76 hdd 3.68750 1.00000 3.7 TiB 3.1 TiB 3.0 TiB 251 MiB 9.0 GiB 617 >> >> GiB 83.65 1.11 50 up osd.76 >> >> 33 hdd 3.73630 1.00000 3.7 TiB 3.1 TiB 3.0 TiB 399 MiB 8.1 GiB 618 >> >> GiB 83.85 1.11 55 up osd.33 >> >> 35 hdd 3.73630 1.00000 3.7 TiB 3.1 TiB 3.0 TiB 317 MiB 8.8 GiB 617 >> >> GiB 83.87 1.11 50 up osd.35 >> >> 34 hdd 3.73630 1.00000 3.7 TiB 3.2 TiB 3.1 TiB 451 MiB 8.7 GiB 545 >> >> GiB 85.75 1.14 54 up osd.34 >> >> >> >> Am Mo., 15. März 2021 um 17:23 Uhr schrieb Dan van der Ster >> >> <d...@vanderster.com>: >> >>> >> >>> Hi, >> >>> >> >>> How wide are your EC profiles? If they are really wide, you might be >> >>> reaching the limits of what is physically possible. Also, I'm not sure >> >>> that upmap in 14.2.11 is very smart about *improving* existing upmap >> >>> rules for a given PG, in the case that a PG already has an upmap-items >> >>> entry but it would help the distribution to add more mapping pairs to >> >>> that entry. What this means, is that it might sometimes be useful to >> >>> randomly remove some upmap entries and see if the balancer does a >> >>> better job when it replaces them. >> >>> >> >>> But before you do that, I re-remembered that looking at the total PG >> >>> numbers is not useful -- you need to check the PGs per OSD for the >> >>> eu-central-1.rgw.buckets.data pool only. >> >>> >> >>> We have a couple tools that can help with this: >> >>> >> >>> 1. To see the PGs per OSD for a given pool: >> >>> >> >>> https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-pool-pg-distribution >> >>> >> >>> E.g.: ./ceph-pool-pg-distribution 11 # to see the distribution of >> >>> your eu-central-1.rgw.buckets.data pool. 
>> >>> >> >>> The output looks like this on my well balanced clusters: >> >>> >> >>> # ceph-scripts/tools/ceph-pool-pg-distribution 15 >> >>> Searching for PGs in pools: ['15'] >> >>> Summary: 256 pgs on 56 osds >> >>> >> >>> Num OSDs with X PGs: >> >>> 13: 16 >> >>> 14: 40 >> >>> >> >>> You should expect a trimodal for your cluster. >> >>> >> >>> 2. You can also use another script from that repo to see the PGs per >> >>> OSD normalized to crush weight: >> >>> ceph-scripts/tools/ceph-pg-histogram --normalize --pool=15 >> >>> >> >>> This might explain what is going wrong. >> >>> >> >>> Cheers, Dan >> >>> >> >>> >> >>> On Mon, Mar 15, 2021 at 3:04 PM Boris Behrens <b...@kervyn.de> wrote: >> >>> > >> >>> > Absolutly: >> >>> > [root@s3db1 ~]# ceph osd df tree >> >>> > ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META >> >>> > AVAIL %USE VAR PGS STATUS TYPE NAME >> >>> > -1 673.54224 - 674 TiB 496 TiB 468 TiB 97 GiB 1.2 TiB >> >>> > 177 TiB 73.67 1.00 - root default >> >>> > -2 58.30331 - 58 TiB 42 TiB 38 TiB 9.2 GiB 99 GiB >> >>> > 16 TiB 72.88 0.99 - host s3db1 >> >>> > 23 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 714 MiB 25 GiB >> >>> > 3.7 TiB 74.87 1.02 194 up osd.23 >> >>> > 69 hdd 14.55269 1.00000 15 TiB 11 TiB 11 TiB 1.6 GiB 40 GiB >> >>> > 3.4 TiB 76.32 1.04 199 up osd.69 >> >>> > 73 hdd 14.55269 1.00000 15 TiB 11 TiB 11 TiB 1.3 GiB 34 GiB >> >>> > 3.8 TiB 74.15 1.01 203 up osd.73 >> >>> > 79 hdd 3.63689 1.00000 3.6 TiB 2.4 TiB 1.3 TiB 1.8 GiB 0 B >> >>> > 1.3 TiB 65.44 0.89 47 up osd.79 >> >>> > 80 hdd 3.63689 1.00000 3.6 TiB 2.4 TiB 1.3 TiB 2.2 GiB 0 B >> >>> > 1.3 TiB 65.34 0.89 48 up osd.80 >> >>> > 81 hdd 3.63689 1.00000 3.6 TiB 2.4 TiB 1.3 TiB 1.1 GiB 0 B >> >>> > 1.3 TiB 65.38 0.89 47 up osd.81 >> >>> > 82 hdd 3.63689 1.00000 3.6 TiB 2.5 TiB 1.1 TiB 619 MiB 0 B >> >>> > 1.1 TiB 68.46 0.93 41 up osd.82 >> >>> > -11 50.94173 - 51 TiB 37 TiB 37 TiB 3.5 GiB 98 GiB >> >>> > 14 TiB 71.90 0.98 - host s3db10 >> >>> > 63 hdd 7.27739 1.00000 7.3 TiB 5.3 TiB 5.3 
TiB 647 MiB 14 GiB >> >>> > 2.0 TiB 72.72 0.99 94 up osd.63 >> >>> > 64 hdd 7.27739 1.00000 7.3 TiB 5.3 TiB 5.2 TiB 668 MiB 14 GiB >> >>> > 2.0 TiB 72.23 0.98 93 up osd.64 >> >>> > 65 hdd 7.27739 1.00000 7.3 TiB 5.2 TiB 5.2 TiB 227 MiB 14 GiB >> >>> > 2.1 TiB 71.16 0.97 100 up osd.65 >> >>> > 66 hdd 7.27739 1.00000 7.3 TiB 5.4 TiB 5.4 TiB 313 MiB 14 GiB >> >>> > 1.9 TiB 74.25 1.01 92 up osd.66 >> >>> > 67 hdd 7.27739 1.00000 7.3 TiB 5.1 TiB 5.1 TiB 584 MiB 14 GiB >> >>> > 2.1 TiB 70.63 0.96 96 up osd.67 >> >>> > 68 hdd 7.27739 1.00000 7.3 TiB 5.2 TiB 5.2 TiB 720 MiB 14 GiB >> >>> > 2.1 TiB 71.72 0.97 101 up osd.68 >> >>> > 70 hdd 7.27739 1.00000 7.3 TiB 5.1 TiB 5.1 TiB 425 MiB 14 GiB >> >>> > 2.1 TiB 70.59 0.96 97 up osd.70 >> >>> > -12 50.99052 - 51 TiB 38 TiB 37 TiB 2.1 GiB 97 GiB >> >>> > 13 TiB 73.77 1.00 - host s3db11 >> >>> > 46 hdd 7.27739 1.00000 7.3 TiB 5.6 TiB 5.6 TiB 229 MiB 14 GiB >> >>> > 1.7 TiB 77.05 1.05 97 up osd.46 >> >>> > 47 hdd 7.27739 1.00000 7.3 TiB 5.1 TiB 5.1 TiB 159 MiB 13 GiB >> >>> > 2.2 TiB 70.00 0.95 89 up osd.47 >> >>> > 48 hdd 7.27739 1.00000 7.3 TiB 5.2 TiB 5.2 TiB 279 MiB 14 GiB >> >>> > 2.1 TiB 71.82 0.97 98 up osd.48 >> >>> > 49 hdd 7.27739 1.00000 7.3 TiB 5.5 TiB 5.4 TiB 276 MiB 14 GiB >> >>> > 1.8 TiB 74.90 1.02 95 up osd.49 >> >>> > 50 hdd 7.27739 1.00000 7.3 TiB 5.2 TiB 5.2 TiB 336 MiB 14 GiB >> >>> > 2.0 TiB 72.13 0.98 93 up osd.50 >> >>> > 51 hdd 7.27739 1.00000 7.3 TiB 5.7 TiB 5.6 TiB 728 MiB 15 GiB >> >>> > 1.6 TiB 77.76 1.06 98 up osd.51 >> >>> > 72 hdd 7.32619 1.00000 7.3 TiB 5.3 TiB 5.3 TiB 147 MiB 13 GiB >> >>> > 2.0 TiB 72.75 0.99 95 up osd.72 >> >>> > -37 58.55478 - 59 TiB 44 TiB 44 TiB 4.4 GiB 122 GiB >> >>> > 15 TiB 75.20 1.02 - host s3db12 >> >>> > 19 hdd 3.68750 1.00000 3.7 TiB 2.9 TiB 2.9 TiB 454 MiB 8.2 GiB >> >>> > 780 GiB 79.35 1.08 53 up osd.19 >> >>> > 71 hdd 3.68750 1.00000 3.7 TiB 3.0 TiB 2.9 TiB 7.1 MiB 8.0 GiB >> >>> > 734 GiB 80.56 1.09 47 up osd.71 >> >>> > 75 hdd 3.68750 1.00000 3.7 TiB 2.9 TiB 2.9 
TiB 439 MiB 8.2 GiB >> >>> > 777 GiB 79.43 1.08 48 up osd.75 >> >>> > 76 hdd 3.68750 1.00000 3.7 TiB 3.0 TiB 3.0 TiB 241 MiB 8.9 GiB >> >>> > 688 GiB 81.77 1.11 50 up osd.76 >> >>> > 77 hdd 14.60159 1.00000 15 TiB 11 TiB 11 TiB 880 MiB 30 GiB >> >>> > 3.6 TiB 75.44 1.02 201 up osd.77 >> >>> > 78 hdd 14.60159 1.00000 15 TiB 10 TiB 10 TiB 1015 MiB 28 GiB >> >>> > 4.2 TiB 71.26 0.97 193 up osd.78 >> >>> > 83 hdd 14.60159 1.00000 15 TiB 11 TiB 11 TiB 1.4 GiB 30 GiB >> >>> > 3.8 TiB 73.76 1.00 203 up osd.83 >> >>> > -3 58.49872 - 58 TiB 42 TiB 36 TiB 8.2 GiB 89 GiB >> >>> > 17 TiB 71.71 0.97 - host s3db2 >> >>> > 1 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 3.2 GiB 37 GiB >> >>> > 3.7 TiB 74.58 1.01 196 up osd.1 >> >>> > 3 hdd 3.63689 1.00000 3.6 TiB 2.3 TiB 1.3 TiB 566 MiB 0 B >> >>> > 1.3 TiB 64.11 0.87 50 up osd.3 >> >>> > 4 hdd 3.63689 1.00000 3.6 TiB 2.9 TiB 771 GiB 695 MiB 0 B >> >>> > 771 GiB 79.30 1.08 48 up osd.4 >> >>> > 5 hdd 3.63689 1.00000 3.6 TiB 2.4 TiB 1.2 TiB 482 MiB 0 B >> >>> > 1.2 TiB 66.51 0.90 49 up osd.5 >> >>> > 6 hdd 3.63689 1.00000 3.6 TiB 2.3 TiB 1.3 TiB 1.8 GiB 0 B >> >>> > 1.3 TiB 64.00 0.87 42 up osd.6 >> >>> > 7 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 639 MiB 26 GiB >> >>> > 4.0 TiB 72.44 0.98 192 up osd.7 >> >>> > 74 hdd 14.65039 1.00000 15 TiB 10 TiB 10 TiB 907 MiB 26 GiB >> >>> > 4.2 TiB 71.32 0.97 193 up osd.74 >> >>> > -4 58.49872 - 58 TiB 43 TiB 36 TiB 34 GiB 85 GiB >> >>> > 16 TiB 72.69 0.99 - host s3db3 >> >>> > 2 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 980 MiB 26 GiB >> >>> > 3.8 TiB 74.36 1.01 203 up osd.2 >> >>> > 9 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 8.4 GiB 33 GiB >> >>> > 3.9 TiB 73.51 1.00 186 up osd.9 >> >>> > 10 hdd 14.65039 1.00000 15 TiB 10 TiB 10 TiB 650 MiB 26 GiB >> >>> > 4.2 TiB 71.64 0.97 201 up osd.10 >> >>> > 12 hdd 3.63689 1.00000 3.6 TiB 2.3 TiB 1.3 TiB 754 MiB 0 B >> >>> > 1.3 TiB 64.17 0.87 44 up osd.12 >> >>> > 13 hdd 3.63689 1.00000 3.6 TiB 2.8 TiB 813 GiB 2.4 GiB 0 B >> >>> > 813 GiB 78.17 1.06 58 
up osd.13 >> >>> > 14 hdd 3.63689 1.00000 3.6 TiB 2.9 TiB 797 GiB 19 GiB 0 B >> >>> > 797 GiB 78.60 1.07 56 up osd.14 >> >>> > 15 hdd 3.63689 1.00000 3.6 TiB 2.3 TiB 1.3 TiB 2.2 GiB 0 B >> >>> > 1.3 TiB 63.96 0.87 41 up osd.15 >> >>> > -5 58.49872 - 58 TiB 43 TiB 36 TiB 6.7 GiB 97 GiB >> >>> > 15 TiB 74.04 1.01 - host s3db4 >> >>> > 11 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 940 MiB 26 GiB >> >>> > 4.0 TiB 72.49 0.98 196 up osd.11 >> >>> > 17 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 1022 MiB 26 GiB >> >>> > 3.6 TiB 75.23 1.02 204 up osd.17 >> >>> > 18 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 945 MiB 45 GiB >> >>> > 3.8 TiB 74.16 1.01 193 up osd.18 >> >>> > 20 hdd 3.63689 1.00000 3.6 TiB 2.6 TiB 1020 GiB 596 MiB 0 B >> >>> > 1020 GiB 72.62 0.99 57 up osd.20 >> >>> > 21 hdd 3.63689 1.00000 3.6 TiB 2.6 TiB 1023 GiB 1.9 GiB 0 B >> >>> > 1023 GiB 72.54 0.98 41 up osd.21 >> >>> > 22 hdd 3.63689 1.00000 3.6 TiB 2.6 TiB 1023 GiB 797 MiB 0 B >> >>> > 1023 GiB 72.54 0.98 53 up osd.22 >> >>> > 24 hdd 3.63689 1.00000 3.6 TiB 2.9 TiB 766 GiB 618 MiB 0 B >> >>> > 766 GiB 79.42 1.08 46 up osd.24 >> >>> > -6 58.89636 - 59 TiB 43 TiB 43 TiB 3.0 GiB 108 GiB >> >>> > 16 TiB 73.40 1.00 - host s3db5 >> >>> > 0 hdd 3.73630 1.00000 3.7 TiB 2.7 TiB 2.6 TiB 92 MiB 7.2 GiB >> >>> > 1.1 TiB 71.16 0.97 45 up osd.0 >> >>> > 25 hdd 3.73630 1.00000 3.7 TiB 2.7 TiB 2.6 TiB 2.4 MiB 7.3 GiB >> >>> > 1.1 TiB 71.23 0.97 41 up osd.25 >> >>> > 26 hdd 3.73630 1.00000 3.7 TiB 2.8 TiB 2.7 TiB 181 MiB 7.6 GiB >> >>> > 935 GiB 75.57 1.03 45 up osd.26 >> >>> > 27 hdd 3.73630 1.00000 3.7 TiB 2.7 TiB 2.6 TiB 5.1 MiB 7.0 GiB >> >>> > 1.1 TiB 71.20 0.97 47 up osd.27 >> >>> > 28 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 977 MiB 26 GiB >> >>> > 3.8 TiB 73.85 1.00 197 up osd.28 >> >>> > 29 hdd 14.65039 1.00000 15 TiB 11 TiB 10 TiB 872 MiB 26 GiB >> >>> > 4.1 TiB 71.98 0.98 196 up osd.29 >> >>> > 30 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 943 MiB 27 GiB >> >>> > 3.6 TiB 75.51 1.03 202 up osd.30 >> >>> > -7 
58.89636 - 59 TiB 44 TiB 43 TiB 13 GiB 122 GiB >> >>> > 15 TiB 74.97 1.02 - host s3db6 >> >>> > 32 hdd 3.73630 1.00000 3.7 TiB 2.8 TiB 2.7 TiB 27 MiB 7.6 GiB >> >>> > 940 GiB 75.42 1.02 55 up osd.32 >> >>> > 33 hdd 3.73630 1.00000 3.7 TiB 3.1 TiB 3.0 TiB 376 MiB 8.2 GiB >> >>> > 691 GiB 81.94 1.11 55 up osd.33 >> >>> > 34 hdd 3.73630 1.00000 3.7 TiB 3.1 TiB 3.0 TiB 450 MiB 8.5 GiB >> >>> > 620 GiB 83.79 1.14 54 up osd.34 >> >>> > 35 hdd 3.73630 1.00000 3.7 TiB 3.1 TiB 3.0 TiB 316 MiB 8.4 GiB >> >>> > 690 GiB 81.98 1.11 50 up osd.35 >> >>> > 36 hdd 14.65039 1.00000 15 TiB 11 TiB 10 TiB 489 MiB 25 GiB >> >>> > 4.1 TiB 71.69 0.97 208 up osd.36 >> >>> > 37 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 11 GiB 38 GiB >> >>> > 4.0 TiB 72.41 0.98 195 up osd.37 >> >>> > 38 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 1.1 GiB 26 GiB >> >>> > 3.7 TiB 74.88 1.02 204 up osd.38 >> >>> > -8 58.89636 - 59 TiB 44 TiB 43 TiB 3.8 GiB 111 GiB >> >>> > 15 TiB 74.16 1.01 - host s3db7 >> >>> > 39 hdd 3.73630 1.00000 3.7 TiB 2.8 TiB 2.7 TiB 19 MiB 7.5 GiB >> >>> > 936 GiB 75.54 1.03 39 up osd.39 >> >>> > 40 hdd 3.73630 1.00000 3.7 TiB 2.6 TiB 2.5 TiB 144 MiB 7.1 GiB >> >>> > 1.1 TiB 69.87 0.95 39 up osd.40 >> >>> > 41 hdd 3.73630 1.00000 3.7 TiB 2.7 TiB 2.7 TiB 219 MiB 7.6 GiB >> >>> > 1011 GiB 73.57 1.00 55 up osd.41 >> >>> > 42 hdd 3.73630 1.00000 3.7 TiB 2.6 TiB 2.5 TiB 593 MiB 7.1 GiB >> >>> > 1.1 TiB 70.02 0.95 47 up osd.42 >> >>> > 43 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 500 MiB 27 GiB >> >>> > 3.7 TiB 74.67 1.01 204 up osd.43 >> >>> > 44 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 1.1 GiB 27 GiB >> >>> > 3.7 TiB 74.62 1.01 193 up osd.44 >> >>> > 45 hdd 14.65039 1.00000 15 TiB 11 TiB 11 TiB 1.2 GiB 29 GiB >> >>> > 3.6 TiB 75.16 1.02 204 up osd.45 >> >>> > -9 51.28331 - 51 TiB 39 TiB 39 TiB 4.9 GiB 107 GiB >> >>> > 12 TiB 76.50 1.04 - host s3db8 >> >>> > 8 hdd 7.32619 1.00000 7.3 TiB 5.6 TiB 5.5 TiB 474 MiB 14 GiB >> >>> > 1.7 TiB 76.37 1.04 98 up osd.8 >> >>> > 16 hdd 7.32619 1.00000 
7.3 TiB 5.7 TiB 5.7 TiB 783 MiB 15 GiB >> >>> > 1.6 TiB 78.39 1.06 100 up osd.16 >> >>> > 31 hdd 7.32619 1.00000 7.3 TiB 5.7 TiB 5.6 TiB 441 MiB 14 GiB >> >>> > 1.6 TiB 77.70 1.05 91 up osd.31 >> >>> > 52 hdd 7.32619 1.00000 7.3 TiB 5.6 TiB 5.5 TiB 939 MiB 14 GiB >> >>> > 1.7 TiB 76.29 1.04 102 up osd.52 >> >>> > 53 hdd 7.32619 1.00000 7.3 TiB 5.4 TiB 5.4 TiB 848 MiB 18 GiB >> >>> > 1.9 TiB 74.30 1.01 98 up osd.53 >> >>> > 54 hdd 7.32619 1.00000 7.3 TiB 5.6 TiB 5.6 TiB 1.0 GiB 16 GiB >> >>> > 1.7 TiB 76.99 1.05 106 up osd.54 >> >>> > 55 hdd 7.32619 1.00000 7.3 TiB 5.5 TiB 5.5 TiB 460 MiB 15 GiB >> >>> > 1.8 TiB 75.46 1.02 105 up osd.55 >> >>> > -10 51.28331 - 51 TiB 37 TiB 37 TiB 3.8 GiB 96 GiB >> >>> > 14 TiB 72.77 0.99 - host s3db9 >> >>> > 56 hdd 7.32619 1.00000 7.3 TiB 5.2 TiB 5.2 TiB 846 MiB 13 GiB >> >>> > 2.1 TiB 71.16 0.97 104 up osd.56 >> >>> > 57 hdd 7.32619 1.00000 7.3 TiB 5.6 TiB 5.6 TiB 513 MiB 15 GiB >> >>> > 1.7 TiB 76.53 1.04 96 up osd.57 >> >>> > 58 hdd 7.32619 1.00000 7.3 TiB 5.2 TiB 5.2 TiB 604 MiB 13 GiB >> >>> > 2.1 TiB 71.23 0.97 98 up osd.58 >> >>> > 59 hdd 7.32619 1.00000 7.3 TiB 5.1 TiB 5.1 TiB 414 MiB 13 GiB >> >>> > 2.2 TiB 70.03 0.95 88 up osd.59 >> >>> > 60 hdd 7.32619 1.00000 7.3 TiB 5.5 TiB 5.5 TiB 227 MiB 14 GiB >> >>> > 1.8 TiB 75.54 1.03 97 up osd.60 >> >>> > 61 hdd 7.32619 1.00000 7.3 TiB 5.1 TiB 5.1 TiB 456 MiB 13 GiB >> >>> > 2.2 TiB 70.01 0.95 95 up osd.61 >> >>> > 62 hdd 7.32619 1.00000 7.3 TiB 5.5 TiB 5.4 TiB 843 MiB 14 GiB >> >>> > 1.8 TiB 74.93 1.02 110 up osd.62 >> >>> > TOTAL 674 TiB 496 TiB 468 TiB 97 GiB 1.2 TiB >> >>> > 177 TiB 73.67 >> >>> > MIN/MAX VAR: 0.87/1.14 STDDEV: 4.22 >> >>> > >> >>> > Am Mo., 15. März 2021 um 15:02 Uhr schrieb Dan van der Ster >> >>> > <d...@vanderster.com>: >> >>> >> >> >>> >> OK thanks. Indeed "prepared 0/10 changes" means it thinks things are >> >>> >> balanced. >> >>> >> Could you again share the full ceph osd df tree? 
>> >>> >>
>> >>> >> On Mon, Mar 15, 2021 at 2:54 PM Boris Behrens <b...@kervyn.de> wrote:
>> >>> >> >
>> >>> >> > Hi Dan,
>> >>> >> >
>> >>> >> > I've set the autoscaler to warn, but it actually does not warn for
>> >>> >> > now. So not touching it for now.
>> >>> >> >
>> >>> >> > This is what the log says at one-minute intervals:
>> >>> >> > 2021-03-15 13:51:00.970 7f307d5fd700 4 mgr get_config get_config key: mgr/balancer/active
>> >>> >> > 2021-03-15 13:51:00.970 7f307d5fd700 4 mgr get_config get_config key: mgr/balancer/sleep_interval
>> >>> >> > 2021-03-15 13:51:00.970 7f307d5fd700 4 mgr get_config get_config key: mgr/balancer/begin_time
>> >>> >> > 2021-03-15 13:51:00.970 7f307d5fd700 4 mgr get_config get_config key: mgr/balancer/end_time
>> >>> >> > 2021-03-15 13:51:00.970 7f307d5fd700 4 mgr get_config get_config key: mgr/balancer/begin_weekday
>> >>> >> > 2021-03-15 13:51:00.970 7f307d5fd700 4 mgr get_config get_config key: mgr/balancer/end_weekday
>> >>> >> > 2021-03-15 13:51:00.971 7f307d5fd700 4 mgr get_config get_config key: mgr/balancer/pool_ids
>> >>> >> > 2021-03-15 13:51:01.203 7f307d5fd700 4 mgr[balancer] Optimize plan auto_2021-03-15_13:51:00
>> >>> >> > 2021-03-15 13:51:01.203 7f307d5fd700 4 mgr get_config get_config key: mgr/balancer/mode
>> >>> >> > 2021-03-15 13:51:01.203 7f307d5fd700 4 mgr[balancer] Mode upmap, max misplaced 0.050000
>> >>> >> > 2021-03-15 13:51:01.203 7f307d5fd700 4 mgr[balancer] do_upmap
>> >>> >> > 2021-03-15 13:51:01.203 7f307d5fd700 4 mgr get_config get_config key: mgr/balancer/upmap_max_iterations
>> >>> >> > 2021-03-15 13:51:01.203 7f307d5fd700 4 mgr get_config get_config key: mgr/balancer/upmap_max_deviation
>> >>> >> > 2021-03-15 13:51:01.203 7f307d5fd700 4 mgr[balancer] pools
>> >>> >> > ['eu-msg-1.rgw.data.root', 'eu-msg-1.rgw.buckets.non-ec',
>> >>> >> > 'eu-central-1.rgw.users.keys', 'eu-central-1.rgw.gc',
>> >>> >> > 'eu-central-1.rgw.buckets.data', 'eu-central-1.rgw.users.email',
>> >>> >> > 'eu-msg-1.rgw.gc', 'eu-central-1.rgw.usage',
>> >>> >> > 'eu-msg-1.rgw.users.keys', 'eu-central-1.rgw.buckets.index', 'rbd',
>> >>> >> > 'eu-msg-1.rgw.log', 'whitespace-again-2021-03-10_2',
>> >>> >> > 'eu-msg-1.rgw.buckets.index', 'eu-msg-1.rgw.meta',
>> >>> >> > 'eu-central-1.rgw.log', 'default.rgw.gc',
>> >>> >> > 'eu-central-1.rgw.buckets.non-ec', 'eu-msg-1.rgw.usage',
>> >>> >> > 'whitespace-again-2021-03-10', 'fra-1.rgw.meta',
>> >>> >> > 'eu-central-1.rgw.users.uid', 'eu-msg-1.rgw.users.email',
>> >>> >> > 'fra-1.rgw.control', 'eu-msg-1.rgw.users.uid',
>> >>> >> > 'eu-msg-1.rgw.control', '.rgw.root', 'eu-msg-1.rgw.buckets.data',
>> >>> >> > 'default.rgw.control', 'fra-1.rgw.log', 'default.rgw.data.root',
>> >>> >> > 'whitespace-again-2021-03-10_3', 'default.rgw.log',
>> >>> >> > 'eu-central-1.rgw.meta', 'eu-central-1.rgw.data.root',
>> >>> >> > 'default.rgw.users.uid', 'eu-central-1.rgw.control']
>> >>> >> > 2021-03-15 13:51:01.224 7f307d5fd700 4 mgr[balancer] prepared 0/10 changes
>> >>> >> >
>> >>> >> > On Mon, Mar 15, 2021 at 2:15 PM Dan van der Ster <d...@vanderster.com> wrote:
>> >>> >> >>
>> >>> >> >> I suggest just disabling the autoscaler until your balancing is
>> >>> >> >> understood.
>> >>> >> >>
>> >>> >> >> What does your active mgr log say (with debug_mgr 4/5)? Try:
>> >>> >> >> grep balancer /var/log/ceph/ceph-mgr.*.log
>> >>> >> >>
>> >>> >> >> -- Dan
>> >>> >> >>
>> >>> >> >> On Mon, Mar 15, 2021 at 1:47 PM Boris Behrens <b...@kervyn.de> wrote:
>> >>> >> >> >
>> >>> >> >> > Hi,
>> >>> >> >> > this unfortunately did not solve my problem.
>> >>> >> >> > I still have some OSDs that fill up to 85%.
>> >>> >> >> >
>> >>> >> >> > According to the logging, the autoscaler might want to add more
>> >>> >> >> > PGs to one pool and reduce almost all other pools to 32.
>> >>> >> >> > 2021-03-15 12:19:58.825 7f307f601700 4 mgr[pg_autoscaler] Pool 'eu-central-1.rgw.buckets.data' root_id -1 using 0.705080476146 of space, bias 1.0, pg target 1974.22533321 quantized to 2048 (current 1024)
>> >>> >> >> >
>> >>> >> >> > Why the balancing does not happen is still nebulous to me.
>> >>> >> >> >
>> >>> >> >> > On Sat, Mar 13, 2021 at 4:37 PM Dan van der Ster <d...@vanderster.com> wrote:
>> >>> >> >> >>
>> >>> >> >> >> OK
>> >>> >> >> >> Btw, you might need to fail over to a new mgr... I'm not sure if the
>> >>> >> >> >> current active will read that new config.
>> >>> >> >> >>
>> >>> >> >> >> .. dan
>> >>> >> >> >>
>> >>> >> >> >> On Sat, Mar 13, 2021, 4:36 PM Boris Behrens <b...@kervyn.de> wrote:
>> >>> >> >> >>>
>> >>> >> >> >>> Hi,
>> >>> >> >> >>>
>> >>> >> >> >>> ok thanks. I just changed the value and reweighted everything
>> >>> >> >> >>> back to 1. Now I'll let it sync over the weekend and check how it
>> >>> >> >> >>> looks on Monday.
>> >>> >> >> >>> We tried to keep each system's total storage as balanced as
>> >>> >> >> >>> possible. New systems will have 8TB disks, but for the
>> >>> >> >> >>> existing ones we added 16TB disks to offset the 4TB disks, and we
>> >>> >> >> >>> needed a lot of storage fast because of a DC move. If you
>> >>> >> >> >>> have any recommendations I would be happy to hear them.
>> >>> >> >> >>>
>> >>> >> >> >>> Cheers
>> >>> >> >> >>> Boris
>> >>> >> >> >>>
>> >>> >> >> >>> On Sat, Mar 13, 2021 at 4:20 PM Dan van der Ster <d...@vanderster.com> wrote:
>> >>> >> >> >>>>
>> >>> >> >> >>>> Thanks.
>> >>> >> >> >>>>
>> >>> >> >> >>>> Decreasing the max deviation to 2 or 1 should help in your
>> >>> >> >> >>>> case. This option controls when the balancer stops trying to
>> >>> >> >> >>>> move PGs around -- by default it stops when the deviation
>> >>> >> >> >>>> from the mean is 5 PGs. Yes, this is too large IMO -- all of our
>> >>> >> >> >>>> clusters have this set to 1.
>> >>> >> >> >>>>
>> >>> >> >> >>>> And given that you have some OSDs with more than 200 PGs, you
>> >>> >> >> >>>> definitely shouldn't increase the number of PGs.
>> >>> >> >> >>>>
>> >>> >> >> >>>> But anyway, with your mixed device sizes it might be
>> >>> >> >> >>>> challenging to make a perfectly uniform distribution. Give it
>> >>> >> >> >>>> a try with 1 though, and let us know how it goes.
>> >>> >> >> >>>>
>> >>> >> >> >>>> .. Dan
>> >>> >> >> >>>>
>> >>> >> >> >>>> On Sat, Mar 13, 2021, 4:11 PM Boris Behrens <b...@kervyn.de> wrote:
>> >>> >> >> >>>>>
>> >>> >> >> >>>>> Hi Dan,
>> >>> >> >> >>>>>
>> >>> >> >> >>>>> upmap_max_deviation is at the default (5) in our cluster. Is 1 the
>> >>> >> >> >>>>> recommended deviation?
>> >>> >> >> >>>>>
>> >>> >> >> >>>>> I added the whole ceph osd df tree. (I need to remove some
>> >>> >> >> >>>>> OSDs and re-add them as bluestore with SSD, so 69, 73 and 82
>> >>> >> >> >>>>> are a bit off now. I also reweighted to try to mitigate the
>> >>> >> >> >>>>> %USE.)
>> >>> >> >> >>>>>
>> >>> >> >> >>>>> I will increase the mgr debugging to see what the problem is.
>> >>> >> >> >>>>> >> >>> >> >> >>>>> [root@s3db1 ~]# ceph osd df tree >> >>> >> >> >>>>> ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP >> >>> >> >> >>>>> META AVAIL %USE VAR PGS STATUS TYPE NAME >> >>> >> >> >>>>> -1 673.54224 - 659 TiB 491 TiB 464 TiB 96 GiB >> >>> >> >> >>>>> 1.2 TiB 168 TiB 74.57 1.00 - root default >> >>> >> >> >>>>> -2 58.30331 - 44 TiB 22 TiB 17 TiB 5.7 GiB >> >>> >> >> >>>>> 38 GiB 22 TiB 49.82 0.67 - host s3db1 >> >>> >> >> >>>>> 23 hdd 14.65039 1.00000 15 TiB 1.8 TiB 1.7 TiB 156 MiB >> >>> >> >> >>>>> 4.4 GiB 13 TiB 12.50 0.17 101 up osd.23 >> >>> >> >> >>>>> 69 hdd 14.55269 0 0 B 0 B 0 B 0 B >> >>> >> >> >>>>> 0 B 0 B 0 0 11 up osd.69 >> >>> >> >> >>>>> 73 hdd 14.55269 1.00000 15 TiB 10 TiB 10 TiB 6.1 MiB >> >>> >> >> >>>>> 33 GiB 4.2 TiB 71.15 0.95 107 up osd.73 >> >>> >> >> >>>>> 79 hdd 3.63689 1.00000 3.6 TiB 2.9 TiB 747 GiB 2.0 GiB >> >>> >> >> >>>>> 0 B 747 GiB 79.94 1.07 52 up osd.79 >> >>> >> >> >>>>> 80 hdd 3.63689 1.00000 3.6 TiB 2.6 TiB 1.0 TiB 1.9 GiB >> >>> >> >> >>>>> 0 B 1.0 TiB 71.61 0.96 58 up osd.80 >> >>> >> >> >>>>> 81 hdd 3.63689 1.00000 3.6 TiB 2.2 TiB 1.5 TiB 1.1 GiB >> >>> >> >> >>>>> 0 B 1.5 TiB 60.07 0.81 55 up osd.81 >> >>> >> >> >>>>> 82 hdd 3.63689 1.00000 3.6 TiB 1.9 TiB 1.7 TiB 536 MiB >> >>> >> >> >>>>> 0 B 1.7 TiB 52.68 0.71 30 up osd.82 >> >>> >> >> >>>>> -11 50.94173 - 51 TiB 38 TiB 38 TiB 3.7 GiB >> >>> >> >> >>>>> 100 GiB 13 TiB 74.69 1.00 - host s3db10 >> >>> >> >> >>>>> 63 hdd 7.27739 1.00000 7.3 TiB 5.5 TiB 5.5 TiB 616 MiB >> >>> >> >> >>>>> 14 GiB 1.7 TiB 76.04 1.02 92 up osd.63 >> >>> >> >> >>>>> 64 hdd 7.27739 1.00000 7.3 TiB 5.5 TiB 5.5 TiB 820 MiB >> >>> >> >> >>>>> 15 GiB 1.8 TiB 75.54 1.01 101 up osd.64 >> >>> >> >> >>>>> 65 hdd 7.27739 1.00000 7.3 TiB 5.3 TiB 5.3 TiB 109 MiB >> >>> >> >> >>>>> 14 GiB 2.0 TiB 73.17 0.98 105 up osd.65 >> >>> >> >> >>>>> 66 hdd 7.27739 1.00000 7.3 TiB 5.8 TiB 5.8 TiB 423 MiB >> >>> >> >> >>>>> 15 GiB 1.4 TiB 80.38 1.08 98 up osd.66 >> >>> >> >> >>>>> 67 hdd 
7.27739 1.00000 7.3 TiB 5.1 TiB 5.1 TiB 572 MiB  14 GiB 2.2 TiB 70.10 0.94 100 up osd.67
>> >>> >> >> >>>>>  68 hdd  7.27739 1.00000 7.3 TiB 5.3 TiB 5.3 TiB 630 MiB  13 GiB 2.0 TiB 72.88 0.98 107 up osd.68
>> >>> >> >> >>>>>  70 hdd  7.27739 1.00000 7.3 TiB 5.4 TiB 5.4 TiB 648 MiB  14 GiB 1.8 TiB 74.73 1.00 102 up osd.70
>> >>> >> >> >>>>> -12     50.99052       -  51 TiB  39 TiB  39 TiB 2.9 GiB  99 GiB  12 TiB 77.24 1.04   -    host s3db11
>> >>> >> >> >>>>>  46 hdd  7.27739 1.00000 7.3 TiB 5.7 TiB 5.7 TiB 102 MiB  15 GiB 1.5 TiB 78.91 1.06  97 up osd.46
>> >>> >> >> >>>>>  47 hdd  7.27739 1.00000 7.3 TiB 5.2 TiB 5.2 TiB  61 MiB  13 GiB 2.1 TiB 71.47 0.96  96 up osd.47
>> >>> >> >> >>>>>  48 hdd  7.27739 1.00000 7.3 TiB 6.1 TiB 6.1 TiB 853 MiB  15 GiB 1.2 TiB 83.46 1.12 109 up osd.48
>> >>> >> >> >>>>>  49 hdd  7.27739 1.00000 7.3 TiB 5.7 TiB 5.7 TiB 708 MiB  15 GiB 1.5 TiB 78.96 1.06  98 up osd.49
>> >>> >> >> >>>>>  50 hdd  7.27739 1.00000 7.3 TiB 5.9 TiB 5.8 TiB 472 MiB  15 GiB 1.4 TiB 80.40 1.08 102 up osd.50
>> >>> >> >> >>>>>  51 hdd  7.27739 1.00000 7.3 TiB 5.9 TiB 5.9 TiB 729 MiB  15 GiB 1.3 TiB 81.70 1.10 110 up osd.51
>> >>> >> >> >>>>>  72 hdd  7.32619 1.00000 7.3 TiB 4.8 TiB 4.8 TiB  91 MiB  12 GiB 2.5 TiB 65.82 0.88  89 up osd.72
>> >>> >> >> >>>>> -37     58.55478       -  59 TiB  46 TiB  46 TiB 5.0 GiB 124 GiB  12 TiB 79.04 1.06   -    host s3db12
>> >>> >> >> >>>>>  19 hdd  3.68750 1.00000 3.7 TiB 3.1 TiB 3.1 TiB 462 MiB 8.2 GiB 559 GiB 85.18 1.14  55 up osd.19
>> >>> >> >> >>>>>  71 hdd  3.68750 1.00000 3.7 TiB 2.9 TiB 2.8 TiB 3.9 MiB 7.8 GiB 825 GiB 78.14 1.05  50 up osd.71
>> >>> >> >> >>>>>  75 hdd  3.68750 1.00000 3.7 TiB 3.1 TiB 3.1 TiB 576 MiB 8.3 GiB 555 GiB 85.29 1.14  57 up osd.75
>> >>> >> >> >>>>>  76 hdd  3.68750 1.00000 3.7 TiB 3.2 TiB 3.1 TiB 239 MiB 9.3 GiB 501 GiB 86.73 1.16  50 up osd.76
>> >>> >> >> >>>>>  77 hdd 14.60159 1.00000  15 TiB  11 TiB  11 TiB 880 MiB  30 GiB 3.6 TiB 75.57 1.01 202 up osd.77
>> >>> >> >> >>>>>  78 hdd 14.60159 1.00000  15 TiB  11 TiB  11 TiB 1.0 GiB  30 GiB 3.4 TiB 76.65 1.03 196 up osd.78
>> >>> >> >> >>>>>  83 hdd 14.60159 1.00000  15 TiB  12 TiB  12 TiB 1.8 GiB  31 GiB 2.9 TiB 80.04 1.07 223 up osd.83
>> >>> >> >> >>>>>  -3     58.49872       -  58 TiB  43 TiB  38 TiB 8.1 GiB  91 GiB  16 TiB 73.15 0.98   -    host s3db2
>> >>> >> >> >>>>>   1 hdd 14.65039 1.00000  15 TiB  11 TiB  11 TiB 3.1 GiB  38 GiB 3.6 TiB 75.52 1.01 194 up osd.1
>> >>> >> >> >>>>>   3 hdd  3.63689 1.00000 3.6 TiB 2.2 TiB 1.4 TiB 418 MiB     0 B 1.4 TiB 60.94 0.82  52 up osd.3
>> >>> >> >> >>>>>   4 hdd  3.63689 0.89999 3.6 TiB 3.2 TiB 401 GiB 845 MiB     0 B 401 GiB 89.23 1.20  53 up osd.4
>> >>> >> >> >>>>>   5 hdd  3.63689 1.00000 3.6 TiB 2.3 TiB 1.3 TiB 437 MiB     0 B 1.3 TiB 62.88 0.84  51 up osd.5
>> >>> >> >> >>>>>   6 hdd  3.63689 1.00000 3.6 TiB 2.0 TiB 1.7 TiB 1.8 GiB     0 B 1.7 TiB 54.51 0.73  47 up osd.6
>> >>> >> >> >>>>>   7 hdd 14.65039 1.00000  15 TiB  11 TiB  11 TiB 493 MiB  26 GiB 3.8 TiB 73.90 0.99 185 up osd.7
>> >>> >> >> >>>>>  74 hdd 14.65039 1.00000  15 TiB  11 TiB  11 TiB 1.1 GiB  27 GiB 3.5 TiB 76.27 1.02 208 up osd.74
>> >>> >> >> >>>>>  -4     58.49872       -  58 TiB  43 TiB  37 TiB  33 GiB  86 GiB  15 TiB 74.05 0.99   -    host s3db3
>> >>> >> >> >>>>>   2 hdd 14.65039 1.00000  15 TiB  11 TiB  11 TiB 850 MiB  26 GiB 4.0 TiB 72.78 0.98 203 up osd.2
>> >>> >> >> >>>>>   9 hdd 14.65039 1.00000  15 TiB  11 TiB  11 TiB 8.3 GiB  33 GiB 3.6 TiB 75.62 1.01 189 up osd.9
>> >>> >> >> >>>>>  10 hdd 14.65039 1.00000  15 TiB  11 TiB  11 TiB 663 MiB  28 GiB 3.5 TiB 76.34 1.02 211 up osd.10
>> >>> >> >> >>>>>  12 hdd  3.63689 1.00000 3.6 TiB 2.4 TiB 1.2 TiB 633 MiB     0 B 1.2 TiB 66.22 0.89  44 up osd.12
>> >>> >> >> >>>>>  13 hdd  3.63689 1.00000 3.6 TiB 2.9 TiB 720 GiB 2.3 GiB     0 B 720 GiB 80.66 1.08  66 up osd.13
>> >>> >> >> >>>>>  14 hdd  3.63689 1.00000 3.6 TiB 3.1 TiB 552 GiB  18 GiB     0 B 552 GiB 85.18 1.14  60 up osd.14
>> >>> >> >> >>>>>  15 hdd  3.63689 1.00000 3.6 TiB 2.0 TiB 1.7 TiB 2.1 GiB     0 B 1.7 TiB 53.72 0.72  44 up osd.15
>> >>> >> >> >>>>>  -5     58.49872       -  58 TiB  45 TiB  37 TiB 7.2 GiB  99 GiB  14 TiB 76.37 1.02   -    host s3db4
>> >>> >> >> >>>>>  11 hdd 14.65039 1.00000  15 TiB  12 TiB  12 TiB 897 MiB  28 GiB 2.8 TiB 81.15 1.09 205 up osd.11
>> >>> >> >> >>>>>  17 hdd 14.65039 1.00000  15 TiB  11 TiB  11 TiB 1.2 GiB  27 GiB 3.6 TiB 75.38 1.01 211 up osd.17
>> >>> >> >> >>>>>  18 hdd 14.65039 1.00000  15 TiB  11 TiB  11 TiB 965 MiB  44 GiB 4.0 TiB 72.86 0.98 188 up osd.18
>> >>> >> >> >>>>>  20 hdd  3.63689 1.00000 3.6 TiB 2.9 TiB 796 GiB 529 MiB     0 B 796 GiB 78.63 1.05  66 up osd.20
>> >>> >> >> >>>>>  21 hdd  3.63689 1.00000 3.6 TiB 2.6 TiB 1.1 TiB 2.1 GiB     0 B 1.1 TiB 70.32 0.94  47 up osd.21
>> >>> >> >> >>>>>  22 hdd  3.63689 1.00000 3.6 TiB 2.9 TiB 802 GiB 882 MiB     0 B 802 GiB 78.47 1.05  58 up osd.22
>> >>> >> >> >>>>>  24 hdd  3.63689 1.00000 3.6 TiB 2.8 TiB 856 GiB 645 MiB     0 B 856 GiB 77.01 1.03  47 up osd.24
>> >>> >> >> >>>>>  -6     58.89636       -  59 TiB  44 TiB  44 TiB 2.4 GiB 111 GiB  15 TiB 75.22 1.01   -    host s3db5
>> >>> >> >> >>>>>   0 hdd  3.73630 1.00000 3.7 TiB 2.4 TiB 2.3 TiB  70 MiB 6.6 GiB 1.3 TiB 65.00 0.87  48 up osd.0
>> >>> >> >> >>>>>  25 hdd  3.73630 1.00000 3.7 TiB 2.4 TiB 2.3 TiB 5.3 MiB 6.6 GiB 1.4 TiB 63.86 0.86  41 up osd.25
>> >>> >> >> >>>>>  26 hdd  3.73630 1.00000 3.7 TiB 2.9 TiB 2.8 TiB 181 MiB 7.6 GiB 862 GiB 77.47 1.04  48 up osd.26
>> >>> >> >> >>>>>  27 hdd  3.73630 1.00000 3.7 TiB 2.3 TiB 2.2 TiB 7.0 MiB 6.1 GiB 1.5 TiB 61.00 0.82  48 up osd.27
>> >>> >> >> >>>>>  28 hdd 14.65039 1.00000  15 TiB  12 TiB  12 TiB 937 MiB  30 GiB 2.8 TiB 81.19 1.09 203 up osd.28
>> >>> >> >> >>>>>  29 hdd 14.65039 1.00000  15 TiB  11 TiB  11 TiB 536 MiB  26 GiB 3.8 TiB 73.95 0.99 200 up osd.29
>> >>> >> >> >>>>>  30 hdd 14.65039 1.00000  15 TiB  12 TiB  11 TiB 744 MiB  28 GiB 3.1 TiB 79.07 1.06 207 up osd.30
>> >>> >> >> >>>>>  -7     58.89636       -  59 TiB  44 TiB  44 TiB  14 GiB 122 GiB  14 TiB 75.41 1.01   -    host s3db6
>> >>> >> >> >>>>>  32 hdd  3.73630 1.00000 3.7 TiB 3.1 TiB 3.0 TiB  16 MiB 8.2 GiB 622 GiB 83.74 1.12  65 up osd.32
>> >>> >> >> >>>>>  33 hdd  3.73630 0.79999 3.7 TiB 3.0 TiB 2.9 TiB  14 MiB 8.1 GiB 740 GiB 80.67 1.08  52 up osd.33
>> >>> >> >> >>>>>  34 hdd  3.73630 0.79999 3.7 TiB 2.9 TiB 2.8 TiB 449 MiB 7.7 GiB 877 GiB 77.08 1.03  52 up osd.34
>> >>> >> >> >>>>>  35 hdd  3.73630 0.79999 3.7 TiB 2.3 TiB 2.2 TiB 133 MiB 7.0 GiB 1.4 TiB 62.18 0.83  42 up osd.35
>> >>> >> >> >>>>>  36 hdd 14.65039 1.00000  15 TiB  11 TiB  11 TiB 544 MiB  26 GiB 4.0 TiB 72.98 0.98 220 up osd.36
>> >>> >> >> >>>>>  37 hdd 14.65039 1.00000  15 TiB  11 TiB  11 TiB  11 GiB  38 GiB 3.6 TiB 75.30 1.01 200 up osd.37
>> >>> >> >> >>>>>  38 hdd 14.65039 1.00000  15 TiB  11 TiB  11 TiB 1.2 GiB  28 GiB 3.3 TiB 77.43 1.04 217 up osd.38
>> >>> >> >> >>>>>  -8     58.89636       -  59 TiB  47 TiB  46 TiB 3.9 GiB 116 GiB  12 TiB 78.98 1.06   -    host s3db7
>> >>> >> >> >>>>>  39 hdd  3.73630 1.00000 3.7 TiB 3.2 TiB 3.2 TiB  19 MiB 8.5 GiB 499 GiB 86.96 1.17  43 up osd.39
>> >>> >> >> >>>>>  40 hdd  3.73630 1.00000 3.7 TiB 2.6 TiB 2.5 TiB 144 MiB 7.0 GiB 1.2 TiB 68.33 0.92  39 up osd.40
>> >>> >> >> >>>>>  41 hdd  3.73630 1.00000 3.7 TiB 3.0 TiB 2.9 TiB 218 MiB 7.9 GiB 732 GiB 80.86 1.08  64 up osd.41
>> >>> >> >> >>>>>  42 hdd  3.73630 1.00000 3.7 TiB 2.5 TiB 2.4 TiB 594 MiB 7.0 GiB 1.2 TiB 67.97 0.91  50 up osd.42
>> >>> >> >> >>>>>  43 hdd 14.65039 1.00000  15 TiB  12 TiB  12 TiB 564 MiB  28 GiB 2.9 TiB 80.32 1.08 213 up osd.43
>> >>> >> >> >>>>>  44 hdd 14.65039 1.00000  15 TiB  12 TiB  11 TiB 1.3 GiB  28 GiB 3.1 TiB 78.59 1.05 198 up osd.44
>> >>> >> >> >>>>>  45 hdd 14.65039 1.00000  15 TiB  12 TiB  12 TiB 1.2 GiB  30 GiB 2.8 TiB 81.05 1.09 214 up osd.45
>> >>> >> >> >>>>>  -9     51.28331       -  51 TiB  41 TiB  41 TiB 4.9 GiB 108 GiB  10 TiB 79.75 1.07   -    host s3db8
>> >>> >> >> >>>>>   8 hdd  7.32619 1.00000 7.3 TiB 5.8 TiB 5.8 TiB 472 MiB  15 GiB 1.5 TiB 79.68 1.07  99 up osd.8
>> >>> >> >> >>>>>  16 hdd  7.32619 1.00000 7.3 TiB 5.9 TiB 5.8 TiB 785 MiB  15 GiB 1.4 TiB 80.25 1.08  97 up osd.16
>> >>> >> >> >>>>>  31 hdd  7.32619 1.00000 7.3 TiB 5.5 TiB 5.5 TiB 438 MiB  14 GiB 1.8 TiB 75.36 1.01  87 up osd.31
>> >>> >> >> >>>>>  52 hdd  7.32619 1.00000 7.3 TiB 5.7 TiB 5.7 TiB 844 MiB  15 GiB 1.6 TiB 78.19 1.05 113 up osd.52
>> >>> >> >> >>>>>  53 hdd  7.32619 1.00000 7.3 TiB 6.2 TiB 6.1 TiB 792 MiB  18 GiB 1.1 TiB 84.46 1.13 109 up osd.53
>> >>> >> >> >>>>>  54 hdd  7.32619 1.00000 7.3 TiB 5.6 TiB 5.6 TiB 959 MiB  15 GiB 1.7 TiB 76.73 1.03 115 up osd.54
>> >>> >> >> >>>>>  55 hdd  7.32619 1.00000 7.3 TiB 6.1 TiB 6.1 TiB 699 MiB  16 GiB 1.2 TiB 83.56 1.12 122 up osd.55
>> >>> >> >> >>>>> -10     51.28331       -  51 TiB  39 TiB  39 TiB 4.7 GiB 100 GiB  12 TiB 76.05 1.02   -    host s3db9
>> >>> >> >> >>>>>  56 hdd  7.32619 1.00000 7.3 TiB 5.2 TiB 5.2 TiB 840 MiB  13 GiB 2.1 TiB 71.06 0.95 105 up osd.56
>> >>> >> >> >>>>>  57 hdd  7.32619 1.00000 7.3 TiB 6.1 TiB 6.0 TiB 1.0 GiB  16 GiB 1.2 TiB 83.17 1.12 102 up osd.57
>> >>> >> >> >>>>>  58 hdd  7.32619 1.00000 7.3 TiB 6.0 TiB 5.9 TiB  43 MiB  15 GiB 1.4 TiB 81.56 1.09 105 up osd.58
>> >>> >> >> >>>>>  59 hdd  7.32619 1.00000 7.3 TiB 5.9 TiB 5.9 TiB 429 MiB  15 GiB 1.4 TiB 80.64 1.08  94 up osd.59
>> >>> >> >> >>>>>  60 hdd  7.32619 1.00000 7.3 TiB 5.4 TiB 5.3 TiB 226 MiB  14 GiB 2.0 TiB 73.25 0.98 101 up osd.60
>> >>> >> >> >>>>>  61 hdd  7.32619 1.00000 7.3 TiB 4.8 TiB 4.8 TiB 1.1 GiB  12 GiB 2.5 TiB 65.84 0.88 103 up osd.61
>> >>> >> >> >>>>>  62 hdd  7.32619 1.00000 7.3 TiB 5.6 TiB 5.6 TiB 1.0 GiB  15 GiB 1.7 TiB 76.83 1.03 126 up osd.62
>> >>> >> >> >>>>>                    TOTAL 674 TiB 501 TiB 473 TiB  96 GiB 1.2 TiB 173 TiB 74.57
>> >>> >> >> >>>>> MIN/MAX VAR: 0.17/1.20 STDDEV: 10.25
>> >>> >> >> >>>>>
>> >>> >> >> >>>>> On Sat, Mar 13, 2021 at 3:57 PM Dan van der Ster <d...@vanderster.com> wrote:
>> >>> >> >> >>>>>>
>> >>> >> >> >>>>>> No, increasing num PGs won't help substantially.
>> >>> >> >> >>>>>>
>> >>> >> >> >>>>>> Can you share the entire output of ceph osd df tree ?
>> >>> >> >> >>>>>>
>> >>> >> >> >>>>>> Did you already set
>> >>> >> >> >>>>>>
>> >>> >> >> >>>>>> ceph config set mgr mgr/balancer/upmap_max_deviation 1
>> >>> >> >> >>>>>>
>> >>> >> >> >>>>>> ??
>> >>> >> >> >>>>>> And I recommend debug_mgr 4/5 so you can see some basic upmap balancer logging.
>> >>> >> >> >>>>>>
>> >>> >> >> >>>>>> .. Dan
>> >>> >> >> >>>>>>
>> >>> >> >> >>>>>> On Sat, Mar 13, 2021, 3:49 PM Boris Behrens <b...@kervyn.de> wrote:
>> >>> >> >> >>>>>>>
>> >>> >> >> >>>>>>> Hello people,
>> >>> >> >> >>>>>>>
>> >>> >> >> >>>>>>> I am still struggling with the balancer
>> >>> >> >> >>>>>>> (https://www.mail-archive.com/ceph-users@ceph.io/msg09124.html)
>> >>> >> >> >>>>>>> Now I've read some more and think that I might not have enough PGs.
>> >>> >> >> >>>>>>> Currently I have 84 OSDs and 1024 PGs for the main pool (3008 total).
>> >>> >> >> >>>>>>> I have the autoscaler enabled, but it doesn't tell me to increase the PGs.
>> >>> >> >> >>>>>>>
>> >>> >> >> >>>>>>> What do you think?
>> >>> >> >> >>>>>>>
>> >>> >> >> >>>>>>> --
>> >>> >> >> >>>>>>> The "UTF-8 problems" self-help group is meeting in the large hall this time, for a change.
>> >>> >> >> >>>>>>> _______________________________________________
>> >>> >> >> >>>>>>> ceph-users mailing list -- ceph-users@ceph.io
>> >>> >> >> >>>>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
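For anyone skimming the table in this thread, the imbalance Dan is asking about can be put into numbers with a bit of arithmetic. What follows is only a rough sketch: the `Osd` class and the two helper functions are hypothetical names, the six rows are copied by hand from the quoted `ceph osd df tree` output (the 3.6 TiB OSDs on hosts s3db2 and s3db3), and the weight-proportional "even share" is a simplification, not the upmap balancer's actual algorithm. Note also that the PGS column counts PGs from every pool, so it is only a loose proxy for stored data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Osd:
    name: str
    crush_weight: float  # WEIGHT column
    pct_used: float      # %USE column
    pgs: int             # PGS column (all pools combined)


# Rows copied by hand from the `ceph osd df tree` output quoted in the thread.
SAMPLE = [
    Osd("osd.4", 3.63689, 89.23, 53),
    Osd("osd.14", 3.63689, 85.18, 60),
    Osd("osd.13", 3.63689, 80.66, 66),
    Osd("osd.12", 3.63689, 66.22, 44),
    Osd("osd.6", 3.63689, 54.51, 47),
    Osd("osd.15", 3.63689, 53.72, 44),
]


def pg_deviation(osds):
    """Map each OSD name to (actual PGs - weight-proportional share of PGs)."""
    total_pgs = sum(o.pgs for o in osds)
    total_weight = sum(o.crush_weight for o in osds)
    return {
        o.name: o.pgs - total_pgs * o.crush_weight / total_weight
        for o in osds
    }


def usage_spread(osds):
    """Difference in percentage points between the fullest and emptiest OSD."""
    used = [o.pct_used for o in osds]
    return max(used) - min(used)


if __name__ == "__main__":
    deviations = pg_deviation(SAMPLE)
    for osd in sorted(SAMPLE, key=lambda o: o.pct_used, reverse=True):
        print(f"{osd.name:7s} {osd.pct_used:6.2f}% used  {osd.pgs:3d} PGs  "
              f"{deviations[osd.name]:+6.2f} vs. even share")
    print(f"usage spread: {usage_spread(SAMPLE):.2f} percentage points")
```

On these six equally weighted OSDs the usage spread is about 35.5 percentage points, and osd.13 holds roughly 13.7 PGs more than an even share. osd.4 is the fullest of the six while sitting close to the average PG count, which suggests that PG count alone does not explain the usage differences here.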