Hi,

Exactly the same issue here. Latest Octopus, and the newly added OSDs are 
fuller than the old ones despite holding fewer PGs.

What I normally do is let the cluster rebalance until some of the new OSDs hit 
75-80%, then let the cluster "rest" for 1-2 days. During that time some 
background cleanup happens, which can free up space.
Once there is space again, I restart the rebalance, stop at ~75%, and let it 
rest again.

It's a terrible, slow solution, but I haven't found a better way.
If you are already in the reweight loop, use crush reweight to move PGs away 
from that node, and if at some point you manage to finish the rebalance, 
slowly restore the crush weights.
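For reference, the pause/rest cycle above can be sketched with standard Ceph CLI commands; the OSD ID and weights below are placeholders, adjust them to your cluster:

```shell
# Pause data movement so the cluster can "rest"
ceph osd set nobackfill
ceph osd set norebalance

# ...wait 1-2 days for background cleanup to free space...

# Resume the rebalance
ceph osd unset nobackfill
ceph osd unset norebalance

# Move PGs away from a nearly full OSD by lowering its CRUSH weight
ceph osd crush reweight osd.324 1.5

# Once the rebalance finishes, restore the original weight in small steps
ceph osd crush reweight osd.324 1.6
ceph osd crush reweight osd.324 1.74698
```

These commands need a running cluster with admin keyring access, so treat them as an outline rather than a copy-paste script.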


________________________________
From: Joshua Baergen <jbaer...@digitalocean.com>
Sent: Tuesday, July 22, 2025 5:13 AM
To: mhnx <morphinwith...@gmail.com>
Cc: Ceph Users <ceph-users@ceph.io>
Subject: [ceph-users] Re: HELP! Cluster usage increased after adding new nodes/osd's

Hello,

Any chance that these OSDs were deployed with different
bluestore_min_alloc_size settings?

Josh
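
For context: BlueStore rounds every allocation up to bluestore_min_alloc_size, so two OSDs holding the same objects can report quite different raw usage if they were deployed with different values. A minimal sketch of that rounding, with made-up object sizes:

```python
def raw_usage(object_sizes, min_alloc):
    """Bytes consumed when every allocation is rounded up to min_alloc."""
    # -(-size // min_alloc) is ceiling division in integer arithmetic
    return sum(-(-size // min_alloc) * min_alloc for size in object_sizes)

# Three hypothetical small writes, in bytes
sizes = [5_000, 12_000, 70_000]

print(raw_usage(sizes, 4_096))   # 4 KiB allocation units
print(raw_usage(sizes, 65_536))  # 64 KiB allocation units
```

To see which value an OSD is actually using, `ceph osd metadata <id>` reports it on releases that include the allocation size in the metadata dump.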

On Mon, Jul 7, 2025 at 2:39 PM mhnx <morphinwith...@gmail.com> wrote:
>
> Hello Stefan!
>
> All of my nodes and clients = Octopus 15.2.14
>
> I have 1x RBD pool and 2000x RBD volumes, 100 GB each
>
>
> This is upmap balanced state, without manual reweight:
>
> ID   CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
>  -1         669.87897         -  671 TiB  381 TiB  376 TiB  170 GiB  5.2 TiB  289 TiB  56.87  1.00    -          root default
> -53         335.36298         -  335 TiB  192 TiB  189 TiB   85 GiB  2.6 TiB  144 TiB  57.15  1.00    -          datacenter E-datacenter
>
>  **** OLD-NODE:
> -43          20.95900         -   21 TiB   11 TiB   10 TiB  5.4 GiB  180 GiB   10 TiB  50.66  0.89    -          host E10
> 240    ssd    1.74699   1.00000  1.7 TiB  728 GiB  714 GiB  425 MiB   14 GiB  1.0 TiB  40.70  0.72  125      up  osd.240
> 241    ssd    1.74699   1.00000  1.7 TiB  924 GiB  909 GiB  507 MiB   14 GiB  864 GiB  51.66  0.91  126      up  osd.241
> 242    ssd    1.74699   1.00000  1.7 TiB  913 GiB  898 GiB  513 MiB   15 GiB  876 GiB  51.04  0.90  131      up  osd.242
> 243    ssd    1.74699   1.00000  1.7 TiB  896 GiB  880 GiB  474 MiB   16 GiB  892 GiB  50.12  0.88  132      up  osd.243
> 244    ssd    1.74699   1.00000  1.7 TiB  842 GiB  826 GiB  411 MiB   16 GiB  947 GiB  47.06  0.83  133      up  osd.244
> 245    ssd    1.74699   1.00000  1.7 TiB  912 GiB  896 GiB  416 MiB   15 GiB  876 GiB  51.00  0.90  143      up  osd.245
> 246    ssd    1.74699   1.00000  1.7 TiB  940 GiB  925 GiB  535 MiB   15 GiB  848 GiB  52.58  0.92  143      up  osd.246
> 247    ssd    1.74699   1.00000  1.7 TiB 1008 GiB  993 GiB  436 MiB   15 GiB  781 GiB  56.35  0.99  135      up  osd.247
> 248    ssd    1.74699   1.00000  1.7 TiB  1.0 TiB  1.0 TiB  452 MiB   15 GiB  728 GiB  59.28  1.04  141      up  osd.248
> 249    ssd    1.74699   1.00000  1.7 TiB  826 GiB  812 GiB  375 MiB   14 GiB  962 GiB  46.21  0.81  128      up  osd.249
> 250    ssd    1.74699   1.00000  1.7 TiB  923 GiB  907 GiB  435 MiB   15 GiB  866 GiB  51.60  0.91  136      up  osd.250
> 251    ssd    1.74699   1.00000  1.7 TiB  900 GiB  884 GiB  567 MiB   15 GiB  889 GiB  50.30  0.88  142      up  osd.251
>
> **** NEW-NODE:
> -65          20.96375         -   21 TiB   16 TiB   16 TiB  5.4 GiB  125 GiB  5.1 TiB  75.47  1.33    -          host E14
> 324    ssd    1.74698   1.00000  1.7 TiB  1.4 TiB  1.3 TiB  431 MiB   10 GiB  399 GiB  77.72  1.37  124      up  osd.324
> 325    ssd    1.74698   1.00000  1.7 TiB  1.2 TiB  1.2 TiB  436 MiB  9.6 GiB  579 GiB  67.62  1.19  107      up  osd.325
> 326    ssd    1.74698   1.00000  1.7 TiB  1.3 TiB  1.3 TiB  446 MiB   10 GiB  495 GiB  72.35  1.27  107      up  osd.326
> 327    ssd    1.74698   1.00000  1.7 TiB  1.4 TiB  1.4 TiB  506 MiB   11 GiB  355 GiB  80.14  1.41  126      up  osd.327
> 328    ssd    1.74698   1.00000  1.7 TiB  1.3 TiB  1.3 TiB  432 MiB   10 GiB  477 GiB  73.33  1.29  114      up  osd.328
> 329    ssd    1.74698   1.00000  1.7 TiB  1.4 TiB  1.4 TiB  530 MiB   11 GiB  343 GiB  80.81  1.42  124      up  osd.329
> 330    ssd    1.74698   1.00000  1.7 TiB  1.2 TiB  1.2 TiB  432 MiB   10 GiB  537 GiB  69.99  1.23  113      up  osd.330
> 331    ssd    1.74698   1.00000  1.7 TiB  1.4 TiB  1.4 TiB  473 MiB   11 GiB  353 GiB  80.25  1.41  123      up  osd.331
> 332    ssd    1.74698   1.00000  1.7 TiB  1.4 TiB  1.4 TiB  459 MiB   11 GiB  370 GiB  79.30  1.39  124      up  osd.332
> 333    ssd    1.74698   1.00000  1.7 TiB  1.3 TiB  1.2 TiB  438 MiB   10 GiB  500 GiB  72.05  1.27  111      up  osd.333
> 334    ssd    1.74698   1.00000  1.7 TiB  1.4 TiB  1.4 TiB  433 MiB   11 GiB  393 GiB  78.00  1.37  123      up  osd.334
> 335    ssd    1.74698   1.00000  1.7 TiB  1.3 TiB  1.3 TiB  488 MiB   10 GiB  464 GiB  74.08  1.30  119      up  osd.335
>
> ---------------------
> I can't upgrade to newer versions because I have a personal project
> that is designed around the current Linux and Ceph versions. Upgrading
> means a lot of work for me.
>
> Maybe the JJ balancer will do a better job as you recommended, but I
> don't want better balance at this moment.
>
> First of all, I want to understand why this happened and what changed
> between Nautilus and Octopus such that the same OSD deploy method
> produces near-full new OSDs with a similar PG count.
>
> -Best
>
> Stefan Kooman <ste...@bit.nl> wrote on Mon, Jul 7, 2025 at 22:22:
> >
> > On 7/7/25 18:34, mhnx wrote:
> > > Hello!
> > >
> > > A few years ago I built a "dc-a:12 + dc-b:12 = 24" node Ceph cluster
> > > with Nautilus v14.2.16.
> > > A year ago the cluster was upgraded to Octopus and it was running fine.
> > > Recently I added 4+4=8 new nodes with identical hardware and SSD drives.
> > > When I created OSDs with Octopus, the cluster usage increased from 50%
> > > to 78%!!
> >
> > What does a "ceph osd df tree" give you?
> >
> > >
> > > The weird problem is that the new OSDs become nearfull and use more
> > > space even when they have the same number of PGs or fewer.
> > >
> > > I had to reweight the new OSDs to 0.9 to equalize their space usage.
> > > I increased the PG count from 8192 to 16384 and ran the balancer; it
> > > got worse and I have 84% usage now!
> >
> > Remember that Ceph is limited by the fullest OSD in the cluster.
> > Do you have old clients? If not, try to get rid of reweight and start
> > using upmap. It is way more efficient in getting a cluster well
> > balanced. I would recommend using this balance script:
> > https://github.com/TheJJ/ceph-balancer
> >
> > Maybe first reset all the reweights (first do: ceph osd set nobackfill).
> > Then run this script:
> > https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
> >
> > And after that run the ceph-balancer script. That should help
> > tremendously if the cluster is imbalanced.
> >
> >
> > >
> > > I guess the OSD or PG code changed between Nautilus and Octopus and
> > > that is causing this problem.
> >
> > What version of Octopus are you running?
> >
> > >
> > > Can anyone help me with experience or knowledge about this?
> > > What should I do?
> > >
> > > My solution idea:
> > > I'm thinking of destroying and re-creating the old OSDs as a solution,
> > > but I would need to re-create 144x 3.8 TB SAS SSD OSDs, and that means
> > > 4-5 days of maintenance.
> > >
> > > Also, I have 2 OSDs per drive because that was the recommendation in
> > > the Nautilus days. Should I keep that config, or should I use 1 OSD
> > > per 3.8 TB SAS SSD? What is the recommendation for Octopus and Quincy?
> >
> > I would recommend upgrading to newer, supported versions, maybe go to
> > Pacific and then Reef. Modern versions of Ceph do not gain from
> > deploying multiple OSDs per drive. What Ceph services are you running
> > (MDS, RGW, RBD)?
> >
> > Gr. Stefan
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
