Hello Stefan!

All of my nodes and clients are on Octopus 15.2.14.

I have 1 RBD pool and 2000 RBD volumes of 100 GB each.
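
As a quick sanity check of those numbers (a back-of-the-envelope sketch;
it assumes the pool is replicated with size=2, one copy per datacenter,
which the even split between the two datacenter subtrees below suggests):

    2000 volumes * 100 GB  = 200 TB  ~= 182 TiB logical
    182 TiB * 2 replicas  ~= 364 TiB of raw data

That is in the same ballpark as the 376 TiB DATA shown at the root of
the tree below.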


This is the upmap-balanced state, without any manual reweight:

ID   CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE   DATA      OMAP     META     AVAIL     %USE   VAR   PGS  STATUS  TYPE NAME
 -1         669.87897         -  671 TiB   381 TiB   376 TiB  170 GiB  5.2 TiB   289 TiB  56.87  1.00    -          root default
-53         335.36298         -  335 TiB   192 TiB   189 TiB   85 GiB  2.6 TiB   144 TiB  57.15  1.00    -              datacenter E-datacenter

**** OLD-NODE:
-43          20.95900         -   21 TiB    11 TiB    10 TiB  5.4 GiB  180 GiB    10 TiB  50.66  0.89    -                  host E10
240    ssd    1.74699   1.00000  1.7 TiB   728 GiB   714 GiB  425 MiB   14 GiB   1.0 TiB  40.70  0.72  125      up              osd.240
241    ssd    1.74699   1.00000  1.7 TiB   924 GiB   909 GiB  507 MiB   14 GiB   864 GiB  51.66  0.91  126      up              osd.241
242    ssd    1.74699   1.00000  1.7 TiB   913 GiB   898 GiB  513 MiB   15 GiB   876 GiB  51.04  0.90  131      up              osd.242
243    ssd    1.74699   1.00000  1.7 TiB   896 GiB   880 GiB  474 MiB   16 GiB   892 GiB  50.12  0.88  132      up              osd.243
244    ssd    1.74699   1.00000  1.7 TiB   842 GiB   826 GiB  411 MiB   16 GiB   947 GiB  47.06  0.83  133      up              osd.244
245    ssd    1.74699   1.00000  1.7 TiB   912 GiB   896 GiB  416 MiB   15 GiB   876 GiB  51.00  0.90  143      up              osd.245
246    ssd    1.74699   1.00000  1.7 TiB   940 GiB   925 GiB  535 MiB   15 GiB   848 GiB  52.58  0.92  143      up              osd.246
247    ssd    1.74699   1.00000  1.7 TiB  1008 GiB   993 GiB  436 MiB   15 GiB   781 GiB  56.35  0.99  135      up              osd.247
248    ssd    1.74699   1.00000  1.7 TiB   1.0 TiB   1.0 TiB  452 MiB   15 GiB   728 GiB  59.28  1.04  141      up              osd.248
249    ssd    1.74699   1.00000  1.7 TiB   826 GiB   812 GiB  375 MiB   14 GiB   962 GiB  46.21  0.81  128      up              osd.249
250    ssd    1.74699   1.00000  1.7 TiB   923 GiB   907 GiB  435 MiB   15 GiB   866 GiB  51.60  0.91  136      up              osd.250
251    ssd    1.74699   1.00000  1.7 TiB   900 GiB   884 GiB  567 MiB   15 GiB   889 GiB  50.30  0.88  142      up              osd.251

**** NEW-NODE:
-65          20.96375         -   21 TiB    16 TiB    16 TiB  5.4 GiB  125 GiB   5.1 TiB  75.47  1.33    -                  host E14
324    ssd    1.74698   1.00000  1.7 TiB   1.4 TiB   1.3 TiB  431 MiB   10 GiB   399 GiB  77.72  1.37  124      up              osd.324
325    ssd    1.74698   1.00000  1.7 TiB   1.2 TiB   1.2 TiB  436 MiB  9.6 GiB   579 GiB  67.62  1.19  107      up              osd.325
326    ssd    1.74698   1.00000  1.7 TiB   1.3 TiB   1.3 TiB  446 MiB   10 GiB   495 GiB  72.35  1.27  107      up              osd.326
327    ssd    1.74698   1.00000  1.7 TiB   1.4 TiB   1.4 TiB  506 MiB   11 GiB   355 GiB  80.14  1.41  126      up              osd.327
328    ssd    1.74698   1.00000  1.7 TiB   1.3 TiB   1.3 TiB  432 MiB   10 GiB   477 GiB  73.33  1.29  114      up              osd.328
329    ssd    1.74698   1.00000  1.7 TiB   1.4 TiB   1.4 TiB  530 MiB   11 GiB   343 GiB  80.81  1.42  124      up              osd.329
330    ssd    1.74698   1.00000  1.7 TiB   1.2 TiB   1.2 TiB  432 MiB   10 GiB   537 GiB  69.99  1.23  113      up              osd.330
331    ssd    1.74698   1.00000  1.7 TiB   1.4 TiB   1.4 TiB  473 MiB   11 GiB   353 GiB  80.25  1.41  123      up              osd.331
332    ssd    1.74698   1.00000  1.7 TiB   1.4 TiB   1.4 TiB  459 MiB   11 GiB   370 GiB  79.30  1.39  124      up              osd.332
333    ssd    1.74698   1.00000  1.7 TiB   1.3 TiB   1.2 TiB  438 MiB   10 GiB   500 GiB  72.05  1.27  111      up              osd.333
334    ssd    1.74698   1.00000  1.7 TiB   1.4 TiB   1.4 TiB  433 MiB   11 GiB   393 GiB  78.00  1.37  123      up              osd.334
335    ssd    1.74698   1.00000  1.7 TiB   1.3 TiB   1.3 TiB  488 MiB   10 GiB   464 GiB  74.08  1.30  119      up              osd.335

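If you want to double-check what upmap has actually installed at this
point, something like the following should do (both commands are
standard; counting pg_upmap_items lines is just one rough way to see
how many PGs carry upmap exceptions):

    ceph balancer status
    ceph osd dump | grep -c pg_upmap_items   # PGs with upmap exceptions
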
---------------------
I can't upgrade to newer versions because I have a personal project
that is designed around the current Linux and Ceph versions. Upgrading
would mean a lot of work for me.

Maybe the JJ balancer would do a better job, as you recommended, but
better balance is not what I want at this moment.

First of all, I want to understand why this happened and what changed
between Nautilus and Octopus such that the same OSD deployment method
generates near-full new OSDs with a similar PG count.
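
One way I would start digging into that (a sketch only; it assumes jq
is installed, and osd.240 / osd.324 are just an old and a new OSD
picked from the tree above) is to compare PG count against bytes used
per OSD, and to diff the creation-time metadata of an old OSD against
a new one:

    # similar pg counts with clearly higher kb_used on the new OSDs
    # would point at a per-PG footprint change, not at placement
    ceph osd df -f json | jq -r '.nodes[] | [.id, .pgs, .kb_used] | @tsv'

    # compare how an old and a new OSD were created; look for differing
    # bluestore fields (allocation sizes etc.), if your build reports them
    diff <(ceph osd metadata 240) <(ceph osd metadata 324)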

-Best

On Mon, 7 Jul 2025 at 22:22, Stefan Kooman <ste...@bit.nl> wrote:
>
> On 7/7/25 18:34, mhnx wrote:
> > Hello!
> >
> > A few years ago I built a "dc-a:12 + dc-b:12 = 24" node Ceph cluster
> > with Nautilus v14.2.16.
> > A year ago the cluster was upgraded to Octopus and it was running fine.
> > Recently I added 4+4=8 new nodes with identical hardware and SSD drives.
> > When I created OSDs with Octopus, the cluster usage increased from 50%
> > to 78%!!
>
> What does "ceph osd df tree" give you?
>
> >
> > The weird problem is that the new OSDs become nearfull and hold more
> > data even though they have the same number of PGs, or fewer.
> >
> > I had to reweight the new OSDs to 0.9 to equalize their usage.
> > I increased the PG count from 8192 to 16384 and ran the balancer; it
> > made things worse and I have 84% usage now!
>
> Remember that Ceph is limited by the fullest OSD in the cluster.
> Do you have old clients? If not, try to get rid of reweight and start
> using upmap. It is way more efficient at getting a cluster well
> balanced. I would recommend using this balancing script:
> https://github.com/TheJJ/ceph-balancer
>
> Maybe first reset all the reweights (first do: ceph osd set nobackfill).
> Then run this script:
> https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
>
> And after that run the ceph-balancer script. That should help
> tremendously if the cluster is imbalanced.
>
>
> >
> > I guess the OSD or PG code changed between Nautilus and Octopus and
> > that is what generates this problem.
>
> What version of Octopus are you running?
>
> >
> > Can anyone help me with experience or knowledge about this?
> > What should I do?
> >
> > My solution idea:
> > I'm thinking of destroying and re-creating the old OSDs, but that
> > means re-creating 144x 3.8TB SAS SSD OSDs, which is 4-5 days of
> > maintenance.
> >
> > Also, I have 2 OSDs per drive because that was the recommendation in
> > Nautilus times. Should I keep this config or should I use 1 OSD per
> > 3.8TB SAS SSD? What is the recommendation for Octopus and Quincy?
>
> I would recommend upgrading to newer, supported versions: maybe go to
> Pacific and then Reef. Modern versions of Ceph do not gain from
> deploying multiple OSDs per drive. What Ceph services are you running
> (MDS, RGW, RBD)?
>
> Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io