Hi Niklas,

Most admins who haven't taken the time to tame this somewhat wild animal, 
myself included, will tell you to disable it and choose the PG count for each 
pool yourself: distribute a budget of 150-200 PGs per HDD OSD among all the 
pools so that as many OSDs as possible participate in each pool.
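The usual back-of-the-envelope formula for that budget can be sketched like 
this (a rough illustration, not an official Ceph tool; the function name and 
defaults are mine):

```python
def suggest_pg_num(num_osds, pg_copies, target_per_osd=150, data_share=1.0):
    """Rough manual PG budgeting.

    num_osds       - OSDs backing the pool
    pg_copies      - replication size, or k+m for erasure coding
    target_per_osd - desired PG copies per OSD (150-200 for HDD is common)
    data_share     - this pool's expected fraction of the cluster's data
    """
    raw = num_osds * target_per_osd * data_share / pg_copies
    # pg_num should be a power of two; round to the nearest one
    p = 1
    while p * 2 <= raw:
        p *= 2
    return p * 2 if (raw - p) > (2 * p - raw) else p

# With your numbers, giving the whole HDD budget to one pool:
print(suggest_pg_num(86, 3))  # rep-cluster, 3x replication -> 4096
print(suggest_pg_num(58, 6))  # ec-cluster, k=4 m=2         -> 1024
```

which is indeed well above the 1024 and 256 you currently have.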

But you could also try setting the autoscaler to "warn" mode, then modify 
target_size_ratio and/or set the bulk flag on the "large" pools (those holding 
the most data or the most objects), see what it recommends, and decide whether 
its recommendations are worth applying.
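Concretely, something along these lines (pool names are just examples, adapt 
to yours):

```shell
# Put the autoscaler in advisory mode for a pool: it will only warn,
# not change pg_num on its own.
ceph osd pool set data pg_autoscale_mode warn

# Tell the autoscaler what fraction of the cluster's capacity you
# expect this pool to consume eventually (here 80%, as an example).
ceph osd pool set data target_size_ratio 0.8

# Or mark the pool as "bulk" so the autoscaler gives it a full PG
# budget up front instead of starting small.
ceph osd pool set data bulk true

# Then check what it would do before committing to anything.
ceph osd pool autoscale-status
```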

These two blog posts [1] [2] should shed some light on the subject.

Regards,
Frédéric.

[1] https://ceph.io/en/news/blog/2019/new-in-nautilus-pg-merging-and-autotuning/
[2] https://ceph.io/en/news/blog/2022/autoscaler_tuning/

________________________________
De : Niklas Hambüchen <m...@nh2.me>
Envoyé : samedi 21 juin 2025 02:22
À : ceph-users@ceph.io
Objet : [ceph-users] Suspiciously low PG count for CephFS with many small files

I have 2 clusters; both have HDDs and SSDs. Reporting only the HDDs which have 
their own pools:

"rep-cluster": hdd-pool 3-replication,   86 OSDs (16 TiB each), 1024 PGs, 78 %RAW USED, 100 M objects
"ec-cluster":  hdd-pool erasure k=4 m=2, 58 OSDs (16 TiB each),  256 PGs, 60 %RAW USED, 450 M objects

Both are Ceph 18.2.1, Bluestore, and have the autoscaler enabled.
As you can see, I have many small objects.

My PGs-copies-per-OSD seem far off from the recommendation of 100 PGs per OSD 
(`mon_target_pg_per_osd`):

rep-cluster: 35 PGs/OSD (= 1024*3/86)
ec-cluster:  26 PGs/OSD (= 256*6/58)

So I'm at least 3x-4x off.
Why?
Should the autoscaler not have increased the PGs here?

`ceph osd pool autoscale-status`:

rep-cluster:
   POOL     SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
   data     349.4T               3.0   1343T         0.7802                                 1.0   1024                on         False
ec-cluster:
   data_ec  347.3T               1.5   876.4T        0.5944                                 1.0   256                 on         False


I believe that because of this I suffer some drawbacks:

* On ec-cluster, a PG contains ~2 TiB and ~2 M objects, causing rebalances to 
happen in coarse, slow steps.

Should I take some steps to force the autoscaler to increase PGs, and if yes, 
which approach would be best here?

Thanks for your tips!
Niklas
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io