Is this cluster serving RGW?  RBD? CephFS? Those pool names are unusual.

Just CephFS.
I named the pools this way, following 
https://docs.ceph.com/en/reef/cephfs/createfs/#creating-a-file-system

I suggested 250.

Yes, but it is actually great to have even fewer objects per PG, because then 
I'd arrive at 250k objects/PG (instead of my 2M from before), which should make 
the recovery time of an individual PG more reasonable.

So I think it's great that I get 8x more PGs.
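To sanity-check that arithmetic, here is a back-of-the-envelope sketch. The 
total object count below is derived from the 2M objects/PG figure and an 
assumed old pg_num of 256 for data_ec; it is illustrative, not read from the 
cluster.

```python
# Back-of-the-envelope check of the objects-per-PG improvement.
# Assumption: ~2M objects/PG at an old pg_num of 256 (illustrative numbers).
old_pg_num = 256
objects_per_pg_old = 2_000_000
total_objects = old_pg_num * objects_per_pg_old  # ~512M objects in the pool

new_pg_num = old_pg_num * 8  # the observed 8x jump (256 -> 2048)
objects_per_pg_new = total_objects // new_pg_num

print(objects_per_pg_new)  # 250_000 objects per PG after the increase
```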

But I'd like to understand _why_ it's happening as it did, because I expected 
that a 4x increase of `*_pg_per_osd` should only be able to achieve a 4x PG 
increase.

I'm wondering if I had hit an autoscaler bug before (that my PGs for data_ec 
should really have been at 512 instead of 256), which would be good to report 
if so.

I think your explanation with `NEW PG_NUM` having to be 3x larger for the 
autoscaler to take action
(https://docs.ceph.com/en/latest/rados/operations/placement-groups/#viewing-pg-scaling-recommendations)
makes sense:

It was constrained tightly, and in order to avoid flapping it only takes action 
when the value of pg_num it sets will increase by at least 3x.

E.g. if before I was at 256, and `NEW PG_NUM` was at 500 (< 768 = 3*256), 
then it would not take action; if my 4x increase in settings resulted in 
`NEW PG_NUM` being 500 * 4 = 2000, it makes sense that it then sets it to 2048.
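The reasoning above can be sketched in a few lines. This is my reading of the 
linked docs, not the autoscaler's actual code; the power-of-two rounding is a 
simplification of what pg_num ends up as.

```python
# Sketch of the autoscaler's 3x anti-flapping threshold (my understanding of
# the docs, not the real implementation).

def would_scale(current_pg_num: int, suggested: int) -> bool:
    # The autoscaler only acts when the suggested pg_num exceeds the
    # current one by at least a factor of 3.
    return suggested >= 3 * current_pg_num

def round_pow2(n: int) -> int:
    # pg_num is set to the nearest power of two >= n (simplified).
    return 1 << (n - 1).bit_length()

# Before: suggested 500 < 3 * 256 = 768, so no action; pg_num stays at 256.
print(would_scale(256, 500))   # False

# After the 4x bump in *_pg_per_osd: suggested ~2000 crosses the threshold.
print(would_scale(256, 2000))  # True
print(round_pow2(2000))        # 2048
```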

So I think that explains it sufficiently, thanks!

Remember that mon_max_pg_per_osd is a failsafe, it does not affect the 
autoscaler’s determinations.

Yes, that makes sense.

`ceph osd pool ls detail` will show you a bit more detail: the pg_num vs 
pgp_num values for each pool and, given your names, the application association 
for each.

For reference, here's my output:

    pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 6 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21504 flags hashpspool stripe_width 0 pg_num_min 1 application mgr,mgr_devicehealth read_balance_score 15.79
    pool 2 'data' replicated size 3 min_size 2 crush_rule 7 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 21506 lfor 0/1905/12064 flags hashpspool stripe_width 0 pg_num_min 1024 application cephfs read_balance_score 2.67
    pool 3 'data_ec' erasure profile ec_profile size 6 min_size 5 crush_rule 8 object_hash rjenkins pg_num 505 pgp_num 377 pg_num_target 2048 pgp_num_target 2048 autoscale_mode on last_change 24867 lfor 0/2200/24867 flags hashpspool,ec_overwrites stripe_width 16384 application cephfs
    pool 4 'metadata' replicated size 3 min_size 2 crush_rule 9 object_hash rjenkins pg_num 306 pgp_num 178 pg_num_target 512 pgp_num_target 512 autoscale_mode on last_change 24869 lfor 0/17183/24869 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs read_balance_score 2.93

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io