> Hi Anthony and others,
> 
> I have now increased the number of PGs on my cluster, but the results are a 
> bit surprising:
> I increased the settings by 4x and obtained a PG increase by 8x.
> 
> Wondering if you have insights why that might be.

Your prior values were extremely low, which no doubt contributes.

> `ceph osd pool autoscale-status` before:
> 
>    POOL      SIZE    TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
>    .mgr      435.8M               3.0          6788G  0.0002                                  1.0       1              on         False
>    data           0               3.0          6788G  0.0000                                  1.0    1024              on         False
>    data_ec   345.1T               1.5         876.4T  0.5906                                  1.0     256              on         False
>    metadata  123.8G               3.0          6788G  0.0548                                  4.0      64              on         False

Is this cluster serving RGW?  RBD? CephFS? Those pool names are unusual.

> I increased the `*_pg_per_osd` settings by 4x by running:
> 
>    ceph config set global mon_target_pg_per_osd 400

I suggested 250.

>    ceph config set global mon_max_pg_per_osd 1000

Remember that mon_max_pg_per_osd is a failsafe; it does not affect the 
autoscaler's determinations.

> 
> `ceph osd pool autoscale-status` after:
> 
>    # ceph osd pool autoscale-status
>    POOL      SIZE    TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
>    .mgr      435.8M               3.0          6788G  0.0002                                  1.0       1              on         False
>    data           0               3.0          6788G  0.0000                                  1.0    1024              on         False
>    data_ec   345.1T               1.5         876.4T  0.5907                                  1.0     256        2048  on         False
>    metadata  123.8G               3.0          6788G  0.0548                                  4.0      64         512  on         False
> 
> `ceph osd pool autoscale-status` after a few minutes more:
> 
>    # ceph osd pool autoscale-status
>    POOL      SIZE    TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
>    .mgr      435.8M               3.0          6788G  0.0002                                  1.0       1              on         False
>    data           0               3.0          6788G  0.0000                                  1.0    1024              on         False
>    data_ec   343.9T               1.5         876.4T  0.5886                                  1.0    2048              on         False
>    metadata  127.5G               3.0          6788G  0.0564                                  4.0     512              on         False

`ceph osd pool ls detail` will show you a bit more detail: the pg_num vs 
pgp_num values for each pool and, given your unusual pool names, the 
application association for each.


> 
> This is surprising to me:
> 
> * The Ceph autoscaler should increase `PG_NUM` by factors of 2x. It chose to 
> not do that in the "before" state.

It was constrained tightly, and to avoid flapping the autoscaler only takes 
action when the ideal pg_num differs from the current value by a factor of 
at least 3x (in either direction).
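To make that threshold concrete, here is a rough Python sketch of the decision rule. This is not the mgr module's actual code, and the rounding helper is a simplification; it just models "round the ideal to a power of two, act only past the threshold":

```python
# Simplified model of the pg_autoscaler decision (not the real mgr code):
# round the ideal pg_num to the nearest power of two, then act only if
# that target differs from the current pg_num by the threshold
# (3x by default).

def nearest_power_of_two(n: float) -> int:
    """Round n to the nearest power of two (ties round up)."""
    if n <= 1:
        return 1
    lower = 1 << (int(n).bit_length() - 1)  # largest power of two <= n
    return lower * 2 if n >= lower * 1.5 else lower

def should_adjust(current_pg: int, ideal_pg: float,
                  threshold: float = 3.0) -> bool:
    """True if this model of the autoscaler would change pg_num."""
    target = nearest_power_of_two(ideal_pg)
    return target > current_pg * threshold or target < current_pg / threshold

# An ideal of 400 rounds to 512, which is within 3x of 256: no action.
print(should_adjust(256, 400))   # False
# An ideal of 1600 rounds to 2048, which exceeds 256 * 3: action.
print(should_adjust(256, 1600))  # True
```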

> * When I increased `*_pg_per_osd` settings by 4x, I consequently expected 
> that `PG_NUM` would increase by <= 4x. But it increased 8x.

You started off way low, and set the target very high, so this is not entirely 
surprising.
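Plugging in hypothetical numbers (the true ideal depends on your OSD count and usage ratios, which aren't shown here) illustrates how a 4x target bump can produce an 8x pg_num jump once power-of-two rounding and the 3x threshold combine:

```python
import math

# Hypothetical illustration, not actual cluster math: suppose data_ec's
# ideal pg_num was around 400 under the old target. The autoscaler rounds
# ideals to the nearest power of two (log-scale rounding used here as an
# approximation).

def npot(n: float) -> int:
    return 2 ** round(math.log2(n))

current = 256
old_ideal = 400.0           # hypothetical ideal at the old target
new_ideal = old_ideal * 4   # target raised 4x, so the ideal scales 4x

print(npot(old_ideal))             # 512  -- within 3x of 256: no change before
print(npot(new_ideal))             # 2048 -- beyond 3x of 256: autoscaler acts
print(npot(new_ideal) // current)  # 8
```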

> 
> Side remark:
> While `PG_NUM` is already 2048, the actual number of PGs in `ceph status` is 
> still increasing (currently `1830 pgs`, and some being added every couple 
> minutes). I believe this part is as expected.

Yes: pg_num vs pgp_num. PGs are split or merged incrementally to limit the 
impact on the cluster.
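That pacing can be sketched like this. It is a rough model, not the mgr's actual code, and it assumes the default target_max_misplaced_ratio of 5%:

```python
# Rough model of why the PG count in `ceph status` creeps up: pgp_num is
# raised in steps sized so that roughly no more than
# target_max_misplaced_ratio (default 0.05) of the data is misplaced at
# once. Not the mgr's actual code.

def pgp_steps(start: int, target: int,
              max_misplaced: float = 0.05) -> list[int]:
    steps = []
    cur = start
    while cur < target:
        # Raising pgp_num from cur to nxt misplaces about (nxt - cur) / nxt
        # of the pool's objects, so cap nxt at cur / (1 - max_misplaced).
        nxt = min(target, max(cur + 1, int(cur / (1 - max_misplaced))))
        steps.append(nxt)
        cur = nxt
    return steps

steps = pgp_steps(256, 2048)
print(steps[:3], "...", steps[-1])  # many small increments, not one jump
```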

> 
> Thanks!
> Niklas
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
