This can be subtle and is easy to mix up.

The “PG ratio” is intended to be the number of PGs hosted on each OSD, plus or 
minus a few.

Note how I phrased that: it’s not simply the number of PGs divided by the 
number of OSDs.  Remember that PGs are replicated, so each PG counts against 
every OSD that holds one of its replicas.

While each PG belongs to exactly one pool, for purposes of estimating pg_num 
we derive the desired aggregate number of PG replicas in the cluster from this 
ratio, then divide that up among the pools in proportion to the amount of data 
each pool holds, ideally rounding each pool’s pg_num to a power of 2.
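
To make that concrete, here’s a rough sketch of the arithmetic in Python.  
This is my own illustration rather than anything official; the rounding to the 
nearest power of 2 mirrors what pgcalc does, and the pool fractions and sizes 
you feed it are up to you.

    import math

    def nearest_power_of_two(n: float) -> int:
        # Round to the nearest power of 2, the way pgcalc suggests.
        lower = 2 ** max(0, math.floor(math.log2(n)))
        upper = lower * 2
        return lower if (n - lower) <= (upper - n) else upper

    def pool_pg_nums(target_ratio: int, num_osds: int, pools: dict) -> dict:
        # pools maps pool name -> (fraction of cluster data, replica count
        # or EC k+m size).  The ratio is PG replicas per OSD, so the
        # cluster-wide budget is target_ratio * num_osds; each pool gets
        # its share of that budget, divided by its size, as pg_num.
        total_pg_replicas = target_ratio * num_osds
        return {
            name: nearest_power_of_two(total_pg_replicas * fraction / size)
            for name, (fraction, size) in pools.items()
        }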

You can run `ceph osd df` to see the number of PGs on each OSD (the PGS 
column).  There will be some variance from OSD to OSD, but look at the average.
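
If you’d rather compute the average than eyeball it, something like the 
following works.  I’m going from memory on the JSON layout (per-OSD records 
under "nodes" with a "pgs" field), so verify against your release’s output.

    import json
    import subprocess

    # Parse `ceph osd df -f json`; the "nodes" / "pgs" field names are what
    # I recall recent releases emitting, so adjust if yours differ.
    osd_df = json.loads(subprocess.check_output(["ceph", "osd", "df", "-f", "json"]))
    pg_counts = [osd["pgs"] for osd in osd_df["nodes"]]
    print("average PGs per OSD:", sum(pg_counts) / len(pg_counts))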

This venerable calculator can help get a feel for how this works:

https://old.ceph.com/pgcalc/

100 is the official party line for the target ratio; it used to be 200.  More 
PGs per OSD means more memory use; too few brings other drawbacks, notably 
uneven data distribution and less parallelism.

PGs can in part be thought of as parallelism domains; more PGs means more 
parallelism.  So on HDD OSDs a ratio in the 100-200 range is IMHO reasonable, 
for SAS/SATA SSD OSDs 200-300, and for NVMe OSDs perhaps higher still, though 
perhaps not if each device hosts more than one OSD (which should only ever be 
done with NVMe devices).

Your numbers below are probably OK for HDDs; if these are SSDs you might bump 
the pool with the most data up to the next power of 2.

The pgcalc above includes parameters for what fraction of the cluster’s data 
each pool contains.  A pool with 5% of the data needs fewer PGs than a pool 
with 50% of the cluster’s data.
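
To put rough numbers on that: with a ratio of 100 and 80 OSDs (an 8000 
PG-replica budget, reusing the sketch above), a 3x pool holding 50% of the 
data works out to 8000 * 0.50 / 3 ≈ 1333, which rounds to 1024, while a 3x 
pool holding 5% works out to roughly 133, which rounds to 128.  These figures 
are only illustrative.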

Others may well have different perspectives; this is something where opinions 
vary.  The pg_autoscaler in bulk mode can automate this, provided one is 
prescient about the parameters it is fed.
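
For reference, the knobs I have in mind are `ceph osd pool set <pool> bulk true` 
to mark a pool as bulk, `ceph osd pool set <pool> target_size_ratio <ratio>` to 
tell the autoscaler how much of the cluster a pool is expected to consume, and 
`ceph osd pool autoscale-status` to see what it intends to do; syntax from 
memory, so double-check against the docs for your release.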



> On Feb 28, 2023, at 9:23 PM, Deep Dish <deeepd...@gmail.com> wrote:
> 
> Hello
> 
> 
> 
> Looking to get some official guidance on PG and PGP sizing.
> 
> 
> 
> Is the goal to maintain approximately 100 PGs per OSD per pool, or for the
> cluster in general?
> 
> 
> 
> Assume the following scenario:
> 
> 
> 
> Cluster with 80 OSD across 8 nodes;
> 
> 3 Pools:
> 
> -       Pool1 = Replicated 3x
> 
> -       Pool2 = Replicated 3x
> 
> -       Pool3 = Erasure Coded 6-4
> 
> 
> 
> 
> 
> Assuming the well published formula:
> 
> 
> 
> Let (Target PGs / OSD) = 100
> 
> 
> 
> [ (Target PGs / OSD) * (# of OSDs) ] / (Replica Size)
> 
> 
> 
> -       Pool1 = (100*80)/3 = 2666.67 => 4096
> 
> -       Pool2 = (100*80)/3 = 2666.67 => 4096
> 
> -       Pool3 = (100*80)/10 = 800 => 1024
> 
> 
> 
> Total cluster would have 9216 PGs and PGPs.
> 
> 
> Are there any implications (performance / monitor / MDS / RGW sizing) with
> how many PGs are created on the cluster?
> 
> 
> 
> Looking for validation and / or clarification of the above.
> 
> 
> 
> Thank you.
