> >> I suggest playing with
> >> https://docs.ceph.com/en/squid/rados/operations/pgcalc/
> >> … setting the target PGs per OSD to 250
>
> There was a thread[1] last year about many PGs per OSD without any firm
> conclusions, so we are going to bump our number of PGs for the largest HDDs a
> lot higher than 250 while keeping an eye on the impact. Currently sitting at
> something like 550 PGs for a 20TB drive.
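For reference, here is my rough reading of the arithmetic behind pgcalc, as a small Python sketch. The helper name and the example cluster (100 OSDs, one pool holding essentially all the data, 3x replication) are made up for illustration; the calculator at the URL above is the authority, not this.

import math

# Rough sketch of the pgcalc math as I understand it:
#   raw = target_pgs_per_osd * osd_count * pool_data_fraction / replica_size
# then round to a nearby power of two, bumping up if rounding down would
# undershoot the raw value by more than ~25%.
def suggest_pg_num(target_pgs_per_osd, osd_count, data_fraction, replica_size):
    raw = target_pgs_per_osd * osd_count * data_fraction / replica_size
    power = 2 ** round(math.log2(raw))
    if power < raw * 0.75:  # too far below the target: take the next power of two
        power *= 2
    return power

# Hypothetical example: 100 OSDs, one 3x pool with ~all the data, target 250 PGs/OSD
print(suggest_pg_num(250, 100, 1.0, 3))   # -> 8192, i.e. roughly 245 PG replicas per OSD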
As OSDs grow ever larger I think we need to weigh the costs of more PGs (more peering, more memory) against the costs of extremely large PGs (less uniform data distribution, and any backfill/remap becomes a huge operation).

In the past there was the idea that SSDs can “handle” more PGs than HDDs from a parallelism perspective, but over time I’ve come to suspect that may not strictly be the case: the cluster’s ops are still divided among the HDDs regardless of pg_num, and the driver/firmware still does elevator scheduling or whatever. Maybe that was an artifact of Filestore that is no longer relevant?

Re backfill parallelism, I’ve experienced similar situations; titrating osd_max_backfills seemed to help.
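Going back to the large-PG point above, some back-of-the-envelope numbers for a 20TB drive. The 80% fill ratio and the PG counts are illustrative, not from any real cluster; the only point is that the PG is the unit of backfill/remap, so average PG size scales inversely with PGs per OSD.

def avg_pg_size_gb(drive_tb, fill_ratio, pgs_per_osd):
    # PGs are the unit of backfill/remap, so this approximates the chunk moved at once
    return drive_tb * fill_ratio * 1000 / pgs_per_osd

for pgs in (100, 250, 550):
    print(f"{pgs:3d} PGs/OSD -> ~{avg_pg_size_gb(20, 0.8, pgs):.0f} GB per PG")
# 100 PGs/OSD -> ~160 GB per PG
# 250 PGs/OSD -> ~64 GB per PG
# 550 PGs/OSD -> ~29 GB per PG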