> >> I suggest playing with
> >> https://docs.ceph.com/en/squid/rados/operations/pgcalc/
> >> … setting the target PGs per OSD to 250
>
> There was a thread[1] last year about many PGs per OSD without any firm
> conclusions, so we are going to bump our number of PGs for the largest HDDs a
> lot higher than 250 while keeping an eye on the impact. Currently sitting at
> something like 550 PGs for a 20TB drive.
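For reference, here is my rough reading of the arithmetic behind pgcalc, as a small Python sketch. The helper name and the example cluster (100 OSDs, one pool holding essentially all the data, 3x replication) are made up for illustration; the calculator at the URL above is the authority, not this.

import math

# Rough sketch of the pgcalc math as I understand it:
#   raw = target_pgs_per_osd * osd_count * pool_data_fraction / replica_size
# then round to a nearby power of two, bumping up if rounding down would
# undershoot the raw value by more than ~25%.
def suggest_pg_num(target_pgs_per_osd, osd_count, data_fraction, replica_size):
    raw = target_pgs_per_osd * osd_count * data_fraction / replica_size
    power = 2 ** round(math.log2(raw))
    if power < raw * 0.75:  # too far below the target: take the next power of two
        power *= 2
    return power

# Hypothetical example: 100 OSDs, one 3x pool with ~all the data, target 250 PGs/OSD
print(suggest_pg_num(250, 100, 1.0, 3))   # -> 8192, i.e. roughly 245 PG replicas per OSD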
As OSDs grow ever larger I think we need to weigh the costs of more PGs (more peering, more memory) against the costs of extremely large PGs (less uniform data distribution, and any backfill/remap becomes a huge operation).

In the past there was the idea that SSDs can “handle” more PGs than HDDs from a parallelism perspective, but over time I’ve come to suspect that may not strictly be the case: the cluster’s ops are still divided among the HDDs regardless of pg_num, and the driver/firmware still does elevator scheduling or whatever. Maybe that was an artifact of Filestore that is no longer relevant?

Re backfill parallelism, I’ve experienced similar situations; titrating osd_max_backfills seemed to help.
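Going back to the large-PG point above, some back-of-the-envelope numbers for a 20TB drive. The 80% fill ratio and the PG counts are illustrative, not from any real cluster; the only point is that the PG is the unit of backfill/remap, so average PG size scales inversely with PGs per OSD.

def avg_pg_size_gb(drive_tb, fill_ratio, pgs_per_osd):
    # PGs are the unit of backfill/remap, so this approximates the chunk moved at once
    return drive_tb * fill_ratio * 1000 / pgs_per_osd

for pgs in (100, 250, 550):
    print(f"{pgs:3d} PGs/OSD -> ~{avg_pg_size_gb(20, 0.8, pgs):.0f} GB per PG")
# 100 PGs/OSD -> ~160 GB per PG
# 250 PGs/OSD -> ~64 GB per PG
# 550 PGs/OSD -> ~29 GB per PG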