I think there are multiple variables at play here. My advice for HDDs is to aim for an average of 150-200 PGs per OSD, as I wrote before. The limitation is the speed of the device: throw a thousand PGs on one and you won't get any more out of it, you'll just have more peering and more RAM used.

NVMe is a different story.
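To make the arithmetic concrete, here is a rough sketch in Python (purely illustrative; suggest_pg_num and the 175 midpoint target are my own invention, not anything shipped with Ceph), assuming a single replicated pool. With several pools you would split the PG budget between them.

import math

# Illustrative helper, not a Ceph tool: pick a power-of-two pg_num for one
# replicated pool so that pg_num * replication / num_osds lands near a target
# in the 150-200 band. With multiple pools, the per-OSD ratio is the sum of
# pg_num * replication across pools, so divide the target budget accordingly.
def suggest_pg_num(num_osds: int, replication: int, target_ratio: int = 175) -> int:
    raw = target_ratio * num_osds / replication        # PGs needed before rounding
    lower = 2 ** math.floor(math.log2(raw))            # neighbouring powers of two
    upper = lower * 2
    # choose whichever power of two gives a per-OSD ratio closer to the target
    return min((lower, upper),
               key=lambda pg: abs(pg * replication / num_osds - target_ratio))

# Example: 24 HDD OSDs, 3x replication -> raw = 1400, candidates 1024 and 2048,
# per-OSD ratios 128 and 256 -> picks 1024.
print(suggest_pg_num(num_osds=24, replication=3))      # 1024

The PGS column in ceph osd df output then shows where each OSD actually ends up, which is worth checking against whatever target you picked.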
> Are there any rules for computing RAM requirements in terms of the number of
> PGs?
>
> Just curious about what the fundamental limitation is on the number of PGs
> per OSD for bigger-capacity HDDs.
>
> best regards,
>
> Samuel
>
>
> huxia...@horebdata.cn
>
> From: Anthony D'Atri
> Date: 2020-09-05 20:00
> To: huxia...@horebdata.cn
> CC: ceph-users
> Subject: Re: [ceph-users] PG number per OSD
> One factor is RAM usage; that was IIRC the motivation for lowering the
> recommended ratio from 200 to 100. Memory needs also increase during
> recovery and backfill.
>
> When calculating, be sure to consider replicas:
>
> ratio = (pgp_num x replication) / num_osds
>
> As HDDs grow, the interface isn't getting any faster (with SATA at least),
> and there are only so many IOPS and MB/s you're going to get out of one no
> matter how you slice it. Everything always depends on your use case and
> workload, but I suspect that often the bottleneck is the drive, not PG or
> OSD serialization.
>
> For example, do you prize IOPS more, latency, or MB/s? If you don't care
> about latency, then you can drive your HDDs harder and get more MB/s of
> throughput out of them, though your average latency might climb to 100ms,
> which e.g. RBD VM clients probably wouldn't be too happy about, but which
> an object service *might* tolerate.
>
> Basically, in the absence of more info, I would personally suggest aiming
> at the 150-200 average range, with pgp_num a power of 2. If you aim a bit
> high, the ratio will come down a bit when you add nodes/OSDs to your
> cluster to gain capacity. Be sure to balance usage and watch your
> mon_max_pg_per_osd setting, allowing some headroom for natural variation
> and for when components fail.
>
> YMMV.
>
> — aad
>
>> On Sep 5, 2020, at 10:34 AM, huxia...@horebdata.cn wrote:
>>
>> Dear Ceph folks,
>>
>> As the capacity of one HDD (OSD) grows bigger and bigger, e.g. from 6TB
>> up to 18TB or even more, should the number of PGs per OSD increase as
>> well, e.g. from 200 to 800? As far as I know, the capacity of each PG
>> should be kept smaller for performance reasons due to the existence of
>> PG locks, so should I set the number of PGs per OSD to 1000 or even 2000?
>> What is the actual reason for not raising the number of PGs per OSD? Are
>> there any practical limitations on the number of PGs?
>>
>> thanks a lot,
>>
>> Samuel
>>
>>
>> huxia...@horebdata.cn

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io