I think there are multiple variables at play here. My advice for HDDs is to aim for an average of 150-200 PGs per OSD, as I wrote before. The limitation is the speed of the device: throw a thousand PGs on one and you won't get any more out of it, you'll just have more peering and more RAM used.

NVMe is a different story.
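To make the arithmetic concrete, here is a rough sketch in Python (purely illustrative; suggest_pg_num and the 175 midpoint target are my own invention, not anything shipped with Ceph), assuming a single replicated pool. With several pools you would split the PG budget between them.

import math

# Illustrative helper, not a Ceph tool: pick a power-of-two pg_num for one
# replicated pool so that pg_num * replication / num_osds lands near a target
# in the 150-200 band. With multiple pools, the per-OSD ratio is the sum of
# pg_num * replication across pools, so divide the target budget accordingly.
def suggest_pg_num(num_osds: int, replication: int, target_ratio: int = 175) -> int:
    raw = target_ratio * num_osds / replication        # PGs needed before rounding
    lower = 2 ** math.floor(math.log2(raw))            # neighbouring powers of two
    upper = lower * 2
    # choose whichever power of two gives a per-OSD ratio closer to the target
    return min((lower, upper),
               key=lambda pg: abs(pg * replication / num_osds - target_ratio))

# Example: 24 HDD OSDs, 3x replication -> raw = 1400, candidates 1024 and 2048,
# per-OSD ratios 128 and 256 -> picks 1024.
print(suggest_pg_num(num_osds=24, replication=3))      # 1024

The PGS column in ceph osd df output then shows where each OSD actually ends up, which is worth checking against whatever target you picked.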
> Are there any rules for computing RAM requirements in terms of the number of
> PGs?
>
> Just curious about what the fundamental limitation is on the number of PGs
> per OSD for bigger-capacity HDDs.
>
> best regards,
>
> Samuel
>
>
> huxia...@horebdata.cn
>
> From: Anthony D'Atri
> Date: 2020-09-05 20:00
> To: huxia...@horebdata.cn
> CC: ceph-users
> Subject: Re: [ceph-users] PG number per OSD
> One factor is RAM usage; that was IIRC the motivation for lowering the
> recommended ratio from 200 to 100. Memory needs also increase during
> recovery and backfill.
>
> When calculating, be sure to consider replicas:
>
> ratio = (pgp_num x replication) / num_osds
>
> As HDDs grow, the interface isn't getting any faster (with SATA at least),
> and there are only so many IOPS and MB/s you're going to get out of one no
> matter how you slice it. Everything always depends on your use case and
> workload, but I suspect that often the bottleneck is the drive, not PG or
> OSD serialization.
>
> For example, do you prize IOPS more, latency, or MB/s? If you don't care
> about latency, then you can drive your HDDs harder and get more MB/s of
> throughput out of them, though your average latency might climb to 100ms,
> which e.g. RBD VM clients probably wouldn't be too happy about, but which
> an object service *might* tolerate.
>
> Basically, in the absence of more info, I would personally suggest aiming
> at the 150-200 average range, with pgp_num a power of 2. If you aim a bit
> high, the ratio will come down a bit when you add nodes/OSDs to your
> cluster to gain capacity. Be sure to balance usage and watch your
> mon_max_pg_per_osd setting, allowing some headroom for natural variation
> and for when components fail.
>
> YMMV.
>
> — aad
>
>> On Sep 5, 2020, at 10:34 AM, huxia...@horebdata.cn wrote:
>>
>> Dear Ceph folks,
>>
>> As the capacity of one HDD (OSD) grows bigger and bigger, e.g. from 6TB
>> up to 18TB or even more, should the number of PGs per OSD increase as
>> well, e.g. from 200 to 800? As far as I know, the capacity of each PG
>> should be kept smaller for performance reasons due to the existence of
>> PG locks, so should I set the number of PGs per OSD to 1000 or even 2000?
>> What is the actual reason for not raising the number of PGs per OSD? Are
>> there any practical limitations on the number of PGs?
>>
>> thanks a lot,
>>
>> Samuel
>>
>>
>> huxia...@horebdata.cn

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io