I should probably have been clearer: this is one pool of many, so the other
OSDs aren't idle.

So I don't think the PG bump would be the worst thing to try, but the situation
is definitely not as bad as I may have made it sound.
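
For anyone checking the math, the PG-per-OSD ratios quoted below fall out of a
simple back-of-the-envelope calculation (pg_num x EC shards / OSD count). A
rough sketch using the figures from this thread (240 OSDs, EC 8+2, so 10 shards
per PG):

    # Approximate PGs per OSD for a few candidate pg_num values.
    osds=240      # OSDs backing this pool (from the thread)
    shards=10     # EC 8+2 -> 10 shards per PG
    for pg_num in 128 512 2048 4096; do
        awk -v pg="$pg_num" -v s="$shards" -v o="$osds" \
            'BEGIN { printf "pg_num %4d -> ~%.1f PGs per OSD\n", pg, pg * s / o }'
    done
    # pg_num  128 -> ~5.3
    # pg_num  512 -> ~21.3
    # pg_num 2048 -> ~85.3
    # pg_num 4096 -> ~170.7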

Thanks,
Reed

> On May 27, 2021, at 11:37 PM, Anthony D'Atri <anthony.da...@gmail.com> wrote:
> 
> That gives you a PG ratio of …. 5.3 ???
> 
> Run `ceph osd df`; I wouldn’t be surprised if some of your drives have 0 PGs 
> on them, and I’d certainly expect the PG counts aren’t at all even across them.
> 
> There are bottlenecks in the PG code, and in the OSD code — one reason why 
> with NVMe clusters it’s common to split each drive into at least 2 OSDs.  
> With spinners you don’t want to do that, but you get the idea.
> 
> The pg autoscaler is usually out of its Vulcan mind.  512 would give you a 
> ratio of just 21.
> 
> Prior to 12.2.1 conventional wisdom was a PG ratio of 100-200 on spinners.
> 
> 2048 PGs would give you a ratio of 85, which current (retconned) guidance 
> would call good.  I’d probably go to 4096 but 2048 would be way better than 
> 128.
> 
> I strongly suspect that PG splitting would still get you done faster than 
> leaving things the way they are, esp. if you’re running BlueStore OSDs.
> 
> Try bumping pg_num up to, say, 262 and see how bad it is, and whether, once 
> pgp_num catches up, your ingest rate isn’t a bit higher than it was before.  
> 
>> EC8:2, across about 16 hosts, 240 OSDs, with 24 of those being 8TB 7.2k SAS, 
>> and the other 216 being 2TB 7.2K SATA. So there are quite a few spindles in 
>> play here.
>> Only 128 PGs in this pool, but it's the only RBD image in this pool. The 
>> autoscaler recommends going to 512, but I was hoping to avoid the performance 
>> overhead of the PG splits if possible, given perf is bad enough as is.
> 
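
For anyone landing here from the archives, a rough sketch of the commands being
discussed above. The pool name "ecpool" is a placeholder (not from the thread),
2048 is the target suggested above, and on Nautilus or later pgp_num generally
follows pg_num on its own, so the explicit pgp_num step may not be needed:

    # See how evenly PGs currently land on the OSDs (watch the PGS column).
    ceph osd df

    # Raise the PG count on the pool ("ecpool" is a placeholder name).
    ceph osd pool set ecpool pg_num 2048

    # On older releases, pgp_num must be raised as well before data actually
    # remaps; newer releases step pgp_num up toward pg_num automatically.
    ceph osd pool set ecpool pgp_num 2048

    # Watch the splits and backfill progress.
    ceph status
    ceph osd pool get ecpool pg_num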
> 