I guess I should have been clearer: this is one pool of many, so the other OSDs aren't idle.
So I don't necessarily think that the PG bump would be the worst thing to try, but it's definitely not as bad as I may have made it sound.

Thanks,
Reed

> On May 27, 2021, at 11:37 PM, Anthony D'Atri <anthony.da...@gmail.com> wrote:
>
> That gives you a PG ratio of …. 5.3 ???
>
> Run `ceph osd df`; I wouldn't be surprised if some of your drives have 0 PGs on them, and I would certainly suspect that they aren't evenly distributed.
>
> There are bottlenecks in the PG code and in the OSD code, which is one reason why with NVMe clusters it's common to split each drive into at least 2 OSDs. With spinners you don't want to do that, but you get the idea.
>
> The pg autoscaler is usually out of its Vulcan mind. 512 would give you a ratio of just 21.
>
> Prior to 12.2.1, conventional wisdom was a PG ratio of 100-200 on spinners.
>
> 2048 PGs would give you a ratio of 85, which current (retconned) guidance would call good. I'd probably go to 4096, but 2048 would be way better than 128.
>
> I strongly suspect that PG splitting would still get you done faster than leaving things the way they are, especially if you're running BlueStore OSDs.
>
> Try bumping pg_num up to, say, 256 and see how bad it is, and whether, once pgp_num catches up, your ingest rate isn't a bit higher than it was before.
>
>> EC 8:2, across about 16 hosts and 240 OSDs, with 24 of those being 8TB 7.2k SAS and the other 216 being 2TB 7.2k SATA. So there are quite a few spindles in play here.
>> Only 128 PGs in this pool, but it's the only RBD image in this pool. The autoscaler recommends going to 512, but I was hoping to avoid the performance overhead of the PG splits if possible, given perf is bad enough as is.
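For reference, the ratio Anthony quotes is just pg_num multiplied by the EC width (k+m = 10 shards for 8:2) and divided by the OSD count, and the bump itself is a couple of pool settings. A rough sketch, assuming a hypothetical pool name of "rbd-ec" and a Nautilus-or-later cluster where pgp_num is ramped up automatically after a pg_num change:

  # PG ratio = pg_num * (k+m) / OSDs; with EC 8:2 each PG spans 10 OSDs
  #   128 * 10 / 240  ~   5.3   (current)
  #   512 * 10 / 240  ~  21.3   (autoscaler suggestion)
  #  2048 * 10 / 240  ~  85.3
  #  4096 * 10 / 240  ~ 170.7

  # look at the PGS column to see how (un)even the per-OSD PG counts are
  ceph osd df

  # "rbd-ec" below is a placeholder pool name
  # keep the autoscaler from undoing a manual change
  ceph osd pool set rbd-ec pg_autoscale_mode off

  # bump pg_num; pgp_num follows gradually on Nautilus+, or set it explicitly
  ceph osd pool set rbd-ec pg_num 256
  ceph osd pool set rbd-ec pgp_num 256

The 256 here is just the incremental test suggested above; the same commands apply if you decide to go straight to 2048 or 4096.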