We have an all-SSD cluster with 14 OSD nodes, and for some reason we are 
continually getting laggy PGs on Quincy, which seem to correlate with slow 
requests (this doesn't seem to happen on our Pacific clusters). The laggy PGs 
shift between OSDs. The network seems solid: I'm not seeing errors or slowness. 
The OSD hosts are heavily underutilized, normally with a load below 1 and CPUs 
98% idle. I have been looking through the OSD and Ceph logs and nothing really 
stands out.

Some things we have tried:

  1.  Updating our cluster to 17.2.5.
  2.  Manually setting our mClock profile to high_client_ops.
  3.  Increasing our total number of PGs (this is something that should have 
happened anyway).
  4.  Verifying that jumbo frames, LACP, and throughput were functioning as 
intended.
  5.  Taking some of our newer nodes out to see if they were the issue. We also 
rebooted the cluster just to be sure.

I'm curious if someone in the community has experience with this kind of issue 
and maybe could point to something I have overlooked.

Some example logs:

2023-01-10T22:50:23.245823+0000 mgr.openstack-mon01.b.pc.ostk.com.flbudm (mgr.120371640) 231175 : cluster [DBG] pgmap v235204: 2625 pgs: 1 active+clean+laggy, 2624 active+clean; 6.0 TiB data, 18 TiB used, 84 TiB / 102 TiB avail; 19 MiB/s rd, 67 MiB/s wr, 4.76k op/s
2023-01-10T22:50:23.762562+0000 osd.83 (osd.83) 906 : cluster [WRN] 6 slow requests (by type [ 'delayed' : 5 'waiting for sub ops' : 1 ] most affected pool [ 'vms' : 6 ])
2023-01-10T22:50:24.771260+0000 osd.83 (osd.83) 907 : cluster [WRN] 6 slow requests (by type [ 'delayed' : 5 'waiting for sub ops' : 1 ] most affected pool [ 'vms' : 6 ])
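
To see how the slow requests move between OSDs over time, the warnings can be tallied per OSD and per type. The following is only a rough sketch: it assumes each cluster-log entry sits on a single line in the format shown above, and the function name tally_slow_requests and both regular expressions are made up for illustration, not part of any Ceph tooling.

```python
import re
from collections import Counter

# Assumed shape of a slow-request warning, based only on the osd.83 lines above.
SLOW_RE = re.compile(
    r"(?P<ts>\S+)\s+(?P<osd>osd\.\d+)\s+\(osd\.\d+\)\s+\d+\s+:\s+cluster\s+\[WRN\]\s+"
    r"(?P<count>\d+)\s+slow\s+requests\s+\(by\s+type\s+\[(?P<types>[^\]]*)\]"
)
# Pairs such as 'delayed' : 5 inside the "by type" bracket.
TYPE_RE = re.compile(r"'([^']+)'\s*:\s*(\d+)")


def tally_slow_requests(lines):
    """Sum the reported slow-request counts per OSD and per request type.

    The same stuck ops are re-reported in every warning, so the totals
    count warnings-weighted ops, not distinct ops.
    """
    per_osd = Counter()
    per_type = Counter()
    for line in lines:
        match = SLOW_RE.search(line)
        if match is None:
            continue  # not a slow-request warning (e.g. a pgmap line)
        per_osd[match.group("osd")] += int(match.group("count"))
        for name, number in TYPE_RE.findall(match.group("types")):
            per_type[name] += int(number)
    return per_osd, per_type
```

Feeding it the lines of the cluster log (for example, an open file handle) returns two counters; sorting the per-OSD counter by value shows which OSDs report slow requests most often.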

