My Ceph version is Luminous 12.2.12. Do you think I should upgrade to Nautilus? Or will Nautilus give better control of recovery/backfilling?
best regards, Samuel

huxia...@horebdata.cn

From: Robert LeBlanc
Date: 2019-10-14 16:27
To: huxia...@horebdata.cn
CC: ceph-users
Subject: Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph recovery

On Thu, Oct 10, 2019 at 2:23 PM huxia...@horebdata.cn <huxia...@horebdata.cn> wrote:
>
> Hi, folks,
>
> I have a middle-sized Ceph cluster serving as the Cinder backend for OpenStack (Queens).
> During testing, one Ceph node went down unexpectedly and was powered up again ca. 10
> minutes later, at which point the cluster started PG recovery. To my surprise, VM IOPS
> dropped dramatically during recovery, from ca. 13K IOPS to about 400, roughly a
> factor of 1/30, even though I had put stringent throttling on backfill and
> recovery with the following Ceph parameters:
>
> osd_max_backfills = 1
> osd_recovery_max_active = 1
> osd_client_op_priority = 63
> osd_recovery_op_priority = 1
> osd_recovery_sleep = 0.5
>
> The strangest part is:
> 1) When there is no IO activity from any VM (all VMs are quiet except the
> recovery IO), the recovery bandwidth is ca. 10 MiB/s, 2 objects/s. The
> recovery throttle settings seem to be working properly.
> 2) When running FIO inside a VM, the recovery bandwidth climbs quickly,
> reaching above 200 MiB/s, 60 objects/s, while FIO inside the VM only
> achieves about 400 IOPS (8 KiB block size), around 3 MiB/s. Clearly the
> recovery throttling DOES NOT work properly in this case.
> 3) If I stop the FIO test in the VM, the recovery bandwidth drops back to
> 10 MiB/s, 2 objects/s, strangely enough.
>
> How can this odd behavior happen? Is there a way to cap recovery
> bandwidth at a specific value, or to limit the number of recovered
> objects per second? That would give better control of backfilling/recovery
> than the faulty, or merely relative, logic of osd_client_op_priority vs.
> osd_recovery_op_priority.
>
> Any ideas or suggestions to bring recovery under control?
>
> best regards,
>
> Samuel

Not sure which version of Ceph you are on, but add these to your /etc/ceph/ceph.conf on all your OSDs and restart them.

osd op queue = wpq
osd op queue cut off = high

That should really help and make backfills and recovery non-impactful. This will be the default in Octopus.
--------------------------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
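
For reference, a minimal sketch of how the suggested change could be rolled out and verified, assuming a systemd-managed deployment and using osd.0 as an example OSD ID (adjust the unit name and ID to your setup):

    # On every OSD host, add to /etc/ceph/ceph.conf:
    [osd]
    osd op queue = wpq
    osd op queue cut off = high

    # Restart the OSDs on that host, one host at a time (systemd example):
    systemctl restart ceph-osd.target

    # Verify the running values via the admin socket (osd.0 is an example ID):
    ceph daemon osd.0 config get osd_op_queue
    ceph daemon osd.0 config get osd_op_queue_cut_off

These queue options only take effect when the OSD starts, which is why a restart is needed rather than injecting them at runtime; restarting one host at a time keeps the cluster serving IO while the change is applied.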