My Ceph version is Luminous 12.2.12. Do you think I should upgrade to Nautilus? Or will Nautilus give better control of recovery/backfilling?
best regards, Samuel

huxia...@horebdata.cn

From: Robert LeBlanc
Date: 2019-10-14 16:27
To: huxia...@horebdata.cn
CC: ceph-users
Subject: Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph recovery

On Thu, Oct 10, 2019 at 2:23 PM huxia...@horebdata.cn <huxia...@horebdata.cn> wrote:
>
> Hi, folks,
>
> I have a middle-sized Ceph cluster serving as the Cinder backend for OpenStack (Queens).
> During testing, one Ceph node went down unexpectedly and was powered up again ca. 10
> minutes later, at which point the cluster started PG recovery. To my surprise, VM IOPS
> dropped dramatically during recovery, from ca. 13K IOPS to about 400, roughly a
> factor of 1/30, even though I had put stringent throttling on backfill and
> recovery with the following Ceph parameters:
>
> osd_max_backfills = 1
> osd_recovery_max_active = 1
> osd_client_op_priority = 63
> osd_recovery_op_priority = 1
> osd_recovery_sleep = 0.5
>
> The strangest part is:
> 1) When there is no IO activity from any VM (all VMs are quiet except the
> recovery IO), the recovery bandwidth is ca. 10 MiB/s, 2 objects/s. The
> recovery throttle settings seem to be working properly.
> 2) When running FIO inside a VM, the recovery bandwidth climbs quickly,
> reaching above 200 MiB/s, 60 objects/s, while FIO inside the VM only
> achieves about 400 IOPS (8 KiB block size), around 3 MiB/s. Clearly the
> recovery throttling DOES NOT work properly in this case.
> 3) If I stop the FIO test in the VM, the recovery bandwidth drops back to
> 10 MiB/s, 2 objects/s, strangely enough.
>
> How can this odd behavior happen? Is there a way to cap recovery
> bandwidth at a specific value, or to limit the number of recovered
> objects per second? That would give better control of backfilling/recovery
> than the faulty, or merely relative, logic of osd_client_op_priority vs.
> osd_recovery_op_priority.
>
> Any ideas or suggestions to bring recovery under control?
>
> best regards,
>
> Samuel

Not sure which version of Ceph you are on, but add these to your /etc/ceph/ceph.conf on all your OSDs and restart them.

osd op queue = wpq
osd op queue cut off = high

That should really help and make backfills and recovery non-impactful. This will be the default in Octopus.
--------------------------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
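
For reference, a minimal sketch of how the suggested change could be rolled out and verified, assuming a systemd-managed deployment and using osd.0 as an example OSD ID (adjust the unit name and ID to your setup):

    # On every OSD host, add to /etc/ceph/ceph.conf:
    [osd]
    osd op queue = wpq
    osd op queue cut off = high

    # Restart the OSDs on that host, one host at a time (systemd example):
    systemctl restart ceph-osd.target

    # Verify the running values via the admin socket (osd.0 is an example ID):
    ceph daemon osd.0 config get osd_op_queue
    ceph daemon osd.0 config get osd_op_queue_cut_off

These queue options only take effect when the OSD starts, which is why a restart is needed rather than injecting them at runtime; restarting one host at a time keeps the cluster serving IO while the change is applied.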