Not sure what your OSD config looks like, but here is something that may be relevant.

When I was moving from Filestore to Bluestore on my SSD OSDs (and from an NVMe
Filestore journal to an NVMe Bluestore block.db), I hit an issue where the OSD
was incorrectly being reported as rotational somewhere in the chain.
Once I overcame that, I saw a huge boost in recovery performance (repaving
OSDs).
Might be something useful in there.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/025039.html
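
In case it is relevant to your setup, here is a quick sketch of how to check what is being detected (replace sdX and the OSD id with your own; these are just illustrative):

```
# What the kernel reports for the block device: 0 = non-rotational (SSD), 1 = rotational (HDD)
cat /sys/block/sdX/queue/rotational

# What Ceph recorded for the OSD at startup
ceph osd metadata 0 | grep -i rotational
```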

Reed

> On Mar 19, 2019, at 11:29 PM, Konstantin Shalygin <k0...@k0ste.ru> wrote:
> 
> 
>> I set up an SSD Luminous 12.2.11 cluster and realized after data had been
>> added that pg_num was not set properly on the default.rgw.buckets.data pool
>> (where all the data goes).  I adjusted the settings up, but recovery is
>> going really slowly (around 56-110 MiB/s), ticking down at 0.002 per log
>> entry (ceph -w).  These are all SSDs on Luminous 12.2.11 (no journal drives)
>> with two 10Gb fiber twinax links in a bonded LACP config.  There are six
>> servers and 60 OSDs, each OSD 2TB.  About 4TB of data (3 million objects)
>> was added to the cluster before I noticed the red blinking lights.
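>> 
>> For context, on Luminous bumping pg_num is done with something along these lines (the target value here is only an example):
>> 
>> ceph osd pool set default.rgw.buckets.data pg_num 2048
>> 
>> ceph osd pool set default.rgw.buckets.data pgp_num 2048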
>> 
>> I tried adjusting the recovery settings with:
>> 
>> ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
>> 
>> ceph tell 'osd.*' injectargs '--osd-recovery-max-active 30'
>> 
>> Which did help a little, but didn't seem to have the impact I was looking
>> for.  I have used these settings on HDD clusters before to speed things up
>> (using 8 backfills and 4 max active, though).  Did I miss something, or is
>> this part of the pg expansion process?  Should I be doing something else
>> with SSD clusters?
>> 
>> Regards,
>> 
>> -Brent
>> 
>> Existing Clusters:
>> 
>> Test: Luminous 12.2.11 with 3 OSD servers, 1 mon/mgr, 1 gateway (all
>> virtual on SSD)
>> 
>> US Production (HDD): Jewel 10.2.11 with 5 OSD servers, 3 mons, 3 gateways
>> behind an haproxy LB
>> 
>> UK Production (HDD): Luminous 12.2.11 with 15 OSD servers, 3 mons/mgrs, 3
>> gateways behind an haproxy LB
>> 
>> US Production (SSD): Luminous 12.2.11 with 6 OSD servers, 3 mons/mgrs, 3
>> gateways behind an haproxy LB
> 
> Try lowering the `osd_recovery_sleep*` options.
> 
> You can get your current values from the ceph admin socket like this:
> 
> ```
> ceph daemon osd.0 config show | jq 'to_entries[] | if (.key|test("^(osd_recovery_sleep)(.*)")) then (.) else empty end'
> ```
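> 
> If these come back non-zero on your SSD OSDs (for example because the hdd/hybrid values apply when a device is detected as rotational), they can be lowered at runtime with injectargs. A minimal sketch, assuming you want to disable the throttle entirely (0 turns the sleep off; pick values that suit your cluster):
> 
> ```
> ceph tell 'osd.*' injectargs '--osd_recovery_sleep_hdd 0 --osd_recovery_sleep_hybrid 0'
> ```
> 
> Note that injectargs changes are runtime-only; put them in ceph.conf if they should survive an OSD restart.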
> 
> k
