> On Jan 13, 2017, at 1:34 PM, Wido den Hollander <w...@42on.com> wrote:
>
>> On 13 January 2017 at 18:50, Mohammed Naser <mna...@vexxhost.com> wrote:
>>
>>> On Jan 13, 2017, at 12:41 PM, Wido den Hollander <w...@42on.com> wrote:
>>>
>>>> On 13 January 2017 at 18:39, Mohammed Naser <mna...@vexxhost.com> wrote:
>>>>
>>>>> On Jan 13, 2017, at 12:37 PM, Wido den Hollander <w...@42on.com> wrote:
>>>>>
>>>>>> On 13 January 2017 at 18:18, Mohammed Naser <mna...@vexxhost.com> wrote:
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> We have a deployment with 90 OSDs at the moment, all SSD, which in my opinion is not quite hitting the performance it should. A `rados bench` run gives numbers along these lines:
>>>>>>
>>>>>> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
>>>>>> Object prefix: benchmark_data_bench.vexxhost._30340
>>>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>>>>>>     0       0         0         0         0         0            -           0
>>>>>>     1      16       158       142   568.513       568    0.0965336   0.0939971
>>>>>>     2      16       287       271   542.191       516    0.0291494    0.107503
>>>>>>     3      16       375       359    478.75       352    0.0892724    0.118463
>>>>>>     4      16       477       461   461.042       408    0.0243493    0.126649
>>>>>>     5      16       540       524   419.216       252     0.239123    0.132195
>>>>>>     6      16       644       628    418.67       416     0.347606    0.146832
>>>>>>     7      16       734       718   410.281       360    0.0534447    0.147413
>>>>>>     8      16       811       795   397.487       308    0.0311927     0.15004
>>>>>>     9      16       879       863   383.537       272    0.0894534    0.158513
>>>>>>    10      16       980       964   385.578       404    0.0969865    0.162121
>>>>>>    11       3       981       978   355.613        56     0.798949    0.171779
>>>>>> Total time run:         11.063482
>>>>>> Total writes made:      981
>>>>>> Write size:             4194304
>>>>>> Object size:            4194304
>>>>>> Bandwidth (MB/sec):     354.68
>>>>>> Stddev Bandwidth:       137.608
>>>>>> Max bandwidth (MB/sec): 568
>>>>>> Min bandwidth (MB/sec): 56
>>>>>> Average IOPS:           88
>>>>>> Stddev IOPS:            34
>>>>>> Max IOPS:               142
>>>>>> Min IOPS:               14
>>>>>> Average Latency(s):     0.175273
>>>>>> Stddev Latency(s):      0.294736
>>>>>> Max latency(s):         1.97781
>>>>>> Min latency(s):         0.0205769
>>>>>> Cleaning up (deleting benchmark objects)
>>>>>> Clean up completed and total clean up time: 3.895293
>>>>>>
>>>>>> We’ve verified the network by running `iperf` across both the replication and public networks, and it resulted in 9.8 Gb/s (10G links for both). The machine that’s running the benchmark doesn’t even saturate its port. The SSDs are S3520 960GB drives which we’ve benchmarked, and they can handle the load using fio/etc. At this point I’m not really sure where to look next.. anyone running all-SSD clusters who might be able to share their experience?
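For reference, judging from the output above, the run corresponds to something along the lines of the commands below; the pool name and device path are placeholders, and the fio job is only a typical per-drive sanity check, not necessarily the exact one used here:

  # 4 MiB writes, 16 concurrent ops, 10 seconds, matching the quoted output
  rados bench -p <testpool> 10 write -t 16 -b 4194304

  # per-SSD check outside of Ceph (destructive if pointed at a raw device!)
  fio --name=ssd-check --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based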
>>>>>
>>>>> I suggest that you search a bit on the ceph-users list since this topic has been discussed multiple times in the past and even recently.
>>>>>
>>>>> Ceph isn't your average storage system and you have to keep that in mind. Nothing is free in this world. Ceph provides excellent consistency and distribution of data, but that also means that you have things like network and CPU latency.
>>>>>
>>>>> However, I suggest you look up a few threads on this list which have valuable tips.
>>>>>
>>>>> Wido
>>>>
>>>> Thanks for the reply. I’ve actually done quite a lot of research and gone through many of the previous posts. While I agree 100% with your statement, I’ve found that other people with similar setups have been able to reach numbers that I cannot, which leads me to believe that there is actually an issue here. They have been able to max out at 1200 MB/s, which is the maximum of their benchmarking host. We’d like to reach that, and I think that given the specifications of the cluster, it can do so with no problems.
>>>
>>> A few tips:
>>>
>>> - Disable all logging in Ceph (debug_osd, debug_ms, debug_auth, etc, etc)
>>
>> All logging is configured to default settings, should those be turned down?
>
> Yes, disabling all logging improves performance.

I’ll look into disabling it.
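For my own notes, I assume that would look roughly like the following; the exact list of debug subsystems worth zeroing is my assumption rather than something confirmed in this thread:

  # runtime only: lower debug levels on all running OSDs
  ceph tell osd.* injectargs '--debug-osd 0/0 --debug-ms 0/0 --debug-auth 0/0 --debug-filestore 0/0 --debug-journal 0/0'

  # to persist across restarts, the same settings go into ceph.conf under [global], e.g.:
  #   debug osd = 0/0
  #   debug ms = 0/0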
>>> - Disable power saving on the CPUs
>>
>> All disabled as well, everything running on `performance` mode.
>>
>>> Can you also share how the 90 OSDs are distributed in the cluster and what CPUs you have?
>>
>> There are 45 machines with 2 OSDs each. The servers they’re located on have, on average, 24-core ~3 GHz Intel CPUs. Both OSDs are pinned to two cores on the system.
>
> So 45 machines in total with 2 OSDs/SSDs each.
>
> What is the network? 10GbE? What is the latency for an 8k packet? (ping -s 8192)

It is a 10GbE network; the latency is on average 0.217 ms.

> Also try running rados bench with more threads, 16 isn't that much. Try running with 128 or so from multiple clients.

With 128 threads, I’m able to get an average of 900 MB/s, and every drive seems to average out to ~20 MB/s at that peak. Running it multiple times, however, seems to introduce very odd issues with extra data… are multiple rados bench runs not supported?
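My working assumption is that concurrent runs need distinct run names so they don’t share benchmark metadata and objects, i.e. something along these lines from each client (pool and run names are placeholders, and the option behaviour is worth double-checking):

  # client 1
  rados bench -p <testpool> 60 write -t 128 -b 4194304 --run-name client1 --no-cleanup
  # client 2
  rados bench -p <testpool> 60 write -t 128 -b 4194304 --run-name client2 --no-cleanup

  # remove the benchmark objects afterwards, per run name
  rados -p <testpool> cleanup --run-name client1
  rados -p <testpool> cleanup --run-name client2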
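For completeness, the power-saving and 8k-packet latency points above can be double-checked with something like this (the target host name is a placeholder):

  # confirm every core is on the performance governor
  cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c

  # round-trip latency for an 8 KiB payload towards another OSD node
  ping -c 20 -s 8192 <osd-host>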
>
> Wido
>
>>> Wido
>>>
>>>>>> Thanks,
>>>>>> Mohammed

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com