> On 13 January 2017 at 18:39, Mohammed Naser <mna...@vexxhost.com> wrote:
>
>
> > On Jan 13, 2017, at 12:37 PM, Wido den Hollander <w...@42on.com> wrote:
> >
> >
> >> On 13 January 2017 at 18:18, Mohammed Naser <mna...@vexxhost.com> wrote:
> >>
> >>
> >> Hi everyone,
> >>
> >> We have a deployment with 90 OSDs at the moment, all SSD, that is not
> >> quite hitting the performance it should in my opinion. A `rados bench`
> >> run gives something along these numbers:
> >>
> >> Maintaining 16 concurrent writes of 4194304 bytes to objects of size
> >> 4194304 for up to 10 seconds or 0 objects
> >> Object prefix: benchmark_data_bench.vexxhost._30340
> >>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
> >>     0       0         0         0         0         0            -            0
> >>     1      16       158       142   568.513       568    0.0965336    0.0939971
> >>     2      16       287       271   542.191       516    0.0291494     0.107503
> >>     3      16       375       359    478.75       352    0.0892724     0.118463
> >>     4      16       477       461   461.042       408    0.0243493     0.126649
> >>     5      16       540       524   419.216       252     0.239123     0.132195
> >>     6      16       644       628    418.67       416     0.347606     0.146832
> >>     7      16       734       718   410.281       360    0.0534447     0.147413
> >>     8      16       811       795   397.487       308    0.0311927      0.15004
> >>     9      16       879       863   383.537       272    0.0894534     0.158513
> >>    10      16       980       964   385.578       404    0.0969865     0.162121
> >>    11       3       981       978   355.613        56     0.798949     0.171779
> >> Total time run:         11.063482
> >> Total writes made:      981
> >> Write size:             4194304
> >> Object size:            4194304
> >> Bandwidth (MB/sec):     354.68
> >> Stddev Bandwidth:       137.608
> >> Max bandwidth (MB/sec): 568
> >> Min bandwidth (MB/sec): 56
> >> Average IOPS:           88
> >> Stddev IOPS:            34
> >> Max IOPS:               142
> >> Min IOPS:               14
> >> Average Latency(s):     0.175273
> >> Stddev Latency(s):      0.294736
> >> Max latency(s):         1.97781
> >> Min latency(s):         0.0205769
> >> Cleaning up (deleting benchmark objects)
> >> Clean up completed and total clean up time :3.895293
> >>
> >> We've verified the network by running `iperf` across both the replication
> >> and public networks, and it resulted in 9.8Gb/s (10G links for both). The
> >> machine that's running the benchmark doesn't even saturate its port. The
> >> SSDs are S3520 960GB drives which we've benchmarked, and they can handle
> >> the load using fio/etc. At this point, I'm not really sure where to look
> >> next. Is anyone running all-SSD clusters who might be able to share their
> >> experience?
> >
> > I suggest that you search a bit on the ceph-users list, since this topic
> > has been discussed multiple times in the past and even recently.
> >
> > Ceph isn't your average storage system and you have to keep that in mind.
> > Nothing is free in this world. Ceph provides excellent consistency and
> > distribution of data, but that also means that you have things like
> > network and CPU latency.
> >
> > However, I suggest you look up a few threads on this list which have
> > valuable tips.
> >
> > Wido
>
> Thanks for the reply. I've actually done quite a lot of research and gone
> through many of the previous posts. While I agree 100% with your statement,
> I've found that other people with similar setups have been able to reach
> numbers that I cannot, which leads me to believe that there is actually an
> issue here. They have been able to max out at 1200 MB/s, which is the
> maximum of their benchmarking host. We'd like to reach that, and I think
> that given the specifications of the cluster, it can do it with no problems.
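For reference, the run above looks like a stock 16-thread, 4 MB write bench;
the invocation was presumably something along these lines (the pool name is a
placeholder, not taken from your mail):

    # 10-second write benchmark: 16 concurrent 4 MiB writes (also the defaults)
    rados bench -p <pool> 10 write -t 16 -b 4194304
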
A few tips:

- Disable all logging in Ceph (debug_osd, debug_ms, debug_auth, etc, etc)
- Disable power saving on the CPUs

(A rough sketch of both settings follows at the very end of this mail.)

Can you also share how the 90 OSDs are distributed in the cluster and what
CPUs you have?

Wido

>
> >>
> >> Thanks,
> >> Mohammed
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
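A minimal sketch of the two tips above, in case it helps. The exact set of
debug subsystems, the runtime injectargs form and the governor path are
illustrative assumptions, not a recipe verified against this cluster:

    # ceph.conf, [global] or [osd] section: silence the chattiest debug subsystems
    debug_osd = 0/0
    debug_ms = 0/0
    debug_auth = 0/0
    debug_filestore = 0/0
    debug_journal = 0/0

    # or apply at runtime without restarting the OSDs
    ceph tell osd.* injectargs '--debug_osd 0/0 --debug_ms 0/0 --debug_auth 0/0'

    # on every OSD host, keep the cores out of power-saving states by
    # pinning the cpufreq governor to performance
    for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo performance > "$g"
    done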