> On 13 January 2017 at 18:50, Mohammed Naser <mna...@vexxhost.com> wrote:
> 
> 
> 
> > On Jan 13, 2017, at 12:41 PM, Wido den Hollander <w...@42on.com> wrote:
> > 
> > 
> >> On 13 January 2017 at 18:39, Mohammed Naser <mna...@vexxhost.com> wrote:
> >> 
> >> 
> >> 
> >>> On Jan 13, 2017, at 12:37 PM, Wido den Hollander <w...@42on.com> wrote:
> >>> 
> >>> 
> >>>> On 13 January 2017 at 18:18, Mohammed Naser <mna...@vexxhost.com> wrote:
> >>>> 
> >>>> 
> >>>> Hi everyone,
> >>>> 
> >>>> We have an all-SSD deployment with 90 OSDs at the moment that isn’t 
> >>>> quite hitting the performance it should, in my opinion. A 
> >>>> `rados bench` run gives numbers along these lines:
> >>>> 
> >>>> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 
> >>>> 4194304 for up to 10 seconds or 0 objects
> >>>> Object prefix: benchmark_data_bench.vexxhost._30340
> >>>> sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
> >>>>   0       0         0         0         0         0           -           0
> >>>>   1      16       158       142   568.513       568   0.0965336   0.0939971
> >>>>   2      16       287       271   542.191       516   0.0291494    0.107503
> >>>>   3      16       375       359    478.75       352   0.0892724    0.118463
> >>>>   4      16       477       461   461.042       408   0.0243493    0.126649
> >>>>   5      16       540       524   419.216       252    0.239123    0.132195
> >>>>   6      16       644       628    418.67       416    0.347606    0.146832
> >>>>   7      16       734       718   410.281       360   0.0534447    0.147413
> >>>>   8      16       811       795   397.487       308   0.0311927     0.15004
> >>>>   9      16       879       863   383.537       272   0.0894534    0.158513
> >>>>  10      16       980       964   385.578       404   0.0969865    0.162121
> >>>>  11       3       981       978   355.613        56    0.798949    0.171779
> >>>> Total time run:         11.063482
> >>>> Total writes made:      981
> >>>> Write size:             4194304
> >>>> Object size:            4194304
> >>>> Bandwidth (MB/sec):     354.68
> >>>> Stddev Bandwidth:       137.608
> >>>> Max bandwidth (MB/sec): 568
> >>>> Min bandwidth (MB/sec): 56
> >>>> Average IOPS:           88
> >>>> Stddev IOPS:            34
> >>>> Max IOPS:               142
> >>>> Min IOPS:               14
> >>>> Average Latency(s):     0.175273
> >>>> Stddev Latency(s):      0.294736
> >>>> Max latency(s):         1.97781
> >>>> Min latency(s):         0.0205769
> >>>> Cleaning up (deleting benchmark objects)
> >>>> Clean up completed and total clean up time :3.895293
> >>>> 
> >>>> We’ve verified the network by running `iperf` across both the 
> >>>> replication and public networks, which resulted in 9.8 Gb/s (10G links 
> >>>> for both).  The machine running the benchmark doesn’t even saturate its 
> >>>> port.  The SSDs are S3520 960GB drives which we’ve benchmarked using 
> >>>> fio and similar tools, and they can handle the load.  At this point I’m 
> >>>> not really sure where to look next.  Is anyone running an all-SSD 
> >>>> cluster who might be able to share their experience?
> >>> 
> >>> I suggest that you search a bit on the ceph-users list since this topic 
> >>> has been discussed multiple times in the past and even recently.
> >>> 
> >>> Ceph isn't your average storage system and you have to keep that in mind. 
> >>> Nothing is free in this world. Ceph provides excellent consistency and 
> >>> distribution of data, but that also means that you have things like 
> >>> network and CPU latency.
> >>> 
> >>> However, I suggest you look up a few threads on this list which have 
> >>> valuable tips.
> >>> 
> >>> Wido
> >> 
> >> Thanks for the reply.  I’ve actually done quite a lot of research and 
> >> went through many of the previous posts.  While I agree 100% with your 
> >> statement, I’ve found that other people with similar setups have been 
> >> able to reach numbers that I cannot, which leads me to believe that 
> >> there is actually an issue here.  They have been able to max out at 
> >> 1200 MB/s, which is the maximum of their benchmarking host.  We’d like 
> >> to reach that, and I think that given the specifications of the 
> >> cluster, it can do so with no problems.
> > 
> > A few tips:
> > 
> > - Disable all logging in Ceph (debug_osd, debug_ms, debug_auth, etc, etc)
> 
> All logging is configured to the default settings; should those be turned down?

Yes, disabling all logging improves performance.
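For example, the debug knobs can be set to 0/0 in ceph.conf (log level / in-memory level). The exact list of subsystems varies by release, so treat this as a sketch rather than an exhaustive set:

```ini
[global]
; 0/0 disables both the file log and the in-memory log for a subsystem
debug_lockdep = 0/0
debug_auth = 0/0
debug_ms = 0/0
debug_monc = 0/0

[osd]
debug_osd = 0/0
debug_filestore = 0/0
debug_journal = 0/0
```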

> 
> > - Disable power saving on the CPUs
> 
> All disabled as well, everything running on `performance` mode.
> 
> > 
> > Can you also share how the 90 OSDs are distributed in the cluster and what 
> > CPUs you have?
> 
> There are 45 machines with 2 OSDs each.  The servers they’re located on 
> have, on average, 24-core ~3 GHz Intel CPUs.  Both OSDs are pinned to two 
> cores on the system.
> 

So 45 machines in total with 2 OSDs/SSDs each.

What is the network? 10GbE? What is the latency for an 8k packet? (ping -s 8192)

Also try running rados bench with more threads; 16 isn't that many. Try running 
with 128 or so from multiple clients.
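As a back-of-the-envelope check (using the numbers from the bench output above, not a new measurement): with only 16 ops in flight, the average latency alone already caps throughput near what was observed, which suggests the run is latency-bound rather than disk-bound, and that more concurrency should help:

```python
# Little's law sketch: throughput cap = ops in flight * object size / latency.
# Values taken from the rados bench output earlier in this thread.
in_flight = 16          # rados bench default of -t 16
object_mb = 4           # 4 MiB objects
avg_latency_s = 0.175   # "Average Latency(s)" from the run

cap_mb_s = in_flight * object_mb / avg_latency_s
print(round(cap_mb_s, 1))  # ~365.7 MB/s, close to the observed 354.68 MB/s
```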

Wido

> > 
> > Wido
> > 
> >> 
> >>>> 
> >>>> Thanks,
> >>>> Mohammed
> >>>> _______________________________________________
> >>>> ceph-users mailing list
> >>>> ceph-users@lists.ceph.com
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> 
>
