Hi,

I don’t have firsthand experience with the S3520s; as Christian pointed out, 
their endurance doesn’t make them suitable for OSDs in most cases. I can only 
advise you to keep a close eye on the SMART status of the SSDs.
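
For example, with smartmontools installed (the device name here is just a 
placeholder):

    # dump SMART data and pick out the wear-related attributes
    smartctl -a /dev/sda | grep -Ei 'wear|media'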

Anyway, the S3520 960GB is advertised at 380 MB/s for writes.
Assuming this cluster runs with collocated journals and a replicated pool of 
size 3, that would be a maximum theoretical throughput of roughly 63 MB/s per 
OSD, so about 5.7 GB/s across the 90 OSDs. IMO, and for reasonably configured 
hosts, you can expect around 50% of the theoretical maximum throughput for 4M I/O.
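
Spelled out:

    380 MB/s / 2 (collocated journal writes the data twice) / 3 (replication) ~= 63 MB/s per OSD
    63 MB/s x 90 OSDs ~= 5.7 GB/s theoretical aggregate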

Maybe you want to share more info on your cluster and benchmark procedure?

Cheers,
Maxime

On 14/01/17 10:09, "ceph-users on behalf of Wido den Hollander" 
<ceph-users-boun...@lists.ceph.com on behalf of w...@42on.com> wrote:

    
    > On 14 January 2017 at 6:41, Christian Balzer <ch...@gol.com> wrote:
    > 
    > 
    > 
    > Hello,
    > 
    > On Fri, 13 Jan 2017 13:18:35 -0500 Mohammed Naser wrote:
    > 
    > > These Intel SSDs are more than capable of handling the workload; in
    > > addition, this cluster is used as an RBD backend for an OpenStack cluster.
    > >
    > 
    > I haven't tested the S3520s yet. Being Intel's first 3D NAND offering,
    > they are slightly slower than their predecessors in terms of BW and IOPS,
    > but supposedly have slightly lower latency, if the specs are to be
    > believed.
    > 
    > Given the history of Intel DC S SSDs, I think it is safe to assume that
    > they use the same or a similar controller setup as their predecessors,
    > meaning a large capacitor-backed cache which enables them to deal
    > correctly and quickly with SYNC and DIRECT writes.
    > 
    > What would worry me slightly more (even at their 960GB size) is the
    > endurance of 1 DWPD, which with journals inline comes down to 0.5, and
    > with FS overhead and write amplification (which depends a lot on the type
    > of operations) you're looking at something around 0.3 DWPD to base your
    > expectations on. Mind, that still leaves you with about 9.6TB per day,
    > which is a decent enough number, but only equates to about 112MB/s.
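    > 
    > Roughly, assuming the 90 OSDs and 3x replication mentioned in this thread:
    > 
    >     960 GB x ~1/3 DWPD             ~= 320 GB/day of writes per drive
    >     320 GB x 90 OSDs / 3 replicas  ~= 9.6 TB/day of client writes
    >     9.6 TB / 86,400 s              ~= 112 MB/s sustained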
    > 
    > Finally, most people start by looking at bandwidth/throughput, only to
    > discover ultimately that it was IOPS they needed first and foremost.
    
    Yes! Bandwidth isn't what people usually need; they need IOPS and low latency.
    
    I see a lot of clusters doing 10k ~ 20k IOPS with somewhere around 1Gbit/s
    of traffic (e.g. 15k IOPS at 8 KB each is only ~120 MB/s, about 1Gbit/s).
    
    Wido
    
    > 
    > Christian
    > 
    > > Sent from my iPhone
    > > 
    > > > On Jan 13, 2017, at 1:08 PM, Somnath Roy <somnath....@sandisk.com> wrote:
    > > > 
    > > > Also, there has been a lot of discussion in the community about SSDs
    > > > not being suitable for the Ceph write workload (with filestore), as
    > > > some are not good at O_DIRECT/O_DSYNC kinds of writes. Hope your SSDs
    > > > are tolerant of that.
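    > > > A quick way to verify that is the usual sync-write fio test (this is
    > > > destructive on the target device, and /dev/sdX is only a placeholder):
    > > > 
    > > >     fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
    > > >         --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60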
    > > > 
    > > > -----Original Message-----
    > > > From: Somnath Roy
    > > > Sent: Friday, January 13, 2017 10:06 AM
    > > > To: 'Mohammed Naser'; Wido den Hollander
    > > > Cc: ceph-users@lists.ceph.com
    > > > Subject: RE: [ceph-users] All SSD cluster performance
    > > > 
    > > > << Both OSDs are pinned to two cores on the system
    > > > Is there any reason you are pinning OSDs like that? I would say for an
    > > > object workload there is no need to pin OSDs.
    > > > With the configuration you mentioned, Ceph doing 4M object PUTs should
    > > > be saturating your network first (a single 10GbE client link tops out
    > > > around 1250 MB/s).
    > > > 
    > > > Have you run, say, a 4M object GET test to see what BW you are getting?
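    > > > For example (the pool name is only a placeholder; the seq run reads
    > > > back the objects left behind by a prior write run):
    > > > 
    > > >     rados bench -p testpool 60 write --no-cleanup
    > > >     rados bench -p testpool 60 seq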
    > > > 
    > > > Thanks & Regards
    > > > Somnath
    > > > 
    > > > -----Original Message-----
    > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mohammed Naser
    > > > Sent: Friday, January 13, 2017 9:51 AM
    > > > To: Wido den Hollander
    > > > Cc: ceph-users@lists.ceph.com
    > > > Subject: Re: [ceph-users] All SSD cluster performance
    > > > 
    > > > 
    > > >> On Jan 13, 2017, at 12:41 PM, Wido den Hollander <w...@42on.com> wrote:
    > > >> 
    > > >> 
    > > >>> On 13 January 2017 at 18:39, Mohammed Naser <mna...@vexxhost.com> wrote:
    > > >>> 
    > > >>> 
    > > >>> 
    > > >>>> On Jan 13, 2017, at 12:37 PM, Wido den Hollander <w...@42on.com> wrote:
    > > >>>> 
    > > >>>> 
    > > >>>>> On 13 January 2017 at 18:18, Mohammed Naser <mna...@vexxhost.com> wrote:
    > > >>>>> 
    > > >>>>> 
    > > >>>>> Hi everyone,
    > > >>>>> 
    > > >>>>> We have a deployment of 90 OSDs at the moment, all SSD, which in my
    > > >>>>> opinion is not quite hitting the performance it should. A `rados
    > > >>>>> bench` run gives numbers along these lines:
    > > >>>>> 
    > > >>>>> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
    > > >>>>> Object prefix: benchmark_data_bench.vexxhost._30340
    > > >>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
    > > >>>>>     0       0         0         0         0         0            -            0
    > > >>>>>     1      16       158       142   568.513       568    0.0965336    0.0939971
    > > >>>>>     2      16       287       271   542.191       516    0.0291494     0.107503
    > > >>>>>     3      16       375       359    478.75       352    0.0892724     0.118463
    > > >>>>>     4      16       477       461   461.042       408    0.0243493     0.126649
    > > >>>>>     5      16       540       524   419.216       252     0.239123     0.132195
    > > >>>>>     6      16       644       628    418.67       416     0.347606     0.146832
    > > >>>>>     7      16       734       718   410.281       360    0.0534447     0.147413
    > > >>>>>     8      16       811       795   397.487       308    0.0311927      0.15004
    > > >>>>>     9      16       879       863   383.537       272    0.0894534     0.158513
    > > >>>>>    10      16       980       964   385.578       404    0.0969865     0.162121
    > > >>>>>    11       3       981       978   355.613        56     0.798949     0.171779
    > > >>>>> Total time run:         11.063482
    > > >>>>> Total writes made:      981
    > > >>>>> Write size:             4194304
    > > >>>>> Object size:            4194304
    > > >>>>> Bandwidth (MB/sec):     354.68
    > > >>>>> Stddev Bandwidth:       137.608
    > > >>>>> Max bandwidth (MB/sec): 568
    > > >>>>> Min bandwidth (MB/sec): 56
    > > >>>>> Average IOPS:           88
    > > >>>>> Stddev IOPS:            34
    > > >>>>> Max IOPS:               142
    > > >>>>> Min IOPS:               14
    > > >>>>> Average Latency(s):     0.175273
    > > >>>>> Stddev Latency(s):      0.294736
    > > >>>>> Max latency(s):         1.97781
    > > >>>>> Min latency(s):         0.0205769
    > > >>>>> Cleaning up (deleting benchmark objects)
    > > >>>>> Clean up completed and total clean up time: 3.895293
    > > >>>>> 
    > > >>>>> We've verified the network by running `iperf` across both the
    > > >>>>> replication and public networks, and it resulted in 9.8Gb/s (10G
    > > >>>>> links for both). The machine that's running the benchmark doesn't
    > > >>>>> even saturate its port. The SSDs are S3520 960GB drives which we've
    > > >>>>> benchmarked, and they can handle the load using fio etc. At this
    > > >>>>> point I'm not really sure where to look next... anyone running
    > > >>>>> all-SSD clusters who might be able to share their experience?
    > > >>>> 
    > > >>>> I suggest that you search a bit on the ceph-users list, since this
    > > >>>> topic has been discussed multiple times in the past and even recently.
    > > >>>> 
    > > >>>> Ceph isn't your average storage system and you have to keep that in
    > > >>>> mind. Nothing is free in this world. Ceph provides excellent
    > > >>>> consistency and distribution of data, but that also means that you
    > > >>>> have things like network and CPU latency.
    > > >>>> 
    > > >>>> However, I suggest you look up a few threads on this list which have
    > > >>>> valuable tips.
    > > >>>> 
    > > >>>> Wido
    > > >>> 
    > > >>> Thanks for the reply. I've actually done quite a lot of research and
    > > >>> went through many of the previous posts. While I agree 100% with your
    > > >>> statement, I've found that other people with similar setups have been
    > > >>> able to reach numbers that I cannot, which leads me to believe that
    > > >>> there is actually an issue here. They have been able to max out at
    > > >>> 1200 MB/s, which is the maximum of their benchmarking host. We'd like
    > > >>> to reach that, and I think that given the specifications of the
    > > >>> cluster, it can do it with no problems.
    > > >> 
    > > >> A few tips:
    > > >> 
    > > >> - Disable all logging in Ceph (debug_osd, debug_ms, debug_auth, etc.)
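    > > >> 
    > > >> For example, something along these lines in ceph.conf (the subsystem
    > > >> list here is just the usual suspects, not exhaustive):
    > > >> 
    > > >>     [global]
    > > >>     debug osd = 0/0
    > > >>     debug ms = 0/0
    > > >>     debug auth = 0/0
    > > >>     debug filestore = 0/0
    > > >>     debug journal = 0/0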
    > > > 
    > > > All logging is configured to the default settings; should those be turned down?
    > > > 
    > > >> - Disable power saving on the CPUs
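    > > >> 
    > > >> For example (assuming the cpupower tool is installed):
    > > >> 
    > > >>     # check the current governor on every core
    > > >>     cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    > > >>     # force the performance governor
    > > >>     cpupower frequency-set -g performance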
    > > > 
    > > > All disabled as well, everything running on `performance` mode.
    > > > 
    > > >> 
    > > >> Can you also share how the 90 OSDs are distributed in the cluster and
    > > >> what CPUs you have?
    > > > 
    > > > There are 45 machines with 2 OSDs each. The servers they're located on
    > > > have, on average, 24-core ~3GHz Intel CPUs. Both OSDs are pinned to two
    > > > cores on the system.
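    > > > 
    > > > (For reference, pinning like that is typically done with something
    > > > along these lines; the PID is a placeholder:)
    > > > 
    > > >     # restrict a running ceph-osd process to cores 0 and 1
    > > >     taskset -cp 0,1 <osd-pid>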
    > > > 
    > > >> 
    > > >> Wido
    > > >> 
    > > >>> 
    > > >>>>> 
    > > >>>>> Thanks,
    > > >>>>> Mohammed
    > > >>> 
    > > > 
    > > > 
    > 
    > 
    > -- 
    > Christian Balzer        Network/Systems Engineer                
    > ch...@gol.com     Global OnLine Japan/Rakuten Communications
    > http://www.gol.com/
    

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
