These Intel SSDs are more than capable of handling the workload. In addition, this cluster is used as an RBD backend for an OpenStack cluster.
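For reference, the standard way to check that tolerance is a single-job fio run doing direct, synchronous 4k writes; a rough sketch is below (the device path is a placeholder, and pointing this at a raw device overwrites data):

  # Single-job direct + sync 4k write test, the usual filestore journal
  # suitability check. /dev/sdX is a placeholder for the drive under test.
  fio --name=journal-test --filename=/dev/sdX \
      --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting

Drives that are not tolerant of this write pattern tend to collapse to single-digit MB/s here, while DC-class drives hold up.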
> On Jan 13, 2017, at 1:08 PM, Somnath Roy <somnath....@sandisk.com> wrote:
>
> Also, there has been a lot of discussion in the community about SSDs not being suitable for the Ceph write workload (with filestore), as some drives are not good at odirect/odsync kinds of writes. Hope your SSDs are tolerant of that.
>
> -----Original Message-----
> From: Somnath Roy
> Sent: Friday, January 13, 2017 10:06 AM
> To: 'Mohammed Naser'; Wido den Hollander
> Cc: ceph-users@lists.ceph.com
> Subject: RE: [ceph-users] All SSD cluster performance
>
> << Both OSDs are pinned to two cores on the system
> Is there any reason you are pinning OSDs like that? I would say for an object workload there is no need to pin OSDs.
> With the configuration you mentioned, Ceph with 4M object PUTs should be saturating your network first.
>
> Have you run, say, a 4M object GET test to see what bandwidth you are getting?
>
> Thanks & Regards
> Somnath
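As for measuring the GET side Somnath asks about: the usual approach with rados bench is to keep the write-phase objects with --no-cleanup and then run a sequential read pass against the same pool. A sketch, with a placeholder pool name:

  # 4M object write pass, keeping the objects for the read test
  rados bench -p testpool 60 write -b 4M -t 16 --no-cleanup
  # 4M object GET (sequential read) pass over those objects
  rados bench -p testpool 60 seq -t 16
  # remove the benchmark objects afterwards
  rados -p testpool cleanup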
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mohammed Naser
> Sent: Friday, January 13, 2017 9:51 AM
> To: Wido den Hollander
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] All SSD cluster performance
>
>> On Jan 13, 2017, at 12:41 PM, Wido den Hollander <w...@42on.com> wrote:
>>
>>> On January 13, 2017 at 18:39, Mohammed Naser <mna...@vexxhost.com> wrote:
>>>
>>>> On Jan 13, 2017, at 12:37 PM, Wido den Hollander <w...@42on.com> wrote:
>>>>
>>>>> On January 13, 2017 at 18:18, Mohammed Naser <mna...@vexxhost.com> wrote:
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> We have a deployment with 90 OSDs at the moment, all SSD, which in my opinion is not quite hitting the performance it should. A `rados bench` run gives something along these numbers:
>>>>>
>>>>> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
>>>>> Object prefix: benchmark_data_bench.vexxhost._30340
>>>>>   sec  Cur ops  started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>>>>>     0        0        0         0         0         0            -           0
>>>>>     1       16      158       142   568.513       568    0.0965336   0.0939971
>>>>>     2       16      287       271   542.191       516    0.0291494    0.107503
>>>>>     3       16      375       359    478.75       352    0.0892724    0.118463
>>>>>     4       16      477       461   461.042       408    0.0243493    0.126649
>>>>>     5       16      540       524   419.216       252     0.239123    0.132195
>>>>>     6       16      644       628    418.67       416     0.347606    0.146832
>>>>>     7       16      734       718   410.281       360    0.0534447    0.147413
>>>>>     8       16      811       795   397.487       308    0.0311927     0.15004
>>>>>     9       16      879       863   383.537       272    0.0894534    0.158513
>>>>>    10       16      980       964   385.578       404    0.0969865    0.162121
>>>>>    11        3      981       978   355.613        56     0.798949    0.171779
>>>>> Total time run:         11.063482
>>>>> Total writes made:      981
>>>>> Write size:             4194304
>>>>> Object size:            4194304
>>>>> Bandwidth (MB/sec):     354.68
>>>>> Stddev Bandwidth:       137.608
>>>>> Max bandwidth (MB/sec): 568
>>>>> Min bandwidth (MB/sec): 56
>>>>> Average IOPS:           88
>>>>> Stddev IOPS:            34
>>>>> Max IOPS:               142
>>>>> Min IOPS:               14
>>>>> Average Latency(s):     0.175273
>>>>> Stddev Latency(s):      0.294736
>>>>> Max latency(s):         1.97781
>>>>> Min latency(s):         0.0205769
>>>>> Cleaning up (deleting benchmark objects)
>>>>> Clean up completed and total clean up time: 3.895293
>>>>>
>>>>> We've verified the network by running `iperf` across both the replication and public networks, which resulted in 9.8 Gb/s (10G links for both). The machine running the benchmark doesn't even saturate its port. The SSDs are S3520 960GB drives which we've benchmarked, and they can handle the load using fio/etc. At this point I'm not really sure where to look next; is anyone running all-SSD clusters who might be able to share their experience?
>>>>
>>>> I suggest that you search a bit on the ceph-users list, since this topic has been discussed multiple times in the past and even recently.
>>>>
>>>> Ceph isn't your average storage system and you have to keep that in mind. Nothing is free in this world. Ceph provides excellent consistency and distribution of data, but that also means that you have things like network and CPU latency.
>>>>
>>>> However, I suggest you look up a few threads on this list which have valuable tips.
>>>>
>>>> Wido
>>>
>>> Thanks for the reply. I've actually done quite a lot of research and gone through many of the previous posts. While I agree 100% with your statement, I've found that other people with similar setups have been able to reach numbers that I cannot, which leads me to believe that there is actually an issue here. They have been able to max out at 1200 MB/s, which is the maximum of their benchmarking host. We'd like to reach that, and I think that given the specifications of the cluster, it can do so with no problems.
>>
>> A few tips:
>>
>> - Disable all logging in Ceph (debug_osd, debug_ms, debug_auth, etc, etc)
>
> All logging is configured to default settings; should those be turned down? (See the sketch at the end of this message.)
>
>> - Disable power saving on the CPUs
>
> All disabled as well, everything running on `performance` mode.
>
>> Can you also share how the 90 OSDs are distributed in the cluster and what CPUs you have?
>
> There are 45 machines with 2 OSDs each. The servers they're located on have, on average, 24-core ~3GHz Intel CPUs. Both OSDs are pinned to two cores on the system.
>
>> Wido
>
>>>>> Thanks,
>>>>> Mohammed
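The logging overrides referenced above would look something like this in ceph.conf. This is only a sketch: the exact list of debug subsystems varies by Ceph release, and the values are log-level/memory-level pairs:

  [global]
  # silence the chattiest debug subsystems (0/0 = no log, no in-memory log)
  debug_lockdep = 0/0
  debug_auth = 0/0
  debug_ms = 0/0
  debug_osd = 0/0
  debug_filestore = 0/0
  debug_journal = 0/0
  debug_monc = 0/0
  debug_perfcounter = 0/0

Two quick sanity checks for the other points discussed above, using standard tools (paths and daemon names as commonly deployed):

  # confirm every core really is on the performance governor
  cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c

  # show the current CPU affinity of each running ceph-osd daemon
  for p in $(pidof ceph-osd); do taskset -cp "$p"; done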