> On 13 January 2017 at 18:50, Mohammed Naser <mna...@vexxhost.com> wrote:
> 
> > On Jan 13, 2017, at 12:41 PM, Wido den Hollander <w...@42on.com> wrote:
> > 
> >> On 13 January 2017 at 18:39, Mohammed Naser <mna...@vexxhost.com> wrote:
> >> 
> >>> On Jan 13, 2017, at 12:37 PM, Wido den Hollander <w...@42on.com> wrote:
> >>> 
> >>>> On 13 January 2017 at 18:18, Mohammed Naser <mna...@vexxhost.com> wrote:
> >>>> 
> >>>> Hi everyone,
> >>>> 
> >>>> We have a deployment with 90 OSDs at the moment, all SSD, which in my
> >>>> opinion is not quite hitting the performance it should. A `rados bench`
> >>>> run gives numbers along these lines:
> >>>> 
> >>>> Maintaining 16 concurrent writes of 4194304 bytes to objects of size
> >>>> 4194304 for up to 10 seconds or 0 objects
> >>>> Object prefix: benchmark_data_bench.vexxhost._30340
> >>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
> >>>>     0       0         0         0         0         0            -           0
> >>>>     1      16       158       142   568.513       568    0.0965336   0.0939971
> >>>>     2      16       287       271   542.191       516    0.0291494    0.107503
> >>>>     3      16       375       359    478.75       352    0.0892724    0.118463
> >>>>     4      16       477       461   461.042       408    0.0243493    0.126649
> >>>>     5      16       540       524   419.216       252     0.239123    0.132195
> >>>>     6      16       644       628    418.67       416     0.347606    0.146832
> >>>>     7      16       734       718   410.281       360    0.0534447    0.147413
> >>>>     8      16       811       795   397.487       308    0.0311927     0.15004
> >>>>     9      16       879       863   383.537       272    0.0894534    0.158513
> >>>>    10      16       980       964   385.578       404    0.0969865    0.162121
> >>>>    11       3       981       978   355.613        56     0.798949    0.171779
> >>>> Total time run:         11.063482
> >>>> Total writes made:      981
> >>>> Write size:             4194304
> >>>> Object size:            4194304
> >>>> Bandwidth (MB/sec):     354.68
> >>>> Stddev Bandwidth:       137.608
> >>>> Max bandwidth (MB/sec): 568
> >>>> Min bandwidth (MB/sec): 56
> >>>> Average IOPS:           88
> >>>> Stddev IOPS:            34
> >>>> Max IOPS:               142
> >>>> Min IOPS:               14
> >>>> Average Latency(s):     0.175273
> >>>> Stddev Latency(s):      0.294736
> >>>> Max latency(s):         1.97781
> >>>> Min latency(s):         0.0205769
> >>>> Cleaning up (deleting benchmark objects)
> >>>> Clean up completed and total clean up time: 3.895293
> >>>> 
> >>>> We've verified the network by running `iperf` across both the replication
> >>>> and public networks, which resulted in 9.8 Gb/s (10G links for both).
> >>>> The machine that's running the benchmark doesn't even saturate its port.
> >>>> The SSDs are S3520 960GB drives which we've benchmarked with fio and they
> >>>> can handle the load. At this point, we're not really sure where to look
> >>>> next. Is anyone running an all-SSD cluster who might be able to share
> >>>> their experience?
> >>> 
> >>> I suggest that you search a bit on the ceph-users list, since this topic
> >>> has been discussed multiple times in the past and even recently.
> >>> 
> >>> Ceph isn't your average storage system and you have to keep that in mind.
> >>> Nothing is free in this world. Ceph provides excellent consistency and
> >>> distribution of data, but that also means that you have things like
> >>> network and CPU latency.
> >>> 
> >>> However, I suggest you look up a few threads on this list which have
> >>> valuable tips.
> >>> 
> >>> Wido
> >> 
> >> Thanks for the reply, I've actually done quite a lot of research and went
> >> through many of the previous posts. While I agree 100% with your statement,
> >> I've found that other people with similar setups have been able to reach
> >> numbers that I cannot, which leads me to believe that there is actually an
> >> issue here. They have been able to max out at 1200 MB/s, which is the
> >> maximum of their benchmarking host. We'd like to reach that, and I think
> >> that given the specifications of the cluster, it can do it with no problems.
> > 
> > A few tips:
> > 
> > - Disable all logging in Ceph (debug_osd, debug_ms, debug_auth, etc.)
> 
> All logging is configured to default settings; should those be turned down?

Yes, disabling all logging improves performance.
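(For illustration, a minimal sketch of what that usually looks like in ceph.conf; the exact list of debug subsystems below is only a common choice, and 0/0 silences both the log level and the in-memory level:)

    [global]
    # silence the chattiest debug subsystems (log level / memory level)
    debug_lockdep = 0/0
    debug_auth = 0/0
    debug_ms = 0/0
    debug_osd = 0/0
    debug_filestore = 0/0
    debug_journal = 0/0
    debug_monc = 0/0
    debug_perfcounter = 0/0

The same settings can usually be applied at runtime with something like `ceph tell osd.* injectargs '--debug_osd 0/0 --debug_ms 0/0'`, so a restart isn't strictly required to test the effect.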
> > - Disable power saving on the CPUs
> 
> All disabled as well; everything is running in `performance` mode.
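(As an aside, a quick way to double-check that on each OSD node is to read the scaling governor directly; this is a generic sketch, not specific to this cluster:)

    # show the active cpufreq governor for every core
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c

    # force the performance governor if anything else shows up
    # (cpupower ships in the kernel-tools / linux-tools package on most distros)
    cpupower frequency-set -g performance

Deep C-states can hurt latency as much as frequency scaling, so `cpupower idle-info` is worth a look too.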
> > Can you also share how the 90 OSDs are distributed in the cluster and what
> > CPUs you have?
> 
> There are 45 machines with 2 OSDs each. The servers they're located on have,
> on average, 24-core ~3GHz Intel CPUs. Both OSDs are pinned to two cores on
> the system.

So 45 machines in total with 2 OSDs/SSDs each.

What is the network? 10GbE? What is the latency for an 8k packet? (ping -s 8192)

Also try running rados bench with more threads; 16 isn't that much. Try running
with 128 or so from multiple clients.

Wido
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
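(For reference, the checks Wido suggests above would typically be run along these lines; the pool name and target host are placeholders:)

    # round-trip latency with an 8 KB payload to one of the OSD hosts
    ping -s 8192 -c 20 <osd-host>

    # heavier write benchmark: 128 concurrent 4 MB writes for 30 seconds,
    # launched from several client machines in parallel against a test pool
    rados bench -p <testpool> 30 write -t 128 --no-cleanup

    # read the objects back, then remove them
    rados bench -p <testpool> 30 seq -t 128
    rados -p <testpool> cleanup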