> On 13 January 2017 at 18:39, Mohammed Naser <mna...@vexxhost.com> wrote:
>
>
> > On Jan 13, 2017, at 12:37 PM, Wido den Hollander <w...@42on.com> wrote:
> >
> >
> >> On 13 January 2017 at 18:18, Mohammed Naser <mna...@vexxhost.com> wrote:
> >>
> >>
> >> Hi everyone,
> >>
> >> We have a deployment with 90 OSDs at the moment, all SSD, that is not
> >> quite hitting the performance it should in my opinion. A `rados bench`
> >> run gives something along these numbers:
> >>
> >> Maintaining 16 concurrent writes of 4194304 bytes to objects of size
> >> 4194304 for up to 10 seconds or 0 objects
> >> Object prefix: benchmark_data_bench.vexxhost._30340
> >>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
> >>     0       0         0         0         0         0            -            0
> >>     1      16       158       142   568.513       568    0.0965336    0.0939971
> >>     2      16       287       271   542.191       516    0.0291494     0.107503
> >>     3      16       375       359    478.75       352    0.0892724     0.118463
> >>     4      16       477       461   461.042       408    0.0243493     0.126649
> >>     5      16       540       524   419.216       252     0.239123     0.132195
> >>     6      16       644       628    418.67       416     0.347606     0.146832
> >>     7      16       734       718   410.281       360    0.0534447     0.147413
> >>     8      16       811       795   397.487       308    0.0311927      0.15004
> >>     9      16       879       863   383.537       272    0.0894534     0.158513
> >>    10      16       980       964   385.578       404    0.0969865     0.162121
> >>    11       3       981       978   355.613        56     0.798949     0.171779
> >> Total time run:         11.063482
> >> Total writes made:      981
> >> Write size:             4194304
> >> Object size:            4194304
> >> Bandwidth (MB/sec):     354.68
> >> Stddev Bandwidth:       137.608
> >> Max bandwidth (MB/sec): 568
> >> Min bandwidth (MB/sec): 56
> >> Average IOPS:           88
> >> Stddev IOPS:            34
> >> Max IOPS:               142
> >> Min IOPS:               14
> >> Average Latency(s):     0.175273
> >> Stddev Latency(s):      0.294736
> >> Max latency(s):         1.97781
> >> Min latency(s):         0.0205769
> >> Cleaning up (deleting benchmark objects)
> >> Clean up completed and total clean up time :3.895293
> >>
> >> We've verified the network by running `iperf` across both the replication
> >> and public networks, and it resulted in 9.8Gb/s (10G links for both). The
> >> machine that's running the benchmark doesn't even saturate its port. The
> >> SSDs are S3520 960GB drives which we've benchmarked, and they can handle
> >> the load using fio/etc. At this point, I'm not really sure where to look
> >> next. Is anyone running all-SSD clusters who might be able to share their
> >> experience?
> >
> > I suggest that you search a bit on the ceph-users list, since this topic
> > has been discussed multiple times in the past and even recently.
> >
> > Ceph isn't your average storage system and you have to keep that in mind.
> > Nothing is free in this world. Ceph provides excellent consistency and
> > distribution of data, but that also means that you have things like
> > network and CPU latency.
> >
> > However, I suggest you look up a few threads on this list which have
> > valuable tips.
> >
> > Wido
>
> Thanks for the reply. I've actually done quite a lot of research and gone
> through many of the previous posts. While I agree 100% with your statement,
> I've found that other people with similar setups have been able to reach
> numbers that I cannot, which leads me to believe that there is actually an
> issue here. They have been able to max out at 1200 MB/s, which is the
> maximum of their benchmarking host. We'd like to reach that, and I think
> that given the specifications of the cluster, it can do it with no problems.
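For reference, the run above looks like a stock 16-thread, 4 MB write bench;
the invocation was presumably something along these lines (the pool name is a
placeholder, not taken from your mail):

    # 10-second write benchmark: 16 concurrent 4 MiB writes (also the defaults)
    rados bench -p <pool> 10 write -t 16 -b 4194304
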
A few tips:

- Disable all logging in Ceph (debug_osd, debug_ms, debug_auth, etc, etc)
- Disable power saving on the CPUs

(A rough sketch of both settings follows at the very end of this mail.)

Can you also share how the 90 OSDs are distributed in the cluster and what
CPUs you have?

Wido

>
> >>
> >> Thanks,
> >> Mohammed
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
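A minimal sketch of the two tips above, in case it helps. The exact set of
debug subsystems, the runtime injectargs form and the governor path are
illustrative assumptions, not a recipe verified against this cluster:

    # ceph.conf, [global] or [osd] section: silence the chattiest debug subsystems
    debug_osd = 0/0
    debug_ms = 0/0
    debug_auth = 0/0
    debug_filestore = 0/0
    debug_journal = 0/0

    # or apply at runtime without restarting the OSDs
    ceph tell osd.* injectargs '--debug_osd 0/0 --debug_ms 0/0 --debug_auth 0/0'

    # on every OSD host, keep the cores out of power-saving states by
    # pinning the cpufreq governor to performance
    for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo performance > "$g"
    done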