I ran your rados bench test on our SM863a pool (3x replication) and got similar
results.
[@]# rados bench -p fs_data.ssd -b 4096 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_c04_1337712
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
    0       0         0         0         0         0            -            0
    1      16      6302      6286   24.5533   24.5547   0.00304773     0.002541
    2      15     12545     12530   24.4705   24.3906   0.00228294    0.0025506
    3      16     18675     18659   24.2933   23.9414   0.00332918   0.00257042
    4      16     25194     25178   24.5854   25.4648    0.0034176   0.00254016
    5      16     31657     31641   24.7169   25.2461   0.00156494   0.00252686
    6      16     37713     37697   24.5398   23.6562   0.00228134   0.00254527
    7      16     43848     43832   24.4572   23.9648   0.00238393   0.00255401
    8      16     49516     49500   24.1673   22.1406   0.00244473   0.00258466
    9      16     55562     55546   24.1059   23.6172   0.00249619   0.00259139
   10      16     61675     61659   24.0829   23.8789    0.0020192   0.00259362
Total time run: 10.002179
Total writes made: 61675
Write size: 4096
Object size: 4096
Bandwidth (MB/sec): 24.0865
Stddev Bandwidth: 0.932554
Max bandwidth (MB/sec): 25.4648
Min bandwidth (MB/sec): 22.1406
Average IOPS: 6166
Stddev IOPS: 238
Max IOPS: 6519
Min IOPS: 5668
Average Latency(s): 0.00259383
Stddev Latency(s): 0.00173856
Max latency(s): 0.0778051
Min latency(s): 0.00110931
[@ ]# rados bench -p fs_data.ssd 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
    0       0         0         0         0         0            -            0
    1      15     27697     27682   108.115   108.133  0.000755936  0.000568212
    2      15     57975     57960   113.186   118.273  0.000547682  0.000542773
    3      15     88500     88485   115.199   119.238   0.00036749  0.000533185
    4      15    117199    117184   114.422   112.105  0.000354388  0.000536647
    5      15    147734    147719    115.39   119.277  0.000419781   0.00053221
    6      16    176393    176377   114.814   111.945  0.000427109  0.000534771
    7      15    203693    203678   113.645   106.645  0.000379089  0.000540113
    8      15    231917    231902   113.219    110.25  0.000465232  0.000542156
    9      16    261054    261038   113.284   113.812  0.000358025  0.000541972
Total time run: 10.000669
Total reads made: 290371
Read size: 4096
Object size: 4096
Bandwidth (MB/sec): 113.419
Average IOPS: 29035
Stddev IOPS: 1212
Max IOPS: 30535
Min IOPS: 27301
Average Latency(s): 0.000541371
Max latency(s): 0.00380609
Min latency(s): 0.000155521
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: 07 February 2019 08:17
To: [email protected]
Subject: [ceph-users] rados block on SSD - performance - how to tune and
get insight?
Hi List
We are in the process of moving to the next use case for our Ceph cluster.
Bulk, cheap, slow, erasure-coded CephFS storage was the first, and that works
fine.
We're currently on Luminous / BlueStore; if upgrading is expected to change
what we're seeing, please let us know.
We have 6 OSD hosts, each with a single 1TB S4510 SSD. The disks are connected
through an H700 MegaRAID PERC with BBWC, each disk configured as a single-disk
RAID0, with the scheduler set to deadline, nomerges = 1 and rotational = 0
(roughly as in the sketch below). Each disk "should" give approximately
36K IOPS random write and double that for random read.
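For reference, a minimal sketch of how those per-device queue settings are
typically applied via sysfs -- the device name sdb is only an assumption here;
substitute whatever the RAID0 volume shows up as on each host:

# run as root on each OSD host; DEV is an assumed device name
DEV=sdb
echo deadline > /sys/block/$DEV/queue/scheduler    # I/O scheduler
echo 1 > /sys/block/$DEV/queue/nomerges            # disable request merging
echo 0 > /sys/block/$DEV/queue/rotational          # flag the device as non-rotating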
The pool is set up with 3x replication. We would like a "scale-out" setup of
well-performing SSD block devices, potentially to host databases and things
like that. I read through this nice document [0]; I know the HW is radically
different from mine, but I still think I'm at the very low end of what
6 x S4510 should be capable of.
Since it is IOPS I care about, I have lowered the block size to 4096 -- a 4M
block size nicely saturates the NICs in both directions.
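(With 4096-byte objects the reported MB/s column maps directly to IOPS, so a
quick sanity check -- assuming rados bench reports MiB/s -- ties the two
figures in the run below together:)

# ~5672 IOPS * 4096 bytes per op / 2^20 bytes per MiB = ~22.2 MiB/s
echo '5672 * 4096 / 1048576' | bc -l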
$ sudo rados bench -p scbench -b 4096 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_torsk2_11207
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
    0       0         0         0         0         0            -            0
    1      16      5857      5841   22.8155   22.8164   0.00238437   0.00273434
    2      15     11768     11753   22.9533   23.0938    0.0028559   0.00271944
    3      16     17264     17248   22.4564   21.4648   0.00246666   0.00278101
    4      16     22857     22841   22.3037   21.8477     0.002716   0.00280023
    5      16     28462     28446   22.2213   21.8945   0.00220186     0.002811
    6      16     34216     34200   22.2635   22.4766   0.00234315   0.00280552
    7      16     39616     39600   22.0962   21.0938   0.00290661   0.00282718
    8      16     45510     45494   22.2118   23.0234    0.0033541   0.00281253
    9      16     50995     50979   22.1243   21.4258   0.00267282   0.00282371
   10      16     56745     56729   22.1577   22.4609   0.00252583    0.0028193
Total time run: 10.002668
Total writes made: 56745
Write size: 4096
Object size: 4096
Bandwidth (MB/sec): 22.1601
Stddev Bandwidth: 0.712297
Max bandwidth (MB/sec): 23.0938
Min bandwidth (MB/sec): 21.0938
Average IOPS: 5672
Stddev IOPS: 182
Max IOPS: 5912
Min IOPS: 5400
Average Latency(s): 0.00281953
Stddev Latency(s): 0.00190771
Max latency(s): 0.0834767
Min latency(s): 0.00120945
Min latency is fine -- but a max latency of 83 ms?
Average IOPS of 5672?
$ sudo rados bench -p scbench 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
    0       0         0         0         0         0            -            0
    1      15     23329     23314   91.0537   91.0703  0.000349856  0.000679074
    2      16     48555     48539   94.7884   98.5352  0.000499159  0.000652067
    3      16     76193     76177   99.1747   107.961  0.000443877  0.000622775
    4      15    103923    103908   101.459   108.324  0.000678589  0.000609182
    5      15    132720    132705   103.663   112.488  0.000741734  0.000595998
    6      15    161811    161796   105.323   113.637  0.000333166  0.000586323
    7      15    190196    190181   106.115   110.879  0.000612227  0.000582014
    8      15    221155    221140   107.966   120.934  0.000471219  0.000571944
    9      16    251143    251127   108.984   117.137  0.000267528  0.000566659
Total time run: 10.000640
Total reads made: 282097
Read size: 4096
Object size: 4096
Bandwidth (MB/sec): 110.187
Average IOPS: 28207
Stddev IOPS: 2357
Max IOPS: 30959
Min IOPS: 23314
Average Latency(s): 0.000560402
Max latency(s): 0.109804
Min latency(s): 0.000212671
This is also quite far from what I expected. I have 12GB of memory for the OSD
daemon on each host for caching, and the cluster is close to idle, so 50GB+ of
cache for a working set of < 6GB. In this case the reads should not really be
bound by the underlying SSDs at all. But even if they were:
IOPS/disk * num disks / replication => 95K * 6 / 3 => 190K, so roughly 6x off?
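(As a side note, a minimal fio sketch for verifying the raw per-disk
random-read figure that the 95K assumption rests on -- the device name
/dev/sdb and the job parameters are assumptions; the job is read-only, but it
is best run against an OSD taken out of service so it doesn't skew the
cluster numbers:)

fio --name=randread-baseline --filename=/dev/sdb --rw=randread \
    --bs=4k --iodepth=32 --numjobs=4 --direct=1 --ioengine=libaio \
    --runtime=30 --time_based --group_reporting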
There is no measurable service time in iostat when running the tests, so I
have come to the conclusion that it has to be either the client side, the
network path, or the OSD daemon that delivers the increasing latency /
decreased IOPS.
Are there any suggestions on how to get more insight into that?
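(For context, the per-OSD views I know of so far -- just a sketch, osd.0 is a
placeholder and the daemon commands have to run on that OSD's own host --
though I'm not sure they give enough detail:)

# per-OSD performance counters, including op_r_latency / op_w_latency
ceph daemon osd.0 perf dump
# slowest recent ops with per-stage timestamps (queued_for_pg, commit, ...)
ceph daemon osd.0 dump_historic_ops
# quick cluster-wide view of per-OSD commit/apply latency
ceph osd perf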
Has anyone gotten close to the numbers Micron is reporting on NVMe?
Thanks a lot.
[0]
https://www.micron.com/-/media/client/global/documents/products/other-documents/micron_9200_max_ceph_12,-d-,2,-d-,8_luminous_bluestore_reference_architecture.pdf?la=en
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com