On Tue, Nov 21, 2017 at 10:46 AM, Christian Balzer <ch...@gol.com> wrote:
> On Tue, 21 Nov 2017 09:21:58 +0200 Rudi Ahlers wrote:
> > On Mon, Nov 20, 2017 at 2:36 PM, Christian Balzer <ch...@gol.com> wrote:
> > > On Mon, 20 Nov 2017 14:02:30 +0200 Rudi Ahlers wrote:
> > > >
> > > > We're planning on installing 12X Virtual Machines with some heavy loads.
> > > >
> > > > The SSD drives are INTEL SSDSC2BA400G4.
> > > >
> > > Interesting, where did you find those?
> > > Or did you have them lying around?
> > >
> > > I've been unable to get DC S3710 SSDs for nearly a year now.
> > >
> > In South Africa, one of our suppliers had some in stock. They're still
> > fairly new, about 2 months old now.
> >
> Odd, oh well.
>
> > > > The SATA drives are ST8000NM0055-1RM112.
> > > >
> > > Note that these (while fast) have an internal flash cache, limiting them
> > > to something like 0.2 DWPD.
> > > Probably not an issue with the WAL/DB on the Intels, but something to
> > > keep in mind.
> > >
> > I don't quite understand what you want to say, please explain?
> >
> See the other mails in this thread after the one above.
> In short, probably nothing to worry about.
>
> > > > Please explain your comment, "b) will find a lot of people here who
> > > > don't approve of it."
> > > >
> > > Read the archives.
> > > Converged clusters are complex, and debugging Ceph when tons of other
> > > things are going on at the same time on the machine is even more so.
> > >
> > Ok, so I have 4 physical servers and need to set up a highly redundant
> > cluster. How else would you have done it? There is no budget for a SAN,
> > let alone a highly available SAN.
> >
> As I said, I'd be fine doing it with Ceph, if that was a good match.
> It's easy to starve resources with hyperconverged clusters.
>
> Since you're using Proxmox, DRBD would be an obvious alternative,
> especially if you're not planning on growing this cluster.
>
> You only mentioned 3 servers so far, is the fourth one non-Ceph?
>
From what I have read, DRBD isn't very stable? The 4th one will be for
backups.

> > > > I don't have access to the switches right now, but they're new, so
> > > > whatever default config ships from the factory would be active. Though
> > > > iperf shows 10.5 GBytes / 9.02 Gbits/sec throughput.
> > > >
> > > Didn't think it was the switches, but completeness sake and all that.
> > >
> > > > What speeds would you expect?
> > > > "Though with your setup I would have expected something faster, but NOT
> > > > the theoretical 600MB/s 4 HDDs will do in sequential writes."
> > > >
> > > What I wrote.
> > > A 7200RPM HDD, even these, can not sustain writes much over 170MB/s, in
> > > the most optimal circumstances.
> > > So your cluster can NOT exceed about 600MB/s sustained writes with the
> > > effective bandwidth of 4 HDDs.
> > > Smaller writes/reads that can be cached by RAM, DB, onboard caches on
> > > the HDDs of course can and will be faster.
> > >
> > > But again, you're missing the point: even if you get 600MB/s writes out
> > > of your cluster, the number of 4k IOPS will be much more relevant to
> > > your VMs.
> > >
> > hdparm shows about 230MB/s:
> >
> > root@virt2:~# hdparm -Tt /dev/sda
> >
> > /dev/sda:
> >  Timing cached reads:   20250 MB in 2.00 seconds = 10134.81 MB/sec
> >  Timing buffered disk reads: 680 MB in 3.00 seconds = 226.50 MB/sec
> >
> That's read, and a very optimized sequential one at that.
>
> > 600MB/s would be super nice, but in reality even 400MB/s would be nice.
>
> Do you really need to write that amount of data in a short time?
> Typical VMs are IOPS bound, as pointed out several times.
>
We have 10 physical servers which are quite busy, and two of them are slow
in terms of disk speed, so I am looking at getting better performance.

> > Would it not be achievable?
> >
> Maybe, but you need to find out what, if anything, makes your cluster
> slower than this.
> iostat, atop, etc. can help with that.
> How busy are your CPUs, HDDs and SSDs when you run that benchmark?
>
The CPU and RAM are fairly "idle" during any of my tests.
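Next time I run the benchmark I'll also watch the individual disks, not just
CPU and RAM. Something along these lines (same pool and device names as on
virt2 above) should show whether the HDDs or the SSDs holding the DB/WAL are
the bottleneck; the %util column per device is what I'd be looking at:

root@virt2:~# iostat -xm 1
(and in a second terminal)
root@virt2:~# rados bench -p Data 10 write --no-cleanup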
> > > > On this, "If an OSD has no fast WAL/DB, it will drag the overall
> > > > speed down. Verify and if so fix this and re-test.": how?
> > > >
> > > No idea, I don't do bluestore.
> > > You noticed the lack of a WAL/DB for sda, go and fix it.
> > > If in doubt, by destroying and re-creating.
> > >
> > > And if you're looking for a less invasive procedure, docs and the ML
> > > archive, but AFAIK there is nothing but re-creation at this time.
> > >
> > Since I use Proxmox, it set up a DB device, but not a WAL device.
> >
> Again, I don't do bluestore.
> But AFAIK, the WAL will live on the fastest device, which is the SSD you've
> put the DB on, unless specified separately.
> So nothing to be done here.
>
I have re-created the Ceph pool with a DB and WAL device this time, and
performance is slightly better:

root@virt2:~# ceph-disk list | grep /dev/sdf | grep osd
 /dev/sdb1 ceph data, active, cluster ceph, osd.5, block /dev/sdb2, block.db /dev/sdf1, block.wal /dev/sdf2
 /dev/sdd1 ceph data, active, cluster ceph, osd.7, block /dev/sdd2, block.db /dev/sdf3, block.wal /dev/sdf4

root@virt2:~# ceph-disk list | grep /dev/sde | grep osd
 /dev/sda1 ceph data, active, cluster ceph, osd.4, block /dev/sda2, block.db /dev/sde1, block.wal /dev/sde2
 /dev/sdc1 ceph data, active, cluster ceph, osd.6, block /dev/sdc2, block.db /dev/sde3, block.wal /dev/sde4
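For reference, if I understand ceph-disk correctly, a layout like the one
above could also be produced manually with something along these lines (sdb
as the data disk, sdf carrying its DB and WAL, matching the listing); I'm
not sure this is exactly what Proxmox does behind the scenes:

root@virt2:~# ceph-disk prepare --bluestore /dev/sdb --block.db /dev/sdf --block.wal /dev/sdf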
root@virt2:~# rados bench -p Data 10 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       311       295   1179.73      1180   0.0498938   0.0520793
    2      16       622       606   1211.78      1244      0.0358   0.0511329
    3      16       934       918    1223.8      1248   0.0587524   0.0506744
Total time run:       3.420127
Total reads made:     986
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1153.17
Average IOPS:         288
Stddev IOPS:          9
Max IOPS:             312
Min IOPS:             295
Average Latency(s):   0.053413
Max latency(s):       0.284069
Min latency(s):       0.0166523

root@virt2:~# rados bench -p Data 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       381       365   1459.69      1460  0.00267135     0.04159
    2      15       715       700   1399.75      1340   0.0934119   0.0441607
    3      15      1079      1064   1418.44      1456  0.00258879   0.0435526
    4      16      1448      1432   1431.77      1472    0.134513   0.0435446
    5      16      1862      1846   1476.56      1656    0.017519    0.042301
    6      16      2192      2176   1450.44      1320  0.00885603   0.0427858
    7      16      2558      2542   1452.35      1464  0.00184139   0.0429065
    8      16      2996      2980   1489.78      1752   0.0103593     0.04178
    9      16      3385      3369   1497.12      1556  0.00866541    0.041612
   10      16      3744      3728   1490.99      1436  0.00246718   0.0420014
Total time run:       10.204271
Total reads made:     3744
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1467.62
Average IOPS:         366
Stddev IOPS:          33
Max IOPS:             438
Min IOPS:             330
Average Latency(s):   0.0427017
Max latency(s):       0.453643
Min latency(s):       0.00143035

root@virt2:~# rados bench -p Data 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_virt2_20816
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       106        90   359.981       360    0.211947    0.164055
    2      16       202       186   371.956       384    0.101829    0.161727
    3      16       312       296   394.616       440    0.142682    0.157926
    4      16       414       398   397.946       408     0.17893    0.157207
    5      16       515       499   399.147       404    0.138521    0.157384
    6      16       609       593   395.281       376    0.197496    0.159185
    7      16       703       687   392.521       376    0.148057    0.160965
    8      16       796       780   389.952       372    0.360846    0.161464
    9      16       907       891   395.951       444   0.0697599    0.160687
   10      16       989       973   389.153       328    0.164584    0.161334
Total time run:         10.125151
Total writes made:      990
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     391.105
Stddev Bandwidth:       35.6302
Max bandwidth (MB/sec): 444
Min bandwidth (MB/sec): 328
Average IOPS:           97
Stddev IOPS:            8
Max IOPS:               111
Min IOPS:               82
Average Latency(s):     0.163488
Stddev Latency(s):      0.0623322
Max latency(s):         0.451163
Min latency(s):         0.0416428

As noted, the IOPS is still very, very low. What could cause that?
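Since the point made several times is 4k IOPS rather than 4MB bandwidth, I
suppose the next test should use a small block size. If I read the rados
bench options correctly, something like this (same pool, 4k writes, 16
concurrent ops) would give a more meaningful IOPS figure than the 4MB runs
above:

root@virt2:~# rados bench -p Data 10 write -b 4096 -t 16 --no-cleanup

An fio run with bs=4k from inside one of the VMs would also include the
whole RBD/virtio path, which is probably closer to what the VMs will
actually see.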
--
Kind Regards
Rudi Ahlers
Website: http://www.rudiahlers.co.za

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com