Since I now have a qemu with RBD userspace support, let's add this to what is listed below.
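For reference, "standard cache enabled (in both ceph.conf and qemu)" means settings along these lines. This is only a sketch with default-ish values rather than a copy of my actual config, and the pool/image name in the qemu example is made up:
---
# /etc/ceph/ceph.conf on the compute node
[client]
    rbd cache = true
    # optional tuning knobs, otherwise the library defaults apply:
    # rbd cache size = 33554432
    # rbd cache max dirty = 25165824

# qemu needs a matching writeback cache mode on the drive, e.g.:
# -drive file=rbd:rbd/tvm-02-disk1,if=virtio,cache=writeback
---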
Inside the Wheezy VM, RBD userspace, standard cache enabled (in both
ceph.conf and qemu):
---
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
tvm-02           8G           224015  34 57507  15           84728  13  2643  74
---
So basically twice as fast as the kernelspace variant below.

Regards,

Christian

On Mon, 7 Apr 2014 19:01:30 +0900 Christian Balzer wrote:

>
> Hello,
>
> Nothing new, I know. But some numbers to mull and ultimately weep over.
>
> Ceph cluster based on Debian Jessie (thus ceph 0.72.x), 2 nodes, 2 OSDs
> each.
> Infiniband 4xQDR, IPoIB interconnects, 1 GByte/s bandwidth end to end.
> There was nothing going on aside from the tests.
>
> Just going to use the bonnie++ values for throughput to keep it simple
> and short.
>
> On the OSD itself:
> ---
> Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> ceph-01         64G           1309731  96 467763  51           1703299  79 784.0  32
> ---
>
> On a compute node, host side:
> ---
> Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> comp-02        256G           296928  60 64216  16           145015  17 291.6  12
> ---
> Ouch. Well, the write speed is probably the OSD journal SSDs being hobbled
> by being on SATA-2 links of the onboard AMD chipset. I had planned for
> that shortcoming, alas the cheap and cheerful Marvell 88SE9230 based
> PCIe x4 controller can't get a stable link under any Linux kernel I tried.
> OTOH, I don't expect more than 30MB/s average writes for all the VMs
> combined.
> Despite having been aware of the sequential read speed issues, I really
> was disappointed here. 10% of a single OSD. The OSD processes and actual
> disks were bored stiff during the read portion of that bonnie run.
>
> OK, let's increase the read-ahead (no or negative effects on the OSDs, FYI,
> since I've seen that mentioned a few times as well).
> So after a "echo 4096 > /sys/block/vda/queue/read_ahead_kb" we get:
> ---
> Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> comp-02        256G           280277  44 158633  30           655827  46 577.9  17
> ---
> Better, not great, but certainly around what I expected.
>
> So let's see how this looks inside a VM (Wheezy). This is Ganeti on Jessie,
> thus no qemu caching and kernelspace RBD (no qemu with userspace support
> outside sid/experimental yet):
> ---
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> fp-001           8G           170374  29 27599   7           34059   5 328.0  12
> ---
> Le mega ouch. So writes are down to 10% of the OSD and the reads are...
> deplorable.
> Setting the read-ahead inside the VM to 4MB gives us about 380MB/s reads,
> so in line with the writes, that is half of the host speed.
> I will test this with userspace qemu when available.
>
> However setting the read-ahead may not be a feasible option, be it access
> to the VM, it being upgraded, etc.
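For what it's worth, if one does have access to the guest, the read-ahead bump can at least be made persistent with a udev rule along these lines (just a sketch, assuming a udev-based distro and virtio disks), but it still means touching every single VM:
---
# /etc/udev/rules.d/99-virtio-readahead.rules inside the guest (sketch)
SUBSYSTEM=="block", KERNEL=="vd[a-z]", ACTION=="add|change", ATTR{queue/read_ahead_kb}="4096"
---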
> Something more transparent that can be controlled by the people running
> the host or ceph cluster is definitely needed:
> https://wiki.ceph.com/Planning/Blueprints/Emperor/Kernel_client_read_ahead_optimization
>
> Regards,
>
> Christian

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com