I've just realised I had the same issue; I've been trying to track down the cause for the past few days!
However I am using brand new enterprise Toshiba drives with 256MB write cache, and was seeing I/O wait peaks of 40% even during small write operations to Ceph, with commit/apply latencies of 40ms+. I just went through and disabled the write cache on each drive, ran a few tests, and got exactly the same write performance, but with I/O wait under 1% and commit/apply latencies of 1-3ms max. Something somewhere definitely doesn't like the write cache being enabled on the disks. This is an EC pool on the latest Mimic release. Rough commands are at the bottom of this mail for anyone who wants to try the same thing.

On Sun, Nov 11, 2018 at 5:34 AM Vitaliy Filippov <vita...@yourcmc.ru> wrote:
> Hi
>
> A weird thing happens in my test cluster made from desktop hardware.
>
> The command `for i in /dev/sd?; do hdparm -W 0 $i; done` increases
> single-thread write iops (reduces latency) 7 times!
>
> It is a 3-node cluster with Ryzen 2700 CPUs, 3x SATA 7200rpm HDDs + 1x
> SATA desktop SSD for system and ceph-mon + 1x SATA server SSD for
> block.db/wal in each host. Hosts are linked by 10gbit ethernet (not the
> fastest one though, average RTT according to flood-ping is 0.098ms). Ceph
> and OpenNebula are installed on the same hosts, OSDs are prepared with
> ceph-volume and bluestore with default options. SSDs have capacitors
> ('power-loss protection'), write cache is turned off for them since the
> very beginning (hdparm -W 0 /dev/sdb). They're quite old, but each of them
> is capable of delivering ~22000 iops in journal mode (fio -sync=1
> -direct=1 -iodepth=1 -bs=4k -rw=write).
>
> However, RBD single-threaded random-write benchmark originally gave awful
> results - when testing with `fio -ioengine=libaio -size=10G -sync=1
> -direct=1 -name=test -bs=4k -iodepth=1 -rw=randwrite -runtime=60
> -filename=./testfile` from inside a VM, the result was only 58 iops
> average (17ms latency). This was not what I expected from the HDD+SSD
> setup.
>
> But today I tried to play with cache settings for data disks. And I was
> really surprised to discover that just disabling HDD write cache (hdparm
> -W 0 /dev/sdX for all HDD devices) increases single-threaded performance
> ~7 times! The result from the same VM (without even rebooting it) is
> iops=405, avg lat=2.47ms. That's a magnitude faster and in fact 2.5ms
> seems sort of an expected number.
>
> As I understand 4k writes are always deferred at the default setting of
> prefer_deferred_size_hdd=32768, this means they should only get written to
> the journal device before OSD acks the write operation.
>
> So my question is WHY? Why does HDD write cache affect commit latency with
> WAL on an SSD?
>
> I would also appreciate if anybody with similar setup (HDD+SSD with
> desktop SATA controllers or HBA) could test the same thing...
>
> --
> With best regards,
> Vitaliy Filippov
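For reference, roughly what I ran is below. This is just a sketch: the lsblk/awk filter assumes plain SATA devices named /dev/sd*, and the udev rule is only one possible way to make the setting survive a reboot, so adjust both to your own setup before using them.

    # show the current write cache state of each SATA disk
    for i in /dev/sd?; do hdparm -W $i; done

    # disable the volatile write cache on rotational (HDD) devices only
    for i in $(lsblk -dn -o NAME,ROTA | awk '$2==1 {print "/dev/"$1}'); do
        hdparm -W 0 $i
    done

    # check per-OSD commit/apply latency afterwards
    ceph osd perf

    # one possible way to persist the setting across reboots (udev rule,
    # adjust the device match as needed):
    # /etc/udev/rules.d/99-hdd-write-cache.rules
    #   ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", \
    #     RUN+="/usr/sbin/hdparm -W 0 /dev/%k"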