Hi! I have severe general disk latency issues on the servers I have virtualized under Linux/Qemu/KVM + virtio_blk. The virtual block devices are set up as raw devices on LVM logical volumes with cache=none. Whenever any writes happen (according to iotop it is only a few kB every few seconds), disk latency in the VMs goes up to 1.5 seconds. I understand that cache=writeback may help, but I am unable to find any details about whether it is safe to use it in the way I use the LVs, described below.
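For reference, this is roughly how each guest is started (VG/LV name and memory size are made up here); the relevant part is the -drive line with a raw LV, virtio and cache=none:

    #!/usr/bin/env python3
    # Rough sketch of how a guest is launched (VG/LV name and memory size are
    # made up). The relevant part is the -drive line: a raw LVM logical volume
    # attached via virtio with cache=none, i.e. no host-side write caching.
    import subprocess

    qemu_cmd = [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-m", "2048",
        "-drive", "file=/dev/vg0/vm1-root,if=virtio,format=raw,cache=none",
        # networking, display, etc. omitted
    ]

    # Start the VM; Popen because the process keeps running in the background.
    subprocess.Popen(qemu_cmd)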
Backups are currently created by calling sync inside the guest OS and snapshotting the LVs on the host immediately afterwards. The host then mounts those snapshots (usually causing a journal replay on them) and starts copying the data off; the snapshots are discarded afterwards. (A rough sketch of this procedure is attached below my signature.)

Another side effect of the high disk latency might be that all VM hosts have trouble releasing those LVM snapshots afterwards: they tend to need a few seconds or even minutes before they can be deactivated and removed.

I noticed that a server with a hardware RAID controller with BBU and 256 MB of write cache does not suffer from these issues. However, that is far from what I can afford on all the other servers, apart from my personal distaste for hardware RAID.

The behavior on plain block devices/software RAID is far from ideal, but I am still hesitant to try cache=writeback because I have no answer to the following questions:

- Is data from the Qemu write cache written back in order or out of order? If the data gets reordered, I am likely to see (more) data corruption when snapshotting for a backup or when the server suffers a sudden power loss, kernel panic or other instability.
- Is the host OS aware of that cache, i.e. do LVM snapshots include data that still resided in the Qemu write cache at the time the snapshot was triggered?
- Is there any way to tell Qemu to flush its cache before I trigger the LV snapshot, so that I get (a higher chance of) consistent data on disk?
- If host-side LVM snapshots are unsafe with cache=writeback, what is the preferred way to run backups, other than running LVM inside the guests and connecting via a network file system, or accessing LVM from the host in a clustered LVM configuration?
- Is there any other way to improve disk latency? (I/O schedulers do not have any effect; CFQ on both sides works best despite the common recommendation to use deadline on the host and noop in the guests.)

I know that writeback caching itself is considered safe with current file systems and kernels, as long as I can tolerate losing an amount of data up to the size of the RAM allocated for write caching (by the way, is there any way to influence that size?). See: http://lists.gnu.org/archive/html/qemu-devel/2012-02/msg02682.html

However, that does not say anything about how LVM snapshots behave when they are taken host-side.

I tried to answer those questions myself by looking into the Qemu source code (after searching the web, asking in forums and experimenting for months), but I was unable to figure out where the cache is actually defined; I only found a lot of flag handling. Maybe the cache is not in Qemu at all but part of the Linux kernel?

It would be great to get some definite answers to those questions. Currently I have to assume the worst and expect writeback to cause totally corrupt data if the host triggers a snapshot, which would mean I cannot do backups via host-side LVM snapshots without visiting the guest file systems over the network.

Thanks,
Daniel
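P.S. For completeness, here is a rough sketch of the backup procedure described above; host name, VG/LV names, snapshot size, mount point and destination are all made up, and the in-guest sync is done via ssh in this sketch:

    #!/usr/bin/env python3
    # Sketch of the host-side snapshot backup described above. All names and
    # paths are made up; the guest "sync" is assumed to be triggered via ssh.
    import subprocess

    def run(*cmd):
        subprocess.run(cmd, check=True)

    GUEST = "vm1.example.org"
    ORIGIN = "/dev/vg0/vm1-root"
    SNAP = "vm1-root-backup"
    MNT = "/mnt/vm1-backup"        # assumed to exist already
    DEST = "/backup/vm1/"

    run("ssh", GUEST, "sync")                              # flush guest caches
    run("lvcreate", "-s", "-L", "4G", "-n", SNAP, ORIGIN)  # snapshot the LV
    run("mount", "-o", "ro", "/dev/vg0/" + SNAP, MNT)      # journal replay happens here
    run("rsync", "-a", MNT + "/", DEST)                    # copy the data off
    run("umount", MNT)
    run("lvremove", "-f", "/dev/vg0/" + SNAP)              # this is the step that takes seconds/minutes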
