Hi! I have severe general disk latency issues on the servers I have virtualized under Linux/Qemu/KVM + virtio_blk. The virtual block devices are set up as raw devices on LVM logical volumes with cache=none. Whenever any writes happen (according to iotop it is only a few kB every few seconds), disk latency in the VMs goes up to 1.5 seconds. I understand that cache=writeback may help, but I am unable to find any details about whether it is safe to use it in the way I use the LVs, described below.
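For reference, this is roughly how each guest is started (VG/LV name and memory size are made up here); the relevant part is the -drive line with a raw LV, virtio and cache=none:

    #!/usr/bin/env python3
    # Rough sketch of how a guest is launched (VG/LV name and memory size are
    # made up). The relevant part is the -drive line: a raw LVM logical volume
    # attached via virtio with cache=none, i.e. no host-side write caching.
    import subprocess

    qemu_cmd = [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-m", "2048",
        "-drive", "file=/dev/vg0/vm1-root,if=virtio,format=raw,cache=none",
        # networking, display, etc. omitted
    ]

    # Start the VM; Popen because the process keeps running in the background.
    subprocess.Popen(qemu_cmd)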
Backups are currently created by calling sync inside the guest OS and snapshotting the LVs on the host immediately afterwards. The host then mounts those snapshots (usually causing a journal replay on them) and starts copying the data off; the snapshots are discarded afterwards. (A rough sketch of this procedure is attached below my signature.)

Another side effect of the high disk latency might be that all VM hosts have trouble releasing those LVM snapshots afterwards: they tend to need a few seconds or even minutes before they can be deactivated and removed.

I noticed that a server with a hardware RAID controller with BBU and 256 MB of write cache does not suffer from these issues. However, that is far from what I can afford on all the other servers, apart from my personal distaste for hardware RAID.

The behavior on plain block devices/software RAID is far from ideal, but I am still hesitant to try cache=writeback because I have no answer to the following questions:

- Is data from the Qemu write cache written back in order or out of order? If the data gets reordered, I am likely to see (more) data corruption when snapshotting for a backup or when the server suffers a sudden power loss, kernel panic or other instability.
- Is the host OS aware of that cache, i.e. do LVM snapshots include data that still resided in the Qemu write cache at the time the snapshot was triggered?
- Is there any way to tell Qemu to flush its cache before I trigger the LV snapshot, so that I get (a higher chance of) consistent data on disk?
- If host-side LVM snapshots are unsafe with cache=writeback, what is the preferred way to run backups, other than running LVM inside the guests and connecting via a network file system, or accessing LVM from the host in a clustered LVM configuration?
- Is there any other way to improve disk latency? (I/O schedulers do not have any effect; CFQ on both sides works best despite the common recommendation to use deadline on the host and noop in the guests.)

I know that writeback caching itself is considered safe with current file systems and kernels, as long as I can tolerate losing an amount of data up to the size of the RAM allocated for write caching (by the way, is there any way to influence that size?). See: http://lists.gnu.org/archive/html/qemu-devel/2012-02/msg02682.html

However, that does not say anything about how LVM snapshots behave when they are taken host-side.

I tried to answer those questions myself by looking into the Qemu source code (after searching the web, asking in forums and experimenting for months), but I was unable to figure out where the cache is actually defined; I only found a lot of flag handling. Maybe the cache is not in Qemu at all but part of the Linux kernel?

It would be great to get some definite answers to those questions. Currently I have to assume the worst and expect writeback to cause totally corrupt data if the host triggers a snapshot, which would mean I cannot do backups via host-side LVM snapshots without visiting the guest file systems over the network.

Thanks,
Daniel
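P.S. For completeness, here is a rough sketch of the backup procedure described above; host name, VG/LV names, snapshot size, mount point and destination are all made up, and the in-guest sync is done via ssh in this sketch:

    #!/usr/bin/env python3
    # Sketch of the host-side snapshot backup described above. All names and
    # paths are made up; the guest "sync" is assumed to be triggered via ssh.
    import subprocess

    def run(*cmd):
        subprocess.run(cmd, check=True)

    GUEST = "vm1.example.org"
    ORIGIN = "/dev/vg0/vm1-root"
    SNAP = "vm1-root-backup"
    MNT = "/mnt/vm1-backup"        # assumed to exist already
    DEST = "/backup/vm1/"

    run("ssh", GUEST, "sync")                              # flush guest caches
    run("lvcreate", "-s", "-L", "4G", "-n", SNAP, ORIGIN)  # snapshot the LV
    run("mount", "-o", "ro", "/dev/vg0/" + SNAP, MNT)      # journal replay happens here
    run("rsync", "-a", MNT + "/", DEST)                    # copy the data off
    run("umount", MNT)
    run("lvremove", "-f", "/dev/vg0/" + SNAP)              # this is the step that takes seconds/minutes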
