Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive

Jamie Lokier Wed, 12 May 2010 03:10:22 -0700

Paul Brook wrote:
> > > Paul Brook wrote:
> > > > cache=none:
> > > >   No host caching. Reads and writes both go directly to underlying
> > > >   storage.
> > > > 
> > > > Useful to avoid double-caching.
> > > > 
> > > > cache=writethrough
> > > > 
> > > >   Reads are cached. Writes go directly to underlying storage.  Useful
> > > >   for
> > > > 
> > > > broken guests that aren't aware of drive caches.
> > > 
> > > These are misleading descriptions - because cache=none does not push
> > > writes down to powerfail-safe storage, while cache=writethrough might.
> > 
> > If so, then this is a serious bug.
> 
> .. though it may be a kernel bug rather that a qemu bug, depending on the 
> exact details.


It's not a kernel bug.  cache=none uses O_DIRECT, and O_DIRECT must
not force writes to powerfail-safe storage.  If it did, it would be
unusably slow for applications using O_DIRECT as a performance
enhancer / memory saver.  They can call fsync/fdatasync when they need
to for integrity.  (There might be kernel bugs in the latter department.)

> Either way, I consider any mode that inhibits host filesystem write
> cache but not volatile drive cache to be pretty worthless.

On the contrary, it greatly reduces host memory consumption so that
guest data isn't cached twice (it's already cached in the guest), and
it may improve performance by relaxing the POSIX write-serialisation
constraint (not sure if Linux cares; Solaris does).

> Either we guaranteed data integrity on completion or we don't.

The problem with the description of cache=none is it uses O_DIRECT,
which does always not push writes to powerfail-safe storage,.

O_DIRECT is effectively a hint.  It requests less caching in kernel
memory, may reduce memory usage and copying, may invoke direct DMA.

O_DIRECT does not tell the disk hardware to commit to powerfail-safe
storage.  I.e. it doesn't issue barriers or disable disk write caching.
(However, depending on a host setup, it might have that effect if disk
write cache is disabled by the admin).

Also, it doesn't even always write to disk: It falls back to buffered
in some circumstances, even on filesystems which support it - see
recent patches for btrfs which use buffered I/O for O_DIRECT for some
parts of some files.  (Many non-Linux OSes fall back to buffered
when any other process holds a non-O_DIRECT file descriptor, or when
requests don't meet some criteria).

The POSIX thing to use for cache=none would be O_DSYNC|O_RSYNC, and
that should work on some hosts, but Linux doesn't implement real O_RSYNC.

A combination which ought to work is O_DSYNC|O_DIRECT.  O_DIRECT is
the performance hint; O_DSYNC provides the commit request.  Christoph
Hellwig has mentioned that combination elsewhere on this thread.
It makes sense to me for cache=none.

O_DIRECT by itself is a useful performance & memory hint, so there
does need to be some option which maps onto O_DIRECT alone.  But it
shouldn't be documented as stronger than cache=writethrough, because
it isn't.

--  Jamie

Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive

Reply via email to