Paul Brook wrote:
> > > Paul Brook wrote:
> > > > cache=none:
> > > > No host caching. Reads and writes both go directly to underlying
> > > > storage.
> > > >
> > > > Useful to avoid double-caching.
> > > >
> > > > cache=writethrough
> > > >
> > > > Reads are cached. Writes go directly to underlying storage. Useful
> > > > for broken guests that aren't aware of drive caches.
> > >
> > > These are misleading descriptions - because cache=none does not push
> > > writes down to powerfail-safe storage, while cache=writethrough might.
> >
> > If so, then this is a serious bug.
>
> .. though it may be a kernel bug rather that a qemu bug, depending on the
> exact details.

It's not a kernel bug. cache=none uses O_DIRECT, and O_DIRECT must not
force writes to powerfail-safe storage. If it did, it would be unusably
slow for applications using O_DIRECT as a performance enhancer / memory
saver. They can call fsync/fdatasync when they need to for integrity.
(There might be kernel bugs in the latter department.)

> Either way, I consider any mode that inhibits host filesystem write
> cache but not volatile drive cache to be pretty worthless.

On the contrary, it greatly reduces host memory consumption so that
guest data isn't cached twice (it's already cached in the guest), and
it may improve performance by relaxing the POSIX write-serialisation
constraint (not sure if Linux cares; Solaris does).

> Either we guaranteed data integrity on completion or we don't.

The problem with the description of cache=none is that it uses
O_DIRECT, which does not always push writes to powerfail-safe storage.

O_DIRECT is effectively a hint. It requests less caching in kernel
memory, may reduce memory usage and copying, and may invoke direct DMA.

O_DIRECT does not tell the disk hardware to commit to powerfail-safe
storage. I.e. it doesn't issue barriers or disable disk write caching.
(However, depending on the host setup, it might have that effect if
the disk write cache has been disabled by the admin.)

Also, it doesn't even always write to disk: it falls back to buffered
I/O in some circumstances, even on filesystems which support O_DIRECT -
see recent patches for btrfs which use buffered I/O for O_DIRECT for
some parts of some files. (Many non-Linux OSes fall back to buffered
I/O when any other process holds a non-O_DIRECT file descriptor, or
when requests don't meet some criteria.)

The POSIX thing to use for cache=none would be O_DSYNC|O_RSYNC, and
that should work on some hosts, but Linux doesn't implement real
O_RSYNC.

A combination which ought to work is O_DSYNC|O_DIRECT. O_DIRECT is the
performance hint; O_DSYNC provides the commit request.

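To make the distinction concrete, here is a rough sketch of the two
ways of opening an image: O_DIRECT alone (hint only) and
O_DIRECT|O_DSYNC (hint plus commit request). This is not qemu's code;
direct_flags(), write_block() and the 4096-byte BLK alignment are
hypothetical names and values picked for illustration, and the O_DIRECT
fallback to 0 is just my assumption about how to cope with hosts that
lack the flag.

```c
#define _GNU_SOURCE             /* glibc exposes O_DIRECT only with this */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* assumption: no such flag here, so drop the hint */
#endif

#define BLK 4096                /* assumed alignment; real code should query the device */

/* Hypothetical helper: open(2) flags for "hint only" vs "hint + commit". */
int direct_flags(int want_commit)
{
    int flags = O_RDWR | O_DIRECT;      /* O_DIRECT: less host caching, a hint */
    if (want_commit)
        flags |= O_DSYNC;               /* O_DSYNC: each write is a commit request */
    return flags;
}

/* Hypothetical helper: write one zero-filled block at offset 0.
 * Returns 0 on success, -1 on failure. */
int write_block(int fd, int explicit_sync)
{
    void *buf;

    assert(fd >= 0);
    /* O_DIRECT generally wants buffer, offset and length all aligned. */
    if (posix_memalign(&buf, BLK, BLK) != 0)
        return -1;
    memset(buf, 0, BLK);

    ssize_t n = pwrite(fd, buf, BLK, 0);
    free(buf);
    if (n != BLK)
        return -1;

    /* With O_DIRECT alone the data may still sit in a volatile drive
     * cache; fdatasync() is how the application asks for a commit. */
    if (explicit_sync && fdatasync(fd) != 0)
        return -1;
    return 0;
}
```

So a file opened with direct_flags(0) needs write_block(fd, 1) to ask
for integrity, while one opened with direct_flags(1) makes that request
on every write. Whether the request actually reaches powerfail-safe
storage still depends on the kernel and filesystem doing the right
thing.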
Christoph Hellwig has mentioned that combination elsewhere on this
thread. It makes sense to me for cache=none.

O_DIRECT by itself is a useful performance & memory hint, so there
does need to be some option which maps onto O_DIRECT alone. But it
shouldn't be documented as stronger than cache=writethrough, because
it isn't.

-- Jamie