On Mon, 10 Oct 2011 05:54:56 +0100, Stefan Hajnoczi <stefa...@gmail.com> wrote: > On Sun, Oct 9, 2011 at 7:35 PM, Aneesh Kumar K.V > <aneesh.ku...@linux.vnet.ibm.com> wrote: > > On Sun, 9 Oct 2011 17:16:50 +0100, Stefan Hajnoczi <stefa...@gmail.com> > > wrote: > >> On Sun, Oct 9, 2011 at 4:34 PM, Aneesh Kumar K.V > >> <aneesh.ku...@linux.vnet.ibm.com> wrote: > >> > On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi <stefa...@gmail.com> > >> > wrote: > >> >> On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V > >> >> <aneesh.ku...@linux.vnet.ibm.com> wrote: > >> >> > cache=writethrough implies the file are opened in the host with > >> >> > O_SYNC open flag > >> >> > > >> >> > Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com> > >> >> > --- > >> >> > fsdev/file-op-9p.h | 1 + > >> >> > fsdev/qemu-fsdev.c | 10 ++++++++-- > >> >> > fsdev/qemu-fsdev.h | 2 ++ > >> >> > hw/9pfs/virtio-9p-device.c | 5 +++++ > >> >> > hw/9pfs/virtio-9p.c | 24 ++++++++++++++++++------ > >> >> > qemu-config.c | 6 ++++++ > >> >> > qemu-options.hx | 17 ++++++++++++----- > >> >> > vl.c | 6 ++++++ > >> >> > 8 files changed, 58 insertions(+), 13 deletions(-) > >> >> > >> >> When would this be used? For serving up vanilla 9P? > >> >> > >> >> I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does > >> >> not. > >> >> > >> > > >> > TFSYNC is added by 9p.L. So we would need this for 9p.u. > >> > >> I think 9p.u is covered by this wstat hack in > >> http://ericvh.github.com/9p-rfc/rfc9p2000.html#anchor32: > >> > >> "if all the elements of the directory entry in a Twstat message are > >> ``don't touch'' val- ues, the server may interpret it as a request to > >> guarantee that the contents of the associated file are committed to > >> stable storage before the Rwstat message is returned." > >> > >> A real TFSYNC operation is nicer though and could be mandatory (the > >> 9P2000 RFC only says "the server *may*"). > >> > >> > Another use > >> > case is to ensure that we don't leave pages on host as dirty. That would > >> > ensure that large writeback from a guest don't result in large number of > >> > dirty pages on the host, thereby resulting in writeback in the host. It > >> > would be needed for predictable I/O behavior in a setup where we have > >> > multiple guest. > >> > >> I see. I'm mostly curious about this change because the caching modes > >> are a nightmare with block devices - a lot of time is spent discussing > >> and benchmarking them, and they cause confusion when configuring KVM. > >> > >> It sounds like O_SYNC is being used in order to keep page cache clean. > >> But keeping the page cache clean is a side-effect of O_SYNC's > >> behavior: writing out each page and synchronizing the disk write > >> cache. If you are attempting to bypass the page cache, just use > >> O_DIRECT without O_SYNC. > > > > But how about reads. I want to make sure i get to use the page cache for > > reads and also want to keep the page cache clean. > > > >> O_SYNC is doing the additional disk write > >> cache synchronization which will slow down I/O and prevent the server > >> from using disk write cache. O_SYNC is not the right flag to use. > > > > O_DIRECT have additional requirement on buffer alignment, and we don't > > want to send every read to disk. VirtFS also support zero copy > > read/write, so that buffer alignment will always not be possible. > > We want to make sure writes don't leave the page cache dirty so > > that host doesn't spent much time in write back of data dirtied by the > > guest. > > sync_file_range(2) kicks off the write-out but does not flush file > system metadata or synchronize the disk write cache. >
something like below ? commit bc7c28877b30155264f351d26692b3cceca853bb Author: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com> Date: Mon May 23 11:29:15 2011 +0530 hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache cache=writethrough implies the file are opened in the host with O_SYNC open flag Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com> diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 8de8abf..84ec9e0 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -53,12 +53,16 @@ struct xattr_operations; /* FsContext flag values */ #define PATHNAME_FSCONTEXT 0x1 +/* cache flags */ +#define V9FS_WRITETHROUGH_CACHE 0x1 + typedef struct FsContext { int flags; char *fs_root; SecModel fs_sm; uid_t uid; + int cache_flags; struct xattr_operations **xops; /* fs driver specific data */ void *private; diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 768819f..fce016b 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts) const char *fstype = qemu_opt_get(opts, "fstype"); const char *path = qemu_opt_get(opts, "path"); const char *sec_model = qemu_opt_get(opts, "security_model"); + const char *cache = qemu_opt_get(opts, "cache"); + if (!fsdev_id) { fprintf(stderr, "fsdev: No id specified\n"); @@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts) fsle->fse.path = g_strdup(path); fsle->fse.security_model = g_strdup(sec_model); fsle->fse.ops = FsTypes[i].ops; - + fsle->fse.cache_model = 0; + if (cache) { + if (!strcmp(cache, "writethrough")) { + fsle->fse.cache_model = V9FS_WRITETHROUGH_CACHE; + } + } QTAILQ_INSERT_TAIL(&fstype_entries, fsle, next); return 0; - } FsTypeEntry *get_fsdev_fsentry(char *id) diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index e04931a..9c440f2 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -41,6 +41,7 @@ typedef struct FsTypeEntry { char *fsdev_id; char *path; char *security_model; + int cache_flags; FileOperations *ops; } FsTypeEntry; diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index e5b68da..b23e8a4 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -115,6 +115,7 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) exit(1); } + s->ctx.cache_flags = fse->cache_flags; s->ctx.fs_root = g_strdup(fse->path); len = strlen(conf->tag); if (len > MAX_TAG_LEN) { diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..441a37f 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, const struct iovec *iov, return writev(fd, iov, iovcnt); } #endif + if (ctx->cache_flags & V9FS_WRITETHROUGH_CACHE) { + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when cache=writethrough is sepcified. + */ + sync_file_range(fd, offset, 0, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE); + } } static int handle_chmod(FsContext *fs_ctx, V9fsPath *fs_path, FsCred *credp) diff --git a/hw/9pfs/virtio-9p-local.c b/hw/9pfs/virtio-9p-local.c index 9559ff6..5745d14 100644 --- a/hw/9pfs/virtio-9p-local.c +++ b/hw/9pfs/virtio-9p-local.c @@ -213,6 +213,15 @@ static ssize_t local_pwritev(FsContext *ctx, int fd, const struct iovec *iov, return writev(fd, iov, iovcnt); } #endif + if (ctx->cache_flags & V9FS_WRITETHROUGH_CACHE) { + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when cache=writethrough is sepcified. + */ + sync_file_range(fd, offset, 0, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE); + } } static int local_chmod(FsContext *fs_ctx, V9fsPath *fs_path, FsCred *credp) diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c index c01c31a..3958788 100644 --- a/hw/9pfs/virtio-9p.c +++ b/hw/9pfs/virtio-9p.c @@ -80,6 +80,20 @@ void cred_init(FsCred *credp) credp->fc_rdev = -1; } +static int get_dotl_openflags(V9fsState *s, int oflags) +{ + int flags; + /* + * Filter the client open flags + */ + flags = oflags & ~(O_NOCTTY | O_ASYNC | O_CREAT); + /* + * Ignore direct disk access hint until the server supports it. + */ + flags &= ~O_DIRECT; + return flags; +} + void v9fs_string_init(V9fsString *str) { str->data = NULL; @@ -1598,10 +1612,7 @@ static void v9fs_open(void *opaque) err = offset; } else { if (s->proto_version == V9FS_PROTO_2000L) { - flags = mode; - flags &= ~(O_NOCTTY | O_ASYNC | O_CREAT); - /* Ignore direct disk access hint until the server supports it. */ - flags &= ~O_DIRECT; + flags = get_dotl_openflags(s, mode); } else { flags = omode_to_uflags(mode); } @@ -1650,8 +1661,7 @@ static void v9fs_lcreate(void *opaque) goto out_nofid; } - /* Ignore direct disk access hint until the server supports it. */ - flags &= ~O_DIRECT; + flags = get_dotl_openflags(pdu->s, flags); err = v9fs_co_open2(pdu, fidp, &name, gid, flags | O_CREAT, mode, &stbuf); if (err < 0) { diff --git a/qemu-config.c b/qemu-config.c index 7a7854f..b2ab0b2 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -177,6 +177,9 @@ QemuOptsList qemu_fsdev_opts = { }, { .name = "security_model", .type = QEMU_OPT_STRING, + }, { + .name = "cache", + .type = QEMU_OPT_STRING, }, { /*End of list */ } }, @@ -199,6 +202,9 @@ QemuOptsList qemu_virtfs_opts = { }, { .name = "security_model", .type = QEMU_OPT_STRING, + }, { + .name = "cache", + .type = QEMU_OPT_STRING, }, { /*End of list */ } diff --git a/qemu-options.hx b/qemu-options.hx index dfbabd0..38f0aef 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -525,7 +525,8 @@ ETEXI DEFHEADING(File system options:) DEF("fsdev", HAS_ARG, QEMU_OPTION_fsdev, - "-fsdev local,id=id,path=path,security_model=[mapped|passthrough|none]\n", + "-fsdev local,id=id,path=path,security_model=[mapped|passthrough|none]\n" + " [,cache=writethrough]\n", QEMU_ARCH_ALL) STEXI @@ -541,7 +542,7 @@ The specific Fstype will determine the applicable options. Options to each backend are described below. -@item -fsdev local ,id=@var{id} ,path=@var{path} ,security_model=@var{security_model} +@item -fsdev local ,id=@var{id} ,path=@var{path} ,security_model=@var{security_model}[,cache=@var{cache}] Create a file-system-"device" for local-filesystem. @@ -552,13 +553,17 @@ Create a file-system-"device" for local-filesystem. @option{security_model} specifies the security model to be followed. @option{security_model} is required. +@option{cache} specifies whether to skip the host page cache. +@option{cache} is an optional argument. + @end table ETEXI DEFHEADING(Virtual File system pass-through options:) DEF("virtfs", HAS_ARG, QEMU_OPTION_virtfs, - "-virtfs local,path=path,mount_tag=tag,security_model=[mapped|passthrough|none]\n", + "-virtfs local,path=path,mount_tag=tag,security_model=[mapped|passthrough|none]\n" + " [,cache=writethrough]\n", QEMU_ARCH_ALL) STEXI @@ -574,7 +579,7 @@ The specific Fstype will determine the applicable options. Options to each backend are described below. -@item -virtfs local ,path=@var{path} ,mount_tag=@var{mount_tag} ,security_model=@var{security_model} +@item -virtfs local ,path=@var{path} ,mount_tag=@var{mount_tag} ,security_model=@var{security_model}[,cache=@var{cache}] Create a Virtual file-system-pass through for local-filesystem. @@ -585,10 +590,12 @@ Create a Virtual file-system-pass through for local-filesystem. @option{security_model} specifies the security model to be followed. @option{security_model} is required. - @option{mount_tag} specifies the tag with which the exported file is mounted. @option{mount_tag} is required. +@option{cache} specifies whether to skip the host page cache. +@option{cache} is an optional argument. + @end table ETEXI diff --git a/vl.c b/vl.c index bd4a5ce..6760e39 100644 --- a/vl.c +++ b/vl.c @@ -2794,6 +2794,7 @@ int main(int argc, char **argv, char **envp) case QEMU_OPTION_virtfs: { QemuOpts *fsdev; QemuOpts *device; + const char *cache; olist = qemu_find_opts("virtfs"); if (!olist) { @@ -2823,6 +2824,11 @@ int main(int argc, char **argv, char **envp) qemu_opt_get(opts, "mount_tag")); exit(1); } + + cache = qemu_opt_get(opts, "cache"); + if (cache) { + qemu_opt_set(fsdev, "cache", cache); + } qemu_opt_set(fsdev, "fstype", qemu_opt_get(opts, "fstype")); qemu_opt_set(fsdev, "path", qemu_opt_get(opts, "path")); qemu_opt_set(fsdev, "security_model",