Am 26.12.2013 um 13:48 hat Peter Lieven geschrieben: > This patch adds native support for accessing images on NFS shares without > the requirement to actually mount the entire NFS share on the host. > > NFS Images can simply be specified by an url of the form: > nfs://<host>/<export>/<filename> > > For example: > qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2 > > You need LibNFS from Ronnie Sahlberg available at: > git://github.com/sahlberg/libnfs.git > for this to work. > > During configure it is automatically probed for libnfs and support > is enabled on-the-fly. You can forbid or enforce libnfs support > with --disable-libnfs or --enable-libnfs respectively. > > Due to NFS restrictions you might need to execute your binaries > as root, allow them to open priviledged ports (<1024) or specify > insecure option on the NFS server. > > For additional information on ROOT vs. non-ROOT operation and URL > format + parameters see: > https://raw.github.com/sahlberg/libnfs/master/README > > LibNFS currently support NFS version 3 only. > > Signed-off-by: Peter Lieven <p...@kamp.de> > --- > v4->v5: > - disussed with Ronnie and decided to move URL + Paramter parsing to LibNFS. > This allows for URL parameter processing directly in LibNFS without > altering > the qemu NFS block driver. This bumps the version requirement for LibNFS > to 1.9.0 though.
Considering that we'll likely want to add new options in the future, I'm not sure if this is a good idea. This means that struct nfs_url will change, and if qemu isn't updated, it might not even notice that some option was requested in a new field that it doesn't know and provide, so it will silently ignore it. Or if we have a qemu built against an older libnfs, this must be marked as an incompatible ABI change, so it can't run at all with the newer libnfs version. > - added a pointer to the LibNFS readme where additional information about > ROOT privilidge requirements can be found as this raised a few concerns. > - removed a trailing dot in an error statement [Fam]. > > v3->v4: > - finally added full implementation of bdrv_get_allocated_file_size [Stefan] > - removed trailing \n from error statements [Stefan] > > v2->v3: > - rebased the stefanha/block > - use pkg_config to check for libnfs (ignoring cflags which are broken in > 1.8.0) [Stefan] > - fixed NFSClient declaration [Stefan] > - renamed Task variables to task [Stefan] > - renamed NFSTask to NFSRPC [Ronnie] > - do not update bs->total_sectors in nfs_co_writev [Stefan] > - return -ENOMEM on all async call failures [Stefan,Ronnie] > - fully implement ftruncate > - use util/uri.c for URL parsing [Stefan] > - reworked nfs_file_open_common to nfs_client_open which works on NFSClient > [Stefan] > - added a comment ot the connect message that libnfs support NFSv3 only at > the moment. > - DID NOT add full implementation of bdrv_get_allocated_file_size because > we are not in a coroutine context and I cannot do an async call here. > I could do a sync call if there would be a guarantee that no requests > are in flight. [Stefan] > > v1->v2: > - fixed block/Makefile.objs [Ronnie] > - do not always register a read handler [Ronnie] > - add support for reading beyond EOF [Fam] > - fixed struct and paramter naming [Fam] > - fixed overlong lines and whitespace errors [Fam] > - return return status from libnfs whereever possible [Fam] > - added comment why we set allocated_file_size to -ENOTSUP after write [Fam] > - avoid segfault when parsing filname [Fam] > - remove unused close_bh from NFSClient [Fam] > - avoid dividing and mutliplying total_size by BDRV_SECTOR_SIZE in > nfs_file_create [Fam] > > MAINTAINERS | 5 + > block/Makefile.objs | 1 + > block/nfs.c | 405 > +++++++++++++++++++++++++++++++++++++++++++++++++++ > configure | 26 ++++ qapi-schema.json is missing, so you can't add NFS block devices using blockdev-add. > 4 files changed, 437 insertions(+) > create mode 100644 block/nfs.c > > diff --git a/MAINTAINERS b/MAINTAINERS > index a5ab8f8..09996ab 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -935,6 +935,11 @@ M: Peter Lieven <p...@kamp.de> > S: Supported > F: block/iscsi.c > > +NFS > +M: Peter Lieven <p...@kamp.de> > +S: Maintained > +F: block/nfs.c > + > SSH > M: Richard W.M. Jones <rjo...@redhat.com> > S: Supported > diff --git a/block/Makefile.objs b/block/Makefile.objs > index 4e8c91e..e254a21 100644 > --- a/block/Makefile.objs > +++ b/block/Makefile.objs > @@ -12,6 +12,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o > ifeq ($(CONFIG_POSIX),y) > block-obj-y += nbd.o nbd-client.o sheepdog.o > block-obj-$(CONFIG_LIBISCSI) += iscsi.o > +block-obj-$(CONFIG_LIBNFS) += nfs.o > block-obj-$(CONFIG_CURL) += curl.o > block-obj-$(CONFIG_RBD) += rbd.o > block-obj-$(CONFIG_GLUSTERFS) += gluster.o > diff --git a/block/nfs.c b/block/nfs.c > new file mode 100644 > index 0000000..4023b71 > --- /dev/null > +++ b/block/nfs.c > @@ -0,0 +1,405 @@ > +/* > + * QEMU Block driver for native access to files on NFS shares > + * > + * Copyright (c) 2013 Peter Lieven <p...@kamp.de> > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > copy > + * of this software and associated documentation files (the "Software"), to > deal > + * in the Software without restriction, including without limitation the > rights > + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell > + * copies of the Software, and to permit persons to whom the Software is > + * furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice shall be included in > + * all copies or substantial portions of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > FROM, > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > + * THE SOFTWARE. > + */ > + > +#include "config-host.h" > + > +#include <poll.h> > +#include "qemu-common.h" > +#include "qemu/config-file.h" > +#include "qemu/error-report.h" > +#include "block/block_int.h" > +#include "trace.h" > +#include "qemu/iov.h" > +#include "sysemu/sysemu.h" > + > +#include <nfsc/libnfs-zdr.h> > +#include <nfsc/libnfs.h> > +#include <nfsc/libnfs-raw.h> > +#include <nfsc/libnfs-raw-mount.h> > + > +typedef struct NFSClient { > + struct nfs_context *context; > + struct nfsfh *fh; > + int events; > + bool has_zero_init; > +} NFSClient; > + > +typedef struct NFSRPC { > + int status; > + int complete; > + QEMUIOVector *iov; > + struct stat *st; > + Coroutine *co; > + QEMUBH *bh; > +} NFSRPC; > + > +static void nfs_process_read(void *arg); > +static void nfs_process_write(void *arg); > + > +static void nfs_set_events(NFSClient *client) > +{ > + int ev = nfs_which_events(client->context); > + if (ev != client->events) { > + qemu_aio_set_fd_handler(nfs_get_fd(client->context), > + (ev & POLLIN) ? nfs_process_read : NULL, > + (ev & POLLOUT) ? nfs_process_write : NULL, > + client); > + > + } > + client->events = ev; > +} > + > +static void nfs_process_read(void *arg) > +{ > + NFSClient *client = arg; > + nfs_service(client->context, POLLIN); > + nfs_set_events(client); > +} > + > +static void nfs_process_write(void *arg) > +{ > + NFSClient *client = arg; > + nfs_service(client->context, POLLOUT); > + nfs_set_events(client); > +} > + > +static void nfs_co_init_task(NFSClient *client, NFSRPC *task) > +{ > + *task = (NFSRPC) { > + .co = qemu_coroutine_self(), > + }; > +} > + > +static void nfs_co_generic_bh_cb(void *opaque) > +{ > + NFSRPC *task = opaque; > + qemu_bh_delete(task->bh); > + qemu_coroutine_enter(task->co, NULL); > +} > + > +static void > +nfs_co_generic_cb(int status, struct nfs_context *nfs, void *data, > + void *private_data) > +{ > + NFSRPC *task = private_data; > + task->complete = 1; > + task->status = status; > + if (task->status > 0 && task->iov) { > + if (task->status <= task->iov->size) { > + qemu_iovec_from_buf(task->iov, 0, data, task->status); > + } else { > + task->status = -EIO; Short reads don't happen in practice with libnfs? > + } > + } > + if (task->status == 0 && task->st) { > + memcpy(task->st, data, sizeof(struct stat)); > + } > + if (task->co) { > + task->bh = qemu_bh_new(nfs_co_generic_bh_cb, task); > + qemu_bh_schedule(task->bh); > + } > +} > + > +static int coroutine_fn nfs_co_readv(BlockDriverState *bs, > + int64_t sector_num, int nb_sectors, > + QEMUIOVector *iov) > +{ > + NFSClient *client = bs->opaque; > + NFSRPC task; > + > + nfs_co_init_task(client, &task); > + task.iov = iov; > + > + if (nfs_pread_async(client->context, client->fh, > + sector_num * BDRV_SECTOR_SIZE, > + nb_sectors * BDRV_SECTOR_SIZE, > + nfs_co_generic_cb, &task) != 0) { > + return -ENOMEM; > + } > + > + while (!task.complete) { > + nfs_set_events(client); > + qemu_coroutine_yield(); > + } > + > + if (task.status < 0) { > + return task.status; > + } > + > + return 0; > +} > + > +static int coroutine_fn nfs_co_writev(BlockDriverState *bs, > + int64_t sector_num, int nb_sectors, > + QEMUIOVector *iov) > +{ > + NFSClient *client = bs->opaque; > + NFSRPC task; > + char *buf = NULL; > + > + nfs_co_init_task(client, &task); > + > + buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE); > + qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE); > + > + if (nfs_pwrite_async(client->context, client->fh, > + sector_num * BDRV_SECTOR_SIZE, > + nb_sectors * BDRV_SECTOR_SIZE, > + buf, nfs_co_generic_cb, &task) != 0) { > + g_free(buf); > + return -ENOMEM; > + } > + > + while (!task.complete) { > + nfs_set_events(client); > + qemu_coroutine_yield(); > + } > + > + g_free(buf); > + > + if (task.status != nb_sectors * BDRV_SECTOR_SIZE) { > + return task.status < 0 ? task.status : -EIO; Does this duplicate the logic of nfs_co_generic_cb? > + } > + > + return 0; > +} > + > +static int coroutine_fn nfs_co_flush(BlockDriverState *bs) > +{ > + NFSClient *client = bs->opaque; > + NFSRPC task; > + > + nfs_co_init_task(client, &task); > + > + if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb, > + &task) != 0) { > + return -ENOMEM; > + } > + > + while (!task.complete) { > + nfs_set_events(client); > + qemu_coroutine_yield(); > + } > + > + return task.status; > +} > + > +static QemuOptsList runtime_opts = { > + .name = "nfs", > + .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head), > + .desc = { > + { > + .name = "filename", > + .type = QEMU_OPT_STRING, > + .help = "URL to the NFS file", > + }, > + { /* end of list */ } > + }, > +}; I think this is the point where I disagree. First of all, if it's a URL, call it "url" instead of "filename". But more importantly, a URL encodes options in a string instead of structured options that can be set separately. I would split this into options 'server', 'path', etc. and only have bdrv_parse_filename() invoke the URL parsing to fill these options if the user preferred the URL style. Kevin