On Fri, May 01, 2020 at 02:25:48PM -0400, Colin Walters wrote: > I'd like to make use of virtiofs as part of our tooling in > https://github.com/coreos/coreos-assembler > Most of the code runs as non-root today; qemu also runs as non-root. > We use 9p right now. > > virtiofsd's builtin sandboxing effectively assumes it runs as > root. > > First, change the code to use `clone()` and not `unshare()+fork()`. > > Next, automatically use `CLONE_NEWUSER` if we're running as non root.
I'd suggest splitting these two, so that the re-factoring is separate from introducing new functionality. > > This is similar logic to that in https://github.com/containers/bubblewrap > (Which...BTW, it could make sense for virtiofs to depend on bubblewrap > and re-exec itself rather than re-implementing the containerization > itself) > > Signed-off-by: Colin Walters <walt...@verbum.org> > --- > tools/virtiofsd/passthrough_ll.c | 26 +++++++++++++++++++++----- > 1 file changed, 21 insertions(+), 5 deletions(-) > > diff --git a/tools/virtiofsd/passthrough_ll.c > b/tools/virtiofsd/passthrough_ll.c > index 4c35c95b25..468617f6d6 100644 > --- a/tools/virtiofsd/passthrough_ll.c > +++ b/tools/virtiofsd/passthrough_ll.c > @@ -2530,6 +2530,21 @@ static void print_capabilities(void) > printf("}\n"); > } > > +/* Copied from bubblewrap */ > +static int > +raw_clone(unsigned long flags, void *child_stack) > +{ > +#if defined(__s390__) || defined(__CRIS__) > + /* > + * On s390 and cris the order of the first and second arguments > + * of the raw clone() system call is reversed. > + */ > + return (int) syscall(__NR_clone, child_stack, flags); > +#else > + return (int) syscall(__NR_clone, flags, child_stack); > +#endif > +} What's the reason for using the raw syscall ? Was it just to avoid having to allocate a new stack space ? > + > /* > * Move to a new mount, net, and pid namespaces to isolate this process. > */ > @@ -2547,14 +2562,15 @@ static void setup_namespaces(struct lo_data *lo, > struct fuse_session *se) > * an empty network namespace to prevent TCP/IP and other network > * activity in case this process is compromised. > */ > - if (unshare(CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET) != 0) { > - fuse_log(FUSE_LOG_ERR, "unshare(CLONE_NEWPID | CLONE_NEWNS): %m\n"); > - exit(1); > + int clone_flags = SIGCHLD | CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET; > + /* If we're non root, we need a new user namespace */ > + if (getuid() != 0) { > + clone_flags |= CLONE_NEWUSER; > } IIUC, with CLONE_NEWUSER we need to set a UID/GID mapping, otherwise all file accesses will be squashed to the UID -1. Or was it intentional that you're only trying to provide read-only access to files that are world-accessible ? > - child = fork(); > + child = raw_clone(clone_flags, NULL); > if (child < 0) { > - fuse_log(FUSE_LOG_ERR, "fork() failed: %m\n"); > + fuse_log(FUSE_LOG_ERR, "clone() failed: %m\n"); > exit(1); > } > if (child > 0) { Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|