On Wed, May 06, 2020 at 08:16:14PM +0100, Dr. David Alan Gilbert wrote:
> * Colin Walters (walt...@verbum.org) wrote:
> > I'd like to make use of virtiofs as part of our tooling in
> > https://github.com/coreos/coreos-assembler
> > Most of the code runs as non-root today; qemu also runs as non-root.
> > We use 9p right now.
> >
> > virtiofsd's builtin sandboxing effectively assumes it runs as
> > root.
> >
> > First, change the code to use `clone()` and not `unshare()+fork()`.
> >
> > Next, automatically use `CLONE_NEWUSER` if we're running as non root.
>
> Is it ever useful for root to run the code in a new user namespace?

Yes, user namespaces are useful to both root and non-root alike. Roughly
speaking, for root they offer security benefits; for non-root they offer
functionality benefits.

The longer answer...

With a new user namespace, users inside the container get remapped to
a different set of users on the host, through defined UID & GID
mappings. For any UID/GID which doesn't have a mapping, access will get
performed as (uid_t)-1 / (gid_t)-1.

e.g. consider you have a range of 65,536 host IDs available, starting at
100,000. With user namespaces, you can now set up a mapping of container
IDs 0 -> 65,535 onto host IDs 100,000 -> 165,535.

Thus any time UID 0 inside the container does something, from the host
POV they are acting as UID 100,000. If UID 30,000 inside the container
does something, this is UID 130,000 in the host POV. If UID 80,000 in
the container does something, it has no mapping, so this is uid -1 from
the host POV.

If the person in the host launching virtiofsd is non-root, then user
namespaces mean they can offer the guest the full range of POSIX APIs
wrt access control & file ownership, since they're no longer restricted
to their single host UID when inside the container. They also get
important things like CAP_DAC_OVERRIDE.

IOW, for non-root, user namespaces unlock the full functionality of
virtiofsd. Without it, we're limited to read-only access to files not
owned by the current non-root user.

If the person in the host launching virtiofsd is root, then user
namespaces mean we can reduce the effective privileges of virtiofsd.
Currently when inside the container, uid==0 is still the same as uid==0
outside. So if there are any resources visible inside the container
(either accidentally or intentionally) that virtiofsd shouldn't have
write access to, we're lacking protection. By adding a user namespace +
a mapping, we strictly isolate virtiofsd from any host resources.

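To make the ID arithmetic in the example above concrete, here is a rough
sketch in Python. It assumes the mapping is written in the three-column
"inside outside count" form that /proc/<pid>/uid_map uses; the names
parse_id_map, to_host_id and UNMAPPED are made up for illustration and
are not part of virtiofsd or any library. It only models the lookup, it
does not create a namespace.

```python
from typing import List, NamedTuple

# Sentinel standing in for (uid_t)-1 / (gid_t)-1, i.e. "no mapping",
# following the description in this mail.
UNMAPPED = -1


class IdMapEntry(NamedTuple):
    inside_start: int   # first ID as seen inside the user namespace
    outside_start: int  # first ID as seen from the host
    count: int          # number of consecutive IDs covered


def parse_id_map(text: str) -> List[IdMapEntry]:
    """Parse lines in the three-column 'inside outside count' form."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got: {line!r}")
        entries.append(IdMapEntry(*(int(f) for f in fields)))
    return entries


def to_host_id(entries: List[IdMapEntry], inside_id: int) -> int:
    """Translate an ID inside the namespace to the ID the host sees."""
    for e in entries:
        if e.inside_start <= inside_id < e.inside_start + e.count:
            return e.outside_start + (inside_id - e.inside_start)
    return UNMAPPED


if __name__ == "__main__":
    # 65,536 container IDs starting at 0, backed by host IDs from 100,000.
    uid_map = parse_id_map("0 100000 65536\n")
    for uid in (0, 30000, 80000):
        print(uid, "->", to_host_id(uid_map, uid))
```

Container UIDs 0 and 30,000 fall inside the mapped range and are offset
by 100,000; UID 80,000 lies beyond the 65,536 mapped IDs and so comes
back as UNMAPPED.
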
The main pain point with user namespaces is that all the files in the
directory you are exporting need to be shifted to match the UID/GID
mapping used for the user namespace. Traditionally this has needed a
recursive chown of the tree to remap the file ownership. There has been
talk of a filesystem overlay to do the remapping transparently, but I've
lost track of whether that's a thing yet.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|