Re: [PATCH] virtiofsd: Use clone() and not unshare(), support non-root

Daniel P . Berrangé Thu, 07 May 2020 02:29:03 -0700

On Wed, May 06, 2020 at 08:16:14PM +0100, Dr. David Alan Gilbert wrote:
> * Colin Walters (walt...@verbum.org) wrote:
> > I'd like to make use of virtiofs as part of our tooling in
> > https://github.com/coreos/coreos-assembler
> > Most of the code runs as non-root today; qemu also runs as non-root.
> > We use 9p right now.
> > 
> > virtiofsd's builtin sandboxing effectively assumes it runs as
> > root.
> > 
> > First, change the code to use `clone()` and not `unshare()+fork()`.
> > 
> > Next, automatically use `CLONE_NEWUSER` if we're running as non root.
> 
> Is it ever useful for root to run the code in a new user namespace?


Yes, user namespaces are useful to root and non-root alike. Roughly
speaking, for root they offer security benefits; for non-root they
offer functionality benefits.

The longer answer...

With a new user namespace, users inside the container get remapped
to a different set of users on the host, through defined UID & GID
mappings. For any UID/GID which doesn't have a mapping, access will
get performed as (uid_t)-1 / (gid_t)-1.

e.g. consider you have a range of host IDs 100,000->165,536 available.
With user namespaces, you can now set up a mapping of container
IDs 0 -> 65536 onto that range.

Thus any time UID 0 inside the container does something, from the
host POV it is acting as UID 100,000. If UID 30,000 inside the
container does something, this is UID 130,000 from the host POV. If
UID 80,000 in the container does something, this is UID -1 from
the host POV, since 80,000 falls outside the mapped range.
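
To make the arithmetic above concrete, here is a minimal C sketch of
the container-to-host translation. It assumes a single extent laid
out like a line of /proc/PID/uid_map ("inside outside count") and
assumes a count of 65536; the struct and function names are
hypothetical, not anything from virtiofsd or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* One mapping extent, laid out like a line of /proc/PID/uid_map:
 * "<first ID inside> <first ID outside> <count>". */
struct id_extent {
    uint32_t inside;   /* first ID inside the namespace */
    uint32_t outside;  /* first ID on the host */
    uint32_t count;    /* number of consecutive IDs mapped */
};

/* The example from the text; the count of 65536 is an assumption. */
static const struct id_extent example_map[] = {
    { 0, 100000, 65536 },
};

/* Translate an ID used inside the namespace into the host ID it acts as.
 * Returns (uid_t)-1 when no extent covers the ID. */
static uid_t container_to_host(const struct id_extent *map, size_t n, uid_t id)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Compare via subtraction so inside + count cannot overflow. */
        if (id >= map[i].inside && id - map[i].inside < map[i].count)
            return (uid_t)(map[i].outside + (id - map[i].inside));
    }
    return (uid_t)-1;
}
```

With example_map, container UID 0 translates to 100000, 30000 to
130000, and 80000 to (uid_t)-1, matching the three cases above.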

If the person on the host launching virtiofsd is non-root, then
user namespaces mean they can offer the guest the full range of
POSIX APIs wrt access control & file ownership, since they're
no longer restricted to their single host UID when inside the
container. They also get important capabilities like CAP_DAC_OVERRIDE
(effective within the namespace). IOW, for non-root, user namespaces
unlock the full functionality of virtiofsd. Without them, we're
limited to read-only access to files not owned by the current
non-root user.
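
For the non-root case, a minimal standalone C sketch of the
clone(CLONE_NEWUSER) approach might look like the following. It is
not the code from the patch; run_in_userns(), child_main() and
write_proc() are hypothetical names. It follows the unprivileged
rules from user_namespaces(7): the parent maps exactly one ID (its
own) and writes "deny" to /proc/PID/setgroups before gid_map. Mapping
a whole range such as 100,000->165,536 as non-root needs the setuid
newuidmap/newgidmap helpers instead. Unprivileged user namespaces can
also be disabled by sysctl or a container seccomp policy, so the
function reports -1 rather than aborting.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int sync_pipe[2];  /* parent -> child: "ID maps are written" */

/* Write a short string to /proc/<pid>/<file> in a single write(2). */
static int write_proc(pid_t pid, const char *file, const char *data)
{
    char path[64];
    int n = snprintf(path, sizeof(path), "/proc/%d/%s", (int)pid, file);
    assert(n > 0 && (size_t)n < sizeof(path));
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t len = (ssize_t)strlen(data);
    ssize_t written = write(fd, data, (size_t)len);
    close(fd);
    return written == len ? 0 : -1;
}

static int child_main(void *arg)
{
    char c;
    (void)arg;
    close(sync_pipe[1]);
    /* Block until the parent has installed our UID/GID maps. */
    if (read(sync_pipe[0], &c, 1) != 1)
        return 2;
    /* UID 0 inside the namespace; still the caller's UID on the host. */
    return getuid() == 0 ? 0 : 1;
}

/* Returns the child's exit status (0 = ran as namespace root),
 * or -1 if the user namespace could not be set up. */
static int run_in_userns(void)
{
    static char stack[64 * 1024] __attribute__((aligned(16)));
    char map[32];
    uid_t euid = geteuid();
    gid_t egid = getegid();
    int ok, status;

    if (pipe(sync_pipe) < 0)
        return -1;
    pid_t pid = clone(child_main, stack + sizeof(stack),
                      CLONE_NEWUSER | SIGCHLD, NULL);
    if (pid < 0) {
        close(sync_pipe[0]);
        close(sync_pipe[1]);
        return -1;
    }
    close(sync_pipe[0]);

    /* Unprivileged rules: map only our own ID, deny setgroups() first. */
    snprintf(map, sizeof(map), "0 %u 1\n", (unsigned)euid);
    ok = write_proc(pid, "uid_map", map) == 0;
    ok = ok && write_proc(pid, "setgroups", "deny") == 0;
    snprintf(map, sizeof(map), "0 %u 1\n", (unsigned)egid);
    ok = ok && write_proc(pid, "gid_map", map) == 0;

    /* Release the child; on failure it reads EOF instead and exits. */
    if (ok && write(sync_pipe[1], "x", 1) != 1)
        ok = 0;
    close(sync_pipe[1]);

    if (waitpid(pid, &status, 0) < 0 || !ok || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The pipe matters: until the parent writes uid_map, getuid() in the
child returns the overflow UID rather than 0.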

If the person on the host launching virtiofsd is root, then user
namespaces mean we can reduce the effective privileges of virtiofsd.
Currently, inside the container, uid==0 is still the same as
uid==0 outside. So if any resources that virtiofsd shouldn't have
write access to are visible inside the container (either
accidentally or intentionally), we're lacking protection. By
adding a user namespace + a mapping, we strictly isolate virtiofsd
from any host resources.
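
The isolation in the root case can be seen from the reverse
direction: with the example range mapping, host UID 0 has no mapping
inside the namespace, so host-root-owned files show up there as the
overflow UID (65534 by default, per /proc/sys/kernel/overflowuid) and
the namespace's root gets no capability-based override on them. A
minimal sketch, with hypothetical names and the same assumed extent:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

struct id_extent {
    uint32_t inside;   /* first ID inside the namespace */
    uint32_t outside;  /* first ID on the host */
    uint32_t count;    /* number of consecutive IDs mapped */
};

/* Kernel default value of /proc/sys/kernel/overflowuid. */
#define OVERFLOW_UID ((uid_t)65534)

/* Assumed mapping for a root-launched virtiofsd. */
static const struct id_extent root_map[] = {
    { 0, 100000, 65536 },
};

/* How a host-side file owner appears from inside the namespace. */
static uid_t host_to_container(const struct id_extent *map, size_t n,
                               uid_t host_id)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (host_id >= map[i].outside &&
            host_id - map[i].outside < map[i].count)
            return (uid_t)(map[i].inside + (host_id - map[i].outside));
    }
    return OVERFLOW_UID;  /* unmapped host ID, e.g. host root */
}
```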

The main pain point with user namespaces is that all the files
in the directory you are exporting need to be shifted to match
the UID/GID mapping used for the user namespace. Traditionally
this has needed a recursive chown of the tree to remap the file
ownership. There has been talk of a filesystem overlay to do the
remapping transparently, but I've lost track of whether that's
a thing yet.
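
The recursive chown mentioned above could be sketched in C with
nftw(3) and lchown(2). This assumes the tree is currently owned by
unshifted container IDs in [0, 65536) and shifts them by a base such
as 100000; shift_tree() and ID_COUNT are hypothetical. IDs outside the
range are left alone, so re-running it with a base of at least 65536
is a no-op. A real tool would also need to care about ACLs and about
chown clearing setuid/setgid bits on executables.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define ID_COUNT 65536u   /* assumed size of the container ID range */

static uid_t shift_base;  /* host ID that container ID 0 maps to */

static int shift_one(const char *path, const struct stat *st, int type,
                     struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (type == FTW_NS)  /* stat failed: st is undefined, skip entry */
        return 0;
    uid_t uid = st->st_uid;
    gid_t gid = st->st_gid;
    /* Shift only IDs inside the container range. */
    uid_t new_uid = uid < ID_COUNT ? (uid_t)(shift_base + uid) : uid;
    gid_t new_gid = gid < ID_COUNT ? (gid_t)(shift_base + gid) : gid;
    if (new_uid == uid && new_gid == gid)
        return 0;
    /* lchown re-owns symlinks themselves, never their targets. */
    return lchown(path, new_uid, new_gid) == 0 ? 0 : -1;
}

/* Re-own every entry under root; returns 0 on success, -1 on error. */
static int shift_tree(const char *root, uid_t base)
{
    assert(root != NULL);
    shift_base = base;
    /* FTW_PHYS: do not follow symlinks out of the exported tree. */
    return nftw(root, shift_one, 64, FTW_PHYS) == 0 ? 0 : -1;
}
```

For the example mapping, shift_tree("/path/to/export", 100000) would
run as root before starting virtiofsd; the path is a placeholder.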


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
