On Tue, Jun 02, 2020 at 09:53:18PM -0400, Colin Walters wrote:
> On Tue, Jun 2, 2020, at 5:55 AM, Stefan Hajnoczi wrote:
> > Ping Colin. It would be great if you have time to share your thoughts on
> > this discussion and explain how you are using this patch.
>
> Yeah sorry about not replying in this thread earlier, this was just a quick
> Friday side project for me and the thread obviously exploded =)
>
> Thinking about this more, probably what would be good enough for now is an
> option to just disable internal containerization/sandboxing. In fact per the
> discussion our production pipeline runs inside OpenShift 4 and because
> Kubernetes doesn't support user namespaces yet it also doesn't support
> recursive containerization, so we need an option to turn off the internal
> containerization.
>
> Our use case is somewhat specialized - for what we're doing we generally
> trust the guest. We use VMs for operating system testing and development of
> content we trust, as opposed to e.g. something like kata.
>
> It's fine for us to run virtiofs as the same user/security context as qemu.
>
> So...something like this? (Only compile tested)

...

> @@ -2775,6 +2775,8 @@ static void setup_capabilities(void)
>  static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
>                            bool enable_syslog)
>  {
> +    if (se->no_namespaces)
> +        return;
>      setup_namespaces(lo, se);
>      setup_mounts(lo->source);
>      setup_seccomp(enable_syslog);

Something along these lines should work. Hopefully seccomp can be
retained.

It would also be necessary to check how not having the shared directory
as / in the mount namespace affects functionality. For one, I'm pretty
sure symlink escapes and similar path traversals outside the shared
directory will be possible since virtiofsd normally relies on / as
protection.

Stefan
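
To illustrate the symlink concern, here is a minimal standalone sketch. It
is not virtiofsd code: make_demo_tree() and read_via_share() are
hypothetical helper names made up for this example, and the /tmp layout is
an assumption. It builds a "shared" directory containing a symlink to a
file one level up, then opens the link relative to the shared directory.
Without the shared directory mounted as /, the kernel follows the link out
of the share. O_NOFOLLOW refuses the open, but it only guards the final
path component.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create <tmp>/secret (outside the share) and
 * <tmp>/shared/escape -> ../secret. Returns an fd for <tmp>/shared,
 * or -1 on error. The temporary tree is left behind for inspection.
 */
static int make_demo_tree(void)
{
    char tmpl[] = "/tmp/vfsd-demo-XXXXXX";
    char path[256];
    int fd, dirfd;

    if (!mkdtemp(tmpl))
        return -1;

    /* A file that lives outside the directory we intend to share. */
    snprintf(path, sizeof(path), "%s/secret", tmpl);
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "outside", 7) != 7) {
        close(fd);
        return -1;
    }
    close(fd);

    /* The directory that plays the role of the shared directory. */
    snprintf(path, sizeof(path), "%s/shared", tmpl);
    if (mkdir(path, 0700) < 0)
        return -1;
    dirfd = open(path, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;

    /* A symlink inside the share that points back out of it. */
    if (symlinkat("../secret", dirfd, "escape") < 0) {
        close(dirfd);
        return -1;
    }
    return dirfd;
}

/*
 * Hypothetical helper: open 'name' relative to the shared directory and
 * read it into buf. Returns the number of bytes read, or -1 with errno
 * set by the failing call.
 */
static int read_via_share(int shared_fd, const char *name, int extra_flags,
                          char *buf, size_t len)
{
    ssize_t n;
    int fd = openat(shared_fd, name, O_RDONLY | extra_flags);

    if (fd < 0)
        return -1;
    n = read(fd, buf, len - 1);
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return (int)n;
}
```

Calling read_via_share(dirfd, "escape", 0, ...) returns the contents of the
file outside the share, while passing O_NOFOLLOW makes openat() fail with
ELOOP on Linux. Symlinks in intermediate path components are not covered by
O_NOFOLLOW; on Linux 5.6 and later, openat2() with RESOLVE_BENEATH may be
one way to close that gap, but whether that fits virtiofsd would need to be
checked separately.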