On Mon, May 4, 2020, at 10:07 AM, Marc-André Lureau wrote:
> Now that systemd-nspawn works without privileges, isn't that also a
> solution? One that would fit both system and session level
> permissions, and integration with other services?
This is a complex topic, and one I should probably write up in the bubblewrap
README.md. Today, for example, our CoreOS build and CI processes run inside
OpenShift (Kubernetes), and we aren't running systemd inside our containers.
bubblewrap is basically a small, self-contained C wrapper around the container
system calls. In contrast, AFAICS, nspawn currently requires systemd, which
won't work for our use case.
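To make the "small wrapper" point concrete, here is a rough sketch of how a
build script might wrap a command in bubblewrap. The helper name `bwrap_argv`
and the example paths are made up for illustration; only the `bwrap` options
themselves (`--ro-bind`, `--bind`, `--dev`, `--proc`, `--unshare-pid`,
`--die-with-parent`, `--chdir`) are real. Nothing here needs systemd or sets up
cgroups, so the wrapped process simply stays in whatever cgroup the caller is
already in.

```python
def bwrap_argv(command, rootfs="/", workdir=None):
    """Build an argv that runs `command` under bwrap.

    Hypothetical helper for illustration only. bwrap always creates a new
    mount namespace; --unshare-pid adds a new pid namespace on top of that.
    """
    argv = [
        "bwrap",
        "--ro-bind", rootfs, "/",   # read-only view of the chosen root
        "--dev", "/dev",            # minimal /dev inside the sandbox
        "--proc", "/proc",          # fresh /proc matching the new pid namespace
        "--unshare-pid",            # new pid namespace
        "--die-with-parent",        # sandboxed process cannot outlive the caller
    ]
    if workdir is not None:
        # Writable bind mount for the build directory, and start there.
        argv += ["--bind", workdir, workdir, "--chdir", workdir]
    return argv + list(command)


# Example (assumed paths); running it requires bwrap to be installed:
#   import subprocess
#   subprocess.run(bwrap_argv(["make", "check"], workdir="/srv/build"), check=True)
```

The `--die-with-parent` option is what keeps the child tied to the lifetime of
the process that launched it.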
Really the contention point here is systemd's dependency on cgroups for process
tracking; in a "nested containerization" scenario you often just want the
cgroups from the "outer" container to apply. But nested mount/pid namespaces
are still very useful. (That said, cgroups v2 allows sane nesting, but we
aren't there yet.)
Also related is https://github.com/kubernetes/enhancements/issues/127 - without
that, one needs privileged containers to do nesting.
Now honestly, probably an even easier fix is `virtiofsd --disable-sandboxing`
because we fully trust the code running in these VMs.
Or to directly respond again to your proposal: systemd-nspawn as an option may
work for some cases but won't for mine (I don't want virtiofsd/qemu instances
to "escape" the build container or run separately).