On Mon, Oct 03, 2022 at 06:51:42PM -0400, Colin Walters wrote:
>
>
> On Thu, Sep 29, 2022, at 1:03 PM, Vivek Goyal wrote:
> >
> > So rust version of virtiofsd, already supports running unprivileged
> > (inside a user namespace).
>
> I know, but as I already said, the use case here is running inside an
> OpenShift unprivileged pod where *we are already in a container*.
>
> > host$ podman unshare -- virtiofsd --socket-path=/tmp/vfsd.sock
> > --shared-dir /mnt \
> > --announce-submounts --sandbox chroot &
>
> Yes, but in current OCP 4.11 our seccomp policy denies CLONE_NEWUSER:

Hmm..., no user namespaces allowed. So sandbox=none should in theory work
once we fix it for unprivileged users.

https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/136

Given you are already running inside a pod/container, I am not sure that
locking down virtiofsd with openat2(RESOLVE_IN_ROOT)/landlock is a must for
you from a security point of view. virtiofsd should not be able to access
anything outside the pod/container anyway and can only affect things inside
the pod/container.

Once we add support for openat2(), the next issue is whether you need
arbitrary uid/gid support. By default it will be a single uid/gid
filesystem. Is that enough for your use case? Or do you need to be able to
switch between arbitrary uid/gid on this virtiofs filesystem inside the
guest?

>
> ```
> $ unshare -m
> unshare: unshare failed: Function not implemented
> ```
>
> https://docs.openshift.com/container-platform/4.11/security/seccomp-profiles.html
>
> > I think only privileged operation it needs is assigning a range of
> > subuid/subgid to the uid you are using on host.
>
> We also turn on NO_NEW_PRIVILEGES by default in OCP pods.
>
> Now, I *could* in general get elevated permissions where I need to today.
> But it's also really important to me to have a long term goal of having
> operating system builds and tests work well as "just another workload" in our
> production container platform (now, one *does* want to bind in /dev/kvm, but
> that's generally safe, and even that strictly speaking is optional if one can
> stomach the ~10x perf hit).

I am assuming this 10x performance hit is in comparison with a native
container build and test, where no VM is launched.

>
> > Can you give rust virtiofsd (unprivileged) a try.
>
> I admit to not actually trying it in a pod, but I think we all agree it can't
> work, and the only thing that can today is openat2.

Agreed. Right now we rely on user namespaces for the unprivileged use case.
We should be able to enable sandbox=none for an unprivileged user (no user
namespace) and possibly add openat2() support as well.

I think providing arbitrary uid/gid support will be trickier and more
work. It will need to store the actual uid/gid in some sort of user xattr
(as done by 9pfs, fuse-overlayfs, libkrun, etc.). And I will not be
surprised if there are a bunch of corner cases with that approach
(automatic setuid/setgid clearing, etc.).

Thanks
Vivek
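
For illustration only, the "store the uid/gid in a user xattr" idea mentioned
above could be sketched roughly as follows in Python. The attribute name
user.virtiofs.owner, the two-u32 layout, and the helper names are all
invented for this example; it is not meant to match what 9pfs,
fuse-overlayfs, or libkrun actually store, and it ignores the corner cases
(such as setuid/setgid clearing).

```python
import errno
import os
import struct

# Hypothetical attribute name and binary layout, made up for this sketch.
OWNER_XATTR = "user.virtiofs.owner"
_OWNER = struct.Struct("<II")  # guest uid, guest gid as little-endian u32


def encode_owner(uid: int, gid: int) -> bytes:
    """Pack the guest-visible uid/gid into the xattr value."""
    return _OWNER.pack(uid, gid)


def decode_owner(raw: bytes) -> tuple[int, int]:
    """Unpack an xattr value written by encode_owner()."""
    uid, gid = _OWNER.unpack(raw)
    return uid, gid


def guest_chown(path: str, uid: int, gid: int) -> None:
    """Record a guest chown without touching the real owner on the host."""
    os.setxattr(path, OWNER_XATTR, encode_owner(uid, gid))


def guest_owner(path: str) -> tuple[int, int]:
    """Return the recorded guest owner, else fall back to the host owner."""
    try:
        return decode_owner(os.getxattr(path, OWNER_XATTR))
    except OSError as err:
        if err.errno != errno.ENODATA:
            raise
        st = os.stat(path)
        return st.st_uid, st.st_gid
```

With something like this, an unprivileged daemon would call guest_chown()
where it would otherwise call chown(), and report guest_owner() in its
getattr replies. It only works on filesystems that permit user.* xattrs
for the file in question.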