On Wed, Apr 02, 2025 at 10:13:43AM +0200, Stefano Garzarella wrote:
> On Wed, 2 Apr 2025 at 02:21, Bobby Eshleman <bobbyeshle...@gmail.com> wrote:
> >
> > I do like Stefano's suggestion to add a sysctl for a "strict" mode,
> > Since it offers the best of both worlds, and still tends conservative in
> > protecting existing applications... but I agree, the non-strict mode
> > vsock would be unique WRT the usual concept of namespaces.
>
> Maybe we could do the opposite, enable strict mode by default (I think
> it was similar to what I had tried to do with the kernel module in v1, I
> was young I know xD)
> And provide a way to disable it for those use cases where the user wants
> backward compatibility, while paying the cost of less isolation.

I think backwards compatibility has to be the default behaviour,
otherwise the change has too high a risk of breaking existing
deployments that are already using netns and relying on VSOCK being
global. Breakage has to be opt-in.

> I was thinking two options (not sure if the second one can be done):
>
> 1. provide a global sysfs/sysctl that disables strict mode, but this
> then applies to all namespaces
>
> 2. provide something that allows disabling strict mode by namespace.
> Maybe when it is created there are options, or something that can be
> set later.
>
> 2 would be ideal, but that might be too much, so 1 might be enough. In
> any case, 2 could also be a next step.
>
> WDYT?

It occurred to me that the problem we face with the CID space usage is
somewhat similar to the UID/GID space usage for user namespaces.

In the latter case, userns has exposed /proc/$PID/uid_map & gid_map, to
allow IDs in the namespace to be arbitrarily mapped onto IDs in the host.

At the risk of being overkill, is it worth trying a similar kind of
approach for the vsock CID space ?

A simple variant would be a /proc/net/vsock_cid_outside specifying a set
of CIDs which are exclusively referencing /dev/vhost-vsock associations
created outside the namespace. Anything not listed would be exclusively
referencing associations created inside the namespace.

A more complex variant would be to allow a full remapping of CIDs as is
done with userns, via a /proc/net/vsock_cid_map, with the same three
parameters, so that a CID=15 association outside the namespace could be
remapped to CID=9015 inside the namespace, allowing the inside namespace
to define its own association for CID=15 without clashing.

IOW, mapped CIDs would be exclusively referencing /dev/vhost-vsock
associations created outside the namespace, while unmapped CIDs would be
exclusively referencing /dev/vhost-vsock associations inside the
namespace.
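
To make the remapping variant a bit more concrete, here is a rough
userspace sketch in Python of how a lookup might behave. Everything in
it is an assumption for illustration only: the three fields per line
(inside CID start, outside CID start, count) simply mirror the uid_map
layout, and parse_cid_map / resolve_cid are hypothetical names, not
anything that exists in the kernel.

```python
from typing import List, NamedTuple, Tuple


class CidExtent(NamedTuple):
    """One line of a hypothetical /proc/net/vsock_cid_map (uid_map-like)."""
    inside_start: int   # first CID as seen inside the namespace
    outside_start: int  # first CID it refers to outside the namespace
    count: int          # number of consecutive CIDs covered


def parse_cid_map(text: str) -> List[CidExtent]:
    """Parse 'inside outside count' lines; the format is an assumption."""
    extents = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got: {line!r}")
        extents.append(CidExtent(*(int(f) for f in fields)))
    return extents


def resolve_cid(extents: List[CidExtent], inside_cid: int) -> Tuple[str, int]:
    """Return which side owns a CID used inside the namespace.

    Mapped CIDs refer exclusively to associations created outside the
    namespace; unmapped CIDs refer exclusively to associations created
    inside it.
    """
    for ext in extents:
        offset = inside_cid - ext.inside_start
        if 0 <= offset < ext.count:
            return ("outside", ext.outside_start + offset)
    return ("inside", inside_cid)


if __name__ == "__main__":
    cid_map = parse_cid_map("9015 15 1\n")
    print(resolve_cid(cid_map, 9015))  # the outside CID=15 association
    print(resolve_cid(cid_map, 15))    # the namespace's own CID=15
```

With the single extent "9015 15 1", CID 9015 inside the namespace reaches
the outside association for CID 15, while CID 15 inside the namespace is
free for a local association.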

A likely benefit of relying on a kernel defined mapping/partition of the
CID space is that apps like QEMU don't need changing, as there's no need
to invent a new /dev/vhost-vsock-netns device node.

Both approaches give the desirable security protection whereby the
inside namespace can be prevented from accessing certain CIDs that were
associated outside the namespace.

Some rule would need to be defined for updating the
/proc/net/vsock_cid_map file as it is the security control mechanism.
If it is write-once then if the container mgmt app initializes it,
nothing later could change it.

A key question is do we need the "first come, first served" behaviour
for CIDs where a CID can be arbitrarily used by outside or inside
namespace according to whatever tries to associate a CID first ?

IMHO those semantics lead to unpredictable behaviour for apps because
what happens depends on ordering of app launches inside & outside the
namespace, but they do sort of allow for VSOCK namespace behaviour to be
'zero conf' out of the box.

A mapping that strictly partitions CIDs to either outside or inside
namespace usage, but never both, gives well defined behaviour, at the
cost of needing to set up an initial mapping/partition.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|