Re: [PATCH v2 0/3] vsock: add namespace support to vhost-vsock

Bobby Eshleman Wed, 02 Apr 2025 15:18:44 -0700

On Wed, Apr 02, 2025 at 10:21:36AM +0100, Daniel P. Berrangé wrote:
> On Wed, Apr 02, 2025 at 10:13:43AM +0200, Stefano Garzarella wrote:
> > On Wed, 2 Apr 2025 at 02:21, Bobby Eshleman <[email protected]> wrote:
> > >
> > > I do like Stefano's suggestion to add a sysctl for a "strict" mode,
> > > since it offers the best of both worlds, and still tends conservative in
> > > protecting existing applications... but I agree, the non-strict mode
> > > vsock would be unique WRT the usual concept of namespaces.
> > 
> > Maybe we could do the opposite, enable strict mode by default (I think 
> > it was similar to what I had tried to do with the kernel module in v1, I 
> > was young I know xD)
> > And provide a way to disable it for those use cases where the user wants 
> > backward compatibility, while paying the cost of less isolation.
> 
> I think backwards compatible has to be the default behaviour, otherwise
> the change has too high risk of breaking existing deployments that are
> already using netns and relying on VSOCK being global. Breakage has to
> be opt in.
> 
> > I was thinking two options (not sure if the second one can be done):
> > 
> >   1. provide a global sysfs/sysctl that disables strict mode, but this
> >   then applies to all namespaces
> > 
> >   2. provide something that allows disabling strict mode by namespace.
> >   Maybe when it is created there are options, or something that can be
> >   set later.
> > 
> > 2 would be ideal, but that might be too much, so 1 might be enough. In 
> > any case, 2 could also be a next step.
> > 
> > WDYT?
> 
> It occurred to me that the problem we face with the CID space usage is
> somewhat similar to the UID/GID space usage for user namespaces.
> 
> In the latter case, userns has exposed /proc/$PID/uid_map & gid_map, to
> allow IDs in the namespace to be arbitrarily mapped onto IDs in the host.
> 
> At the risk of being overkill, is it worth trying a similar kind of
> approach for the vsock CID space ?
> 
> A simple variant would be a /proc/net/vsock_cid_outside specifying a set
> of CIDs which are exclusively referencing /dev/vhost-vsock associations
> created outside the namespace. Anything not listed would be exclusively
> referencing associations created inside the namespace.
> 
> A more complex variant would be to allow a full remapping of CIDs as is
> done with userns, via a /proc/net/vsock_cid_map, with the same three
> parameters, so that CID=15 association outside the namespace could be
> remapped to CID=9015 inside the namespace, allowing the inside namespace
> to define its own association for CID=15 without clashing.
> 
> IOW, mapped CIDs would be exclusively referencing /dev/vhost-vsock
> associations created outside namespace, while unmapped CIDs would be
> exclusively referencing /dev/vhost-vsock associations inside the
> namespace. 
> 
> A likely benefit of relying on a kernel defined mapping/partition of
> the CID space is that apps like QEMU don't need changing, as there's
> no need to invent a new /dev/vhost-vsock-netns device node.
> 
> Both approaches give the desirable security protection whereby the
> inside namespace can be prevented from accessing certain CIDs that
> were associated outside the namespace.
> 
> Some rule would need to be defined for updating the /proc/net/vsock_cid_map
> file as it is the security control mechanism. If it is write-once then
> if the container mgmt app initializes it, nothing later could change
> it.
> 
> A key question is do we need the "first come, first served" behaviour
> for CIDs where a CID can be arbitrarily used by outside or inside namespace
> according to whatever tries to associate a CID first ?


I think with /proc/net/vsock_cid_outside, instead of disallowing the CID
from being used, this could be solved by disallowing remapping the CID
while in use?
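
As a rough illustration of that rule (a userspace sketch only, not kernel
code): assume the file holds a plain set of "outside" CIDs, and that an
update is refused whenever it would flip the inside/outside status of a
CID that currently has an association. The names check_outside_update
and CidBusyError are hypothetical.

```python
class CidBusyError(Exception):
    """Raised when an update would remap a CID that is currently in use."""


def check_outside_update(current, proposed, in_use):
    """Return the new outside-CID set, or raise if a busy CID would change.

    current  -- set of CIDs currently listed as "outside"
    proposed -- set of CIDs the writer wants listed as "outside"
    in_use   -- set of CIDs that currently have an association
    """
    # CIDs whose inside/outside status differs between the two sets.
    changed = set(current) ^ set(proposed)
    busy = changed & set(in_use)
    if busy:
        raise CidBusyError(f"CIDs in use cannot be remapped: {sorted(busy)}")
    return set(proposed)
```

With this rule, adding CID 20 while CID 15 is in use is accepted, but
dropping CID 15 from the set is rejected until it is released.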

The thing I like about this is that users can check
/proc/net/vsock_cid_outside to figure out what might be going on,
instead of trying to check lsof or ps to figure out if the VMM processes
have used /dev/vhost-vsock vs /dev/vhost-vsock-netns.

Just to check I am following... I suppose we would have a few typical
configurations for /proc/net/vsock_cid_outside. Following uid_map file
format of:
        "<local cid start>    <global cid start>    <range size>"

        1. Identity mapping, current namespace CID is global CID (default
        setting for new namespaces):

                # empty file

                                OR

                0    0    4294967295

        2. Complete isolation from global space (initialized, but no mappings):

                0    0    0

        3. Mapping in ranges of global CIDs

        For example, global CID space starts at 7000, up to 32-bit max:

                7000    0    4294960295
        
        Or for multiple mappings (0-99 map to 7000-7099, 1000-1099 map to
        8000-8099):

                7000    0       100
                8000    1000    100
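
To make the three configurations above concrete, here is a small
userspace sketch (not kernel code) that parses such a file and translates
a namespace-local CID to a global one. It assumes the column order given
above (local start, global start, range size), that an empty file means
identity mapping, and that a size-0 entry means "initialized, nothing
mapped". The names parse_cid_map and local_to_global are hypothetical.

```python
from typing import List, NamedTuple, Optional


class CidRange(NamedTuple):
    local_start: int
    global_start: int
    size: int


def parse_cid_map(text: str) -> Optional[List[CidRange]]:
    """Parse map text; return None for an empty file (identity mapping)."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        local_start, global_start, size = (int(f) for f in line.split())
        ranges.append(CidRange(local_start, global_start, size))
    return ranges or None


def local_to_global(ranges: Optional[List[CidRange]], cid: int) -> Optional[int]:
    """Translate a local CID; None means the CID is not mapped to global."""
    if ranges is None:  # empty file: local CID is the global CID
        return cid
    for r in ranges:
        if r.local_start <= cid < r.local_start + r.size:
            return r.global_start + (cid - r.local_start)
    return None
```

Note that, as with uid_map, a range size of 100 covers 100 IDs, so an
entry starting at local CID 7000 ends at 7099.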


One thing I don't love is that option 3 seems to not be addressing a
known use case. It doesn't necessarily hurt to have, but it will add
complexity to CID handling that might never get used?

Since options 1/2 could also be represented by a boolean (yes/no
"current ns shares CID with global"), I wonder if we could either A)
only support the first two options at first, or B) add just
/proc/net/vsock_ns_mode at first, which supports only "global" and
"local", and later add a "mapped" mode plus /proc/net/vsock_cid_outside
or the full mapping if the need arises?
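
A minimal sketch of how option B might behave, purely as a userspace
model and not a proposal for the actual implementation: each namespace
has a mode, "global" namespaces share one CID table, and "local"
namespaces get a private one. The class CidRegistry, its methods, and the
"global" default for new namespaces are assumptions made for
illustration.

```python
GLOBAL_TABLE = object()  # key for the CID table shared by "global" namespaces


class CidRegistry:
    def __init__(self):
        self._modes = {}   # namespace name -> "global" or "local"
        self._tables = {}  # table key -> {cid: owner}

    def set_mode(self, ns, mode):
        if mode not in ("global", "local"):
            raise ValueError(f"unsupported mode: {mode!r}")
        self._modes[ns] = mode

    def _table_key(self, ns):
        # Assumption: namespaces default to "global" for backward compatibility.
        if self._modes.get(ns, "global") == "local":
            return ns
        return GLOBAL_TABLE

    def register(self, ns, cid, owner):
        table = self._tables.setdefault(self._table_key(ns), {})
        if cid in table:
            raise ValueError(f"CID {cid} already in use")
        table[cid] = owner

    def lookup(self, ns, cid):
        return self._tables.get(self._table_key(ns), {}).get(cid)
```

In this model a "local" namespace can reuse CID 15 without clashing with,
or being able to reach, a CID 15 registered from a "global" namespace.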

This could also be how we support Option 2 from Stefano's last email of
supporting per-namespace opt-in/opt-out.

Any thoughts on this?

> 
> IMHO those semantics lead to unpredictable behaviour for apps because
> what happens depends on ordering of app launches inside & outside the
> namespace, but they do sort of allow for VSOCK namespace behaviour to
> be 'zero conf' out of the box.
> 
> A mapping that strictly partitions CIDs to either outside or inside
> namespace usage, but never both, gives well defined behaviour, at the
> cost of needing to setup an initial mapping/partition.
> 

Agreed, I do like the plainness of reasoning through it.

Thanks!
Bobby
