On Wed, Apr 02, 2025 at 03:18:13PM -0700, Bobby Eshleman wrote:
> On Wed, Apr 02, 2025 at 10:21:36AM +0100, Daniel P. Berrangé wrote:
> > It occurred to me that the problem we face with the CID space usage is
> > somewhat similar to the UID/GID space usage for user namespaces.
> > 
> > In the latter case, userns has exposed /proc/$PID/uid_map & gid_map, to
> > allow IDs in the namespace to be arbitrarily mapped onto IDs in the host.
> > 
> > At the risk of being overkill, is it worth trying a similar kind of
> > approach for the vsock CID space ?
> > 
> > A simple variant would be a /proc/net/vsock_cid_outside specifying a set
> > of CIDs which are exclusively referencing /dev/vhost-vsock associations
> > created outside the namespace. Anything not listed would be exclusively
> > referencing associations created inside the namespace.
> > 
> > A more complex variant would be to allow a full remapping of CIDs as is
> > done with userns, via a /proc/net/vsock_cid_map with the same three
> > parameters, so that a CID=15 association outside the namespace could be
> > remapped to CID=9015 inside the namespace, allowing the inside namespace
> > to define its own association for CID=15 without clashing.
> > 
> > IOW, mapped CIDs would be exclusively referencing /dev/vhost-vsock
> > associations created outside namespace, while unmapped CIDs would be
> > exclusively referencing /dev/vhost-vsock associations inside the
> > namespace. 
> > 
> > A likely benefit of relying on a kernel defined mapping/partition of
> > the CID space is that apps like QEMU don't need changing, as there's
> > no need to invent a new /dev/vhost-vsock-netns device node.
> > 
> > Both approaches give the desirable security protection whereby the
> > inside namespace can be prevented from accessing certain CIDs that
> > were associated outside the namespace.
> > 
> > Some rule would need to be defined for updating the /proc/net/vsock_cid_map
> > file as it is the security control mechanism. If it is write-once then
> > if the container mgmt app initializes it, nothing later could change
> > it.
> > 
> > A key question is do we need the "first come, first served" behaviour
> > for CIDs, where a CID can be used by either the outside or inside namespace
> > according to whichever tries to associate it first ?
> 
> I think with /proc/net/vsock_cid_outside, instead of disallowing the CID
> from being used, this could be solved by disallowing remapping the CID
> while in use?
> 
> The thing I like about this is that users can check
> /proc/net/vsock_cid_outside to figure out what might be going on,
> instead of trying to check lsof or ps to figure out if the VMM processes
> have used /dev/vhost-vsock vs /dev/vhost-vsock-netns.
> 
> Just to check I am following... I suppose we would have a few typical
> configurations for /proc/net/vsock_cid_outside. Following uid_map file
> format of:
>       "<local cid start>    <global cid start>    <range size>"
> 
>       1. Identity mapping, current namespace CID is global CID (default
>       setting for new namespaces):
> 
>               # empty file
> 
>                               OR
> 
>               0    0    4294967295
> 
>       2. Complete isolation from global space (initialized, but no mappings):
> 
>               0    0    0
> 
>       3. Mapping in ranges of global CIDs
> 
>       For example, global CID space starts at 7000, up to 32-bit max:
> 
>               0       7000    4294960295
>       
>       Or for multiple mappings (0-99 map to 7000-7099, 1000-1099 map to
>       8000-8099):
> 
>               0       7000    100
>               1000    8000    100
> 
> 
> One thing I don't love is that option 3 seems to not be addressing a
> known use case. It doesn't necessarily hurt to have, but it will add
> complexity to CID handling that might never get used?
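
The uid_map-style translation in the quoted examples can be modelled in a
few lines. A minimal sketch (illustrative only, not a proposed kernel
interface), assuming entries follow the stated
"<local cid start> <global cid start> <range size>" format:

```python
# Toy model of uid_map-style CID translation. Each map entry is
# (local_start, global_start, size); a local CID falling inside a range
# is offset into the corresponding global range, anything else is unmapped.
def local_to_global(cid, cid_map):
    """Translate a namespace-local CID to a global CID, or None if unmapped."""
    for local_start, global_start, size in cid_map:
        if local_start <= cid < local_start + size:
            return global_start + (cid - local_start)
    return None

# Two-range example: local 0-99 -> global 7000-7099,
# local 1000-1099 -> global 8000-8099.
cid_map = [(0, 7000, 100), (1000, 8000, 100)]
```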

Yeah, I have the same feeling that full remapping of CIDs is probably
adding complexity without clear benefit, unless it somehow helps us
with the nested-virt scenario to disambiguate L0/L1/L2 CID ranges ?
I've not thought the latter through in any great detail, though.

> Since options 1/2 could also be represented by a boolean (yes/no
> "current ns shares CID with global"), I wonder if we could either A)
> only support the first two options at first, or B) add just
> /proc/net/vsock_ns_mode at first, which supports only "global" and
> "local", and later add a "mapped" mode plus /proc/net/vsock_cid_outside
> or the full mapping if the need arises?

Two options are sufficient if you want to control AF_VSOCK usage
and /dev/vhost-vsock usage as a pair. If you want to separately
control them though, it would push for three options - global,
local, and mixed. By mixed I mean AF_VSOCK in the NS can access
the global CID from the NS, but the NS can't associate the global
CID with a guest.

IOW, this breaks down like:

 * CID=N local - aka fully private

     Outside NS: Can associate outside CID=N with a guest.
                 AF_VSOCK permitted to access outside CID=N

     Inside NS: Can NOT associate outside CID=N with a guest
                Can associate inside CID=N with a guest
                AF_VSOCK forbidden to access outside CID=N
                AF_VSOCK permitted to access inside CID=N


 * CID=N mixed - aka partially shared

     Outside NS: Can associate outside CID=N with a guest.
                 AF_VSOCK permitted to access outside CID=N

     Inside NS: Can NOT associate outside CID=N with a guest
                AF_VSOCK permitted to access outside CID=N
                No inside CID=N concept


 * CID=N global - aka current historic behaviour

     Outside NS: Can associate outside CID=N with a guest.
                 AF_VSOCK permitted to access outside CID=N

     Inside NS: Can associate outside CID=N with a guest
                AF_VSOCK permitted to access outside CID=N
                No inside CID=N concept
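
The local/mixed/global breakdown above reduces to a small permission
table. A Python sketch (names purely illustrative, not proposed kernel
identifiers); "associate" and "connect" here both refer to the *outside*
CID=N, since local mode additionally permits the separate inside CID=N:

```python
# Permission table for operations on the *outside* CID=N association,
# keyed by (mode, inside_ns, operation). In "local" mode the inside NS
# still gets its own private inside CID=N (not modelled here).
ALLOWED = {
    ("local",  False, "associate"): True,
    ("local",  False, "connect"):   True,
    ("local",  True,  "associate"): False,
    ("local",  True,  "connect"):   False,
    ("mixed",  False, "associate"): True,
    ("mixed",  False, "connect"):   True,
    ("mixed",  True,  "associate"): False,  # outside NS keeps the association
    ("mixed",  True,  "connect"):   True,   # but services in the NS can reach it
    ("global", False, "associate"): True,
    ("global", False, "connect"):   True,
    ("global", True,  "associate"): True,   # first come, first served
    ("global", True,  "connect"):   True,
}

def permitted(mode, inside_ns, op):
    """Is `op` on the outside CID allowed from this vantage point?"""
    return ALLOWED[(mode, inside_ns, op)]
```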


I was thinking the 'mixed' mode might be useful if the outside NS wants
to retain control over setting up the association, but delegate to
processes in the inside NS for providing individual services to that
guest.  This means if the outside NS needs to restart the VM, there is
no race window in which the inside NS can grab the association with the
CID.

As for whether we need to control this per-CID, or with a single setting
applying to all CIDs:

Consider that the host OS can be running one or more "service VMs" on
well-known CIDs that can be leveraged from other NS, while those other
NS also run some "end user VMs" that should be private to the NS.

IOW, the CIDs for the service VMs would need to be using "mixed"
policy, while the CIDs for the end user VMs would be "local".
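
A concrete sketch of that split (hypothetical CID numbers, illustration
only):

```python
# Hypothetical per-CID policy: well-known service-VM CIDs are "mixed" so
# other NS can reach them, while every other CID defaults to "local" and
# stays private to whichever NS associates it.
SERVICE_VM_CIDS = {100, 101}   # hypothetical well-known service VM CIDs

def cid_policy(cid):
    """Return the per-CID namespace policy."""
    return "mixed" if cid in SERVICE_VM_CIDS else "local"
```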

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

