Re: [PATCH] vsock: Enable H2G override

Stefano Garzarella Tue, 03 Mar 2026 01:51:46 -0800

On Mon, Mar 02, 2026 at 08:04:22PM +0100, Alexander Graf wrote:

On 02.03.26 17:25, Stefano Garzarella wrote:
On Mon, Mar 02, 2026 at 04:48:33PM +0100, Alexander Graf wrote:
On 02.03.26 13:06, Stefano Garzarella wrote:
CCing Bryan, Vishnu, and Broadcom list.

On Mon, Mar 02, 2026 at 12:47:05PM +0100, Stefano Garzarella wrote:
Please target net-next tree for this new feature.

On Mon, Mar 02, 2026 at 10:41:38AM +0000, Alexander Graf wrote:
Vsock maintains a single CID number space which can be used to
communicate to the host (G2H) or to a child-VM (H2G). The current logic trivially assumes that G2H is only relevant for CID <= 2 because these target the hypervisor. However, in environments like Nitro Enclaves, an instance that hosts vhost_vsock powered VMs may still want to communicate to Enclaves that are reachable at higher CIDs through virtio-vsock-pci.
That means that for CID > 2, we really want an overlay. By default, all CIDs are owned by the hypervisor. But if vhost registers a CID, it takes
precedence.  Implement that logic. Vhost already knows which CIDs it
supports anyway.
With this logic, I can run a Nitro Enclave as well as a nested VM with
vhost-vsock support in parallel, with the parent instance able to
communicate to both simultaneously.
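
The overlay described in the commit message above can be sketched as a small userspace model. This is only an illustration of the rule as stated in this thread, not the patch or the kernel's transport lookup. The names vsock_route, route_without_overlay(), route_with_overlay() and MODEL_CID_HOST are hypothetical, and vhost_owns_cid stands in for "vhost has registered this CID".

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code. */
enum vsock_route { ROUTE_G2H, ROUTE_H2G };

/* Assumption: mirrors the host CID value (2) used in the discussion. */
#define MODEL_CID_HOST 2u

/* Rule before the patch while an H2G transport such as vhost is loaded:
 * CID <= 2 goes to the hypervisor, everything else goes to a child VM. */
enum vsock_route route_without_overlay(unsigned int cid)
{
	return cid <= MODEL_CID_HOST ? ROUTE_G2H : ROUTE_H2G;
}

/* Proposed overlay: the hypervisor owns every CID by default, but a CID
 * registered by vhost takes precedence and is routed to the child VM. */
enum vsock_route route_with_overlay(unsigned int cid, bool vhost_owns_cid)
{
	if (cid <= MODEL_CID_HOST)
		return ROUTE_G2H;
	return vhost_owns_cid ? ROUTE_H2G : ROUTE_G2H;
}
```

In this model, CID 42 reaches the nested VM only if vhost registered it; otherwise it falls through to the hypervisor-provided transport, which is the behavior change debated in the rest of the thread.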
I honestly don't understand why VMADDR_FLAG_TO_HOST (added specifically for Nitro IIRC) isn't enough for this scenario and we have to add this change. Can you elaborate a bit more about the relationship between this change and VMADDR_FLAG_TO_HOST we added?
The main problem I have with VMADDR_FLAG_TO_HOST for connect() is that it punts the complexity to the user. Instead of a single CID address space, you now effectively create 2 spaces: One for TO_HOST (needs a flag) and one for TO_GUEST (no flag). But every user space tool needs to learn about this flag. That may work for super special-case applications. But propagating that all the way into socat, iperf, etc etc? It's just creating friction.
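
For reference, a minimal sketch of what "learning about this flag" means for a user space tool. It assumes kernel headers recent enough to provide svm_flags and VMADDR_FLAG_TO_HOST in <linux/vm_sockets.h>; fill_vsock_addr() is a hypothetical helper name, not an existing API.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: build a vsock address and optionally ask for the
 * connection to be routed to the host rather than to a nested guest. */
void fill_vsock_addr(struct sockaddr_vm *addr, unsigned int cid,
		     unsigned int port, int to_host)
{
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = cid;
	addr->svm_port = port;
	if (to_host)
		addr->svm_flags |= VMADDR_FLAG_TO_HOST;
}
```

A caller would then pass the filled address to connect() on a socket created with socket(AF_VSOCK, SOCK_STREAM, 0). The to_host decision is the part that each tool (socat, iperf, ...) would have to expose to its users.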
Okay, I would like to have this (or part of it) in the commit message to better explain why we want this change.
IMHO the most natural experience is to have a single CID space, potentially manually segmented by launching VMs of one kind within a certain range.
I see, but at this point, should the kernel set VMADDR_FLAG_TO_HOST in the remote address if that path is taken "automagically" ?
So in that way the user space can have a way to understand if it's talking with a nested guest or a sibling guest.
That said, I'm concerned about the scenario where an application does not even consider communicating with a sibling VM.
If that's really a realistic concern, then we should add a VMADDR_FLAG_TO_GUEST that the application can set. Default behavior of an application that provides no flags is "route to whatever you can find": If vhost is loaded, it routes to vhost. If a vsock backend


mmm, we have always documented this simple behavior:
- CID = 2 talks to the host
- CID >= 3 talks to the guest

Now we are changing this by adding fallback. I don't think we should change the default behavior, but rather provide new ways to enable this new behavior.

I find it strange that an application running on Linux 7.0 has a default behavior where using CID=42 always talks to a nested VM, but starting with Linux 7.1, it also starts talking to a sibling VM.

driver is loaded, it routes there. But the application has no say in where it goes: It's purely a system configuration thing.

This is true for complex things like IP, but for VSOCK we have always wanted to keep the default behavior very simple (as written above). Everything else must be explicitly enabled IMHO.

Until now, it knew that by not setting that flag, it could only talk to nested VMs, so if there was no VM with that CID, the connection simply failed. Whereas from this patch onwards, if the device in the host supports sibling VMs and there is a VM with that CID, the application finds itself talking to a sibling VM instead of a nested one, without having any idea.
I'd say an application that attempts to talk to a CID that it does not know whether it's vhost routed or not is running into "undefined" territory. If you rmmod the vhost driver, it would also talk to the hypervisor provided vsock.

Oh, I missed that. And I also fixed that behaviour with commit 65b422d9b61b ("vsock: forward all packets to the host when no H2G is registered") after I implemented the multi-transport support.

mmm, this could change my position ;-) (although, to be honest, I don't understand why it was like that in the first place, but that's how it is now).


Please also document this in the new commit message; it is a good point.

Although when H2G is loaded, we behave differently. However, it is true that sysctl helps us standardize this behavior.


I don't know whether to see it as a regression or not.

Should we make this feature opt-in in some way, such as sockopt or sysctl? (I understand that there is the previous problem, but honestly, it seems like a significant change to the behavior of AF_VSOCK).
We can create a sysctl to enable behavior with default=on. But I'm against making the cumbersome does-not-work-out-of-the-box experience the default. Will include it in v2.

The opposite point of view is that we would not want to have different default behavior between 7.0 and 7.1 when H2G is loaded.

At the end of the day, the host vs guest problem is super similar to a routing table.
Yeah, but the point of AF_VSOCK is precisely to avoid complexities such as routing tables as much as possible; otherwise, AF_INET is already there and ready to be used. In theory, we only want communication between host and guest.
Yes, but nesting is a thing and nobody thought about it :). In retrospect, it would have been better to annotate the CID with the direction: H5 goes to CID5 on host and G5 goes to CID5 on guest. But I see no chance to change that by now.


Yep, this is why we added the VMADDR_FLAG_TO_HOST.

Stefano
