Re: How to implement message forwarding from one CID to another in vhost driver
Hey Stefano,

On 23.05.24 10:45, Stefano Garzarella wrote:
> On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
> > Howdy,
> >
> > On 20.05.24 14:44, Dorjoy Chowdhury wrote:
> > > Hey Stefano,
> > > Thanks for the reply.
> > >
> > > On Mon, May 20, 2024, 2:55 PM Stefano Garzarella wrote:
> > > > Hi Dorjoy,
> > > >
> > > > On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
> > > > > Hi,
> > > > >
> > > > > Hope you are doing well. I am working on adding AWS Nitro Enclave[1] emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1 patch series has already been posted to the qemu-devel mailing list[2].
> > > > >
> > > > > AWS nitro enclaves is an Amazon EC2[3] feature that allows creating isolated execution environments, called enclaves, from Amazon EC2 instances, which are used for processing highly sensitive data. Enclaves have no persistent storage and no external networking. The enclave VMs are based on Firecracker microvm and have a vhost-vsock device for communication with the parent EC2 instance that spawned it and a Nitro Secure Module (NSM) device for cryptographic attestation. The parent instance VM always has CID 3 while the enclave VM gets a dynamic CID. The enclave VMs can communicate with the parent instance over various ports to CID 3, for example, the init process inside an enclave sends a heartbeat to port 9000 upon boot, expecting a heartbeat reply, letting the parent instance know that the enclave VM has successfully booted.
> > > > >
> > > > > The plan is to eventually make the nitro enclave emulation in QEMU standalone i.e., without needing to run another VM with CID 3 with proper vsock
> > > >
> > > > If you don't have to launch another VM, maybe we can avoid vhost-vsock and emulate virtio-vsock in user-space, having complete control over the behavior. So we could use this opportunity to implement virtio-vsock in QEMU [4] or use vhost-user-vsock [5] and customize it somehow.
> > > > (Note: vhost-user-vsock already supports sibling communication, so maybe with a few modifications it fits your case perfectly)
> > > >
> > > > [4] https://gitlab.com/qemu-project/qemu/-/issues/2095
> > > > [5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
> > >
> > > Thanks for letting me know. Right now I don't have a complete picture but I will look into them. Thank you.
> > >
> > > > > communication support. For this to work, one approach could be to teach the vhost driver in kernel to forward CID 3 messages to another CID N
> > > >
> > > > So in this case both CID 3 and N would be assigned to the same QEMU process?
> > >
> > > CID N is assigned to the enclave VM. CID 3 was supposed to be the parent VM that spawns the enclave VM (this is how it is in AWS, where an EC2 instance VM spawns the enclave VM from inside it and that parent EC2 instance always has CID 3). But in the QEMU case as we don't want a parent VM (we want to run enclave VMs standalone) we would need to forward the CID 3 messages to host CID. I don't know if it means CID 3 and CID N is assigned to the same QEMU process. Sorry.
> >
> > There are 2 use cases here:
> >
> > 1) Enclave wants to treat host as parent (default). In this scenario, the "parent instance" that shows up as CID 3 in the Enclave doesn't really exist. Instead, when the Enclave attempts to talk to CID 3, it should really land on CID 0 (hypervisor). When the hypervisor tries to connect to the Enclave on port X, it should look as if it originates from CID 3, not CID 0.
> >
> > 2) Multiple parent VMs. Think of an actual cloud hosting scenario. Here, we have multiple "parent instances". Each of them thinks it's CID 3. Each can spawn an Enclave that talks to CID 3 and reach the parent.
> > For this case, I think implementing all of virtio-vsock in user space is the best path forward. But in theory, you could also swizzle CIDs to make random "real" CIDs appear as CID 3.
>
> Thank you for clarifying the use cases! Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion it's easier to go into user-space with vhost-user-vsock or the built-in device.

Sorry, I believe I meant CID 2. Effectively for case 1, when a process on the hypervisor listens on port 1234, it should be visible as 3:1234 from the VM, and when the hypervisor process connects to :1234, it should look as if that connection came from CID 3.

> Maybe initially with vhost-user-vsock it's easier because we already have something that works and supports sibling communication (for case 2).

The problem with vhost-user-vsock is that you don't get to use AF_VSOCK as a host process. A typical Nitro Enclaves application is split into 2 parts: an in-Enclave component that listens/connects to vsock, and a parent process that listens/connects to vsock.

The experience of launching an Enclave is very similar to launching a QEMU VM: you run nitro-cli and tell it to pop up the Enclave based on an EIF file. Nitro-cli then tells you the CID that was allocated for the Enclave and you communicate to it using that.

What I would ideally like to have as development experience is that you run QEMU with unmodified Enclave components (the EIF file) a
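For reference, below is a minimal sketch of the two vsock halves described above, assuming plain Linux AF_VSOCK sockets: the in-Enclave component connects to the parent at CID 3 (port 9000 mirrors the heartbeat port mentioned earlier), and the parent component listens and sees the enclave's CID on accept(). The payload byte and error handling are illustrative only; this is not Nitro or QEMU code.

/* Sketch only: in-Enclave client and parent-side listener over AF_VSOCK. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* In-Enclave component: the parent instance is always CID 3 from here. */
static int enclave_send_heartbeat(void)
{
	struct sockaddr_vm parent = {
		.svm_family = AF_VSOCK,
		.svm_cid    = 3,	/* parent CID, fixed in the Nitro model */
		.svm_port   = 9000,	/* heartbeat port */
	};
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&parent, sizeof(parent)) < 0) {
		perror("connect to 3:9000");
		close(fd);
		return -1;
	}
	write(fd, "\xb7", 1);	/* payload byte is made up for illustration */
	close(fd);
	return 0;
}

/* Parent component: listen on the port; accept() reports the enclave's CID. */
static int parent_wait_for_heartbeat(void)
{
	struct sockaddr_vm any = {
		.svm_family = AF_VSOCK,
		.svm_cid    = VMADDR_CID_ANY,
		.svm_port   = 9000,
	};
	struct sockaddr_vm peer;
	socklen_t len = sizeof(peer);
	int conn;
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0 || bind(fd, (struct sockaddr *)&any, sizeof(any)) < 0 ||
	    listen(fd, 1) < 0)
		return -1;

	conn = accept(fd, (struct sockaddr *)&peer, &len);
	if (conn >= 0) {
		printf("heartbeat connection from CID %u\n", peer.svm_cid);
		close(conn);
	}
	close(fd);
	return 0;
}

int main(int argc, char **argv)
{
	/* Run one side per process: "./vsock-demo parent" or "./vsock-demo enclave". */
	if (argc > 1 && strcmp(argv[1], "parent") == 0)
		return parent_wait_for_heartbeat();
	return enclave_send_heartbeat();
}

With vhost-vsock today the enclave side only reaches the host if it targets CID 2 rather than CID 3, which is the mismatch the forwarding/swizzling discussion above is about.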
Re: How to implement message forwarding from one CID to another in vhost driver
On 27.05.24 09:08, Alexander Graf wrote:

> [...]
> Sorry, I believe I meant CID 2. Effectively for case 1, when a process on the hypervisor listens on port 1234, it should be visible as 3:1234 from the VM and when the hypervisor process connects to :1234, it should look as if that connection came from CID 3.

Now that I'm thinking about my message again: What if we just introduce a sysfs/sysctl file for vsock that indicates the "host CID" (default: 2)? Users that want vhost-vsock to behave as if the host is CID 3 can just write 3 to it.

It means we'd need to change all references to VMADDR_CID_HOST to instead refer to a global variable that indicates the new "host CID". It'd need some more careful massaging to not break number namespace assumptions (<= CID_HOST no longer works), but the idea should fly.

That would give us all 3 options:

1) User sets vsock.host_cid = 3 to simulate that the host is in reality an enclave parent

2) User spawns VM with CID = 3 to run parent payload inside

3) User spawns parent and enclave VMs with vhost-vsock-user whic
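To visualize the knob being proposed, here is a rough sketch of the shape it could take, written as a stand-alone module parameter purely for illustration. The real change would have to live in the vsock core, where the hard-coded `cid == VMADDR_CID_HOST` checks would go through a helper; the names vsock_host_cid and vsock_addr_is_host below are made up, not existing kernel symbols.

/* Illustrative sketch, not a real patch to the vsock core. */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/types.h>
#include <linux/vm_sockets.h>

static unsigned int vsock_host_cid = VMADDR_CID_HOST;	/* default: 2 */
module_param(vsock_host_cid, uint, 0644);
MODULE_PARM_DESC(vsock_host_cid,
		 "CID the host answers to (write 3 to emulate a Nitro parent instance)");

/* Hypothetical replacement for hard-coded VMADDR_CID_HOST comparisons. */
static bool vsock_addr_is_host(unsigned int cid)
{
	return cid == vsock_host_cid;
}

static int __init host_cid_sketch_init(void)
{
	pr_info("host CID sketch: host_cid=%u, CID 3 %s treated as host\n",
		vsock_host_cid, vsock_addr_is_host(3) ? "is" : "is not");
	return 0;
}

static void __exit host_cid_sketch_exit(void)
{
}

module_init(host_cid_sketch_init);
module_exit(host_cid_sketch_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Illustrative sketch of a configurable vsock host CID");

Writing 3 to the resulting parameters file (or an equivalent sysctl) would then make the host answer as the enclave parent, i.e. option 1 above; the careful part is auditing every place that assumes anything at or below VMADDR_CID_HOST is not a guest, as noted in the message.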
Re: [PATCH net v2 2/2] Revert "virtio_net: Add a lock for per queue RX coalesce"
On Thu, 2024-05-23 at 15:46 +0800, Heng Qi wrote:
> This reverts commit 4d4ac2ececd3c42a08dd32a6e3a4aaf25f7efe44.
>
> When the following snippet is run, lockdep will report a deadlock[1].
>
>   /* Acquire all queues dim_locks */
>   for (i = 0; i < vi->max_queue_pairs; i++)
>           mutex_lock(&vi->rq[i].dim_lock);
>
> There's no deadlock here because the vq locks are always taken
> in the same order, but lockdep can not figure it out, and we
> can not make each lock a separate class because there can be more
> than MAX_LOCKDEP_SUBCLASSES of vqs.
>
> However, dropping the lock is harmless:
> 1. If dim is enabled, modifications made by dim worker to coalescing
>    params may cause the user's query results to be dirty data.

It looks like the above can confuse the user-space/admin?

Have you considered instead re-factoring virtnet_send_rx_notf_coal_cmds() to avoid acquiring all the mutexes in sequence?

Thanks!

Paolo
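As a side note, the following is a toy user-space model of the two locking patterns in question, holding every per-queue dim_lock at once versus taking them one queue at a time, with pthread mutexes standing in for the driver's locks. The struct layout, queue count, and update_queue() helper are invented for illustration; this is not virtio_net code.

/* Toy model of the locking pattern, not driver code. Build with -lpthread. */
#include <pthread.h>
#include <stdio.h>

#define NUM_QUEUES 8

struct rxq {
	pthread_mutex_t dim_lock;
	unsigned int coal_usecs;	/* stand-in for the coalescing params */
};

static struct rxq rq[NUM_QUEUES];

static void update_queue(struct rxq *q, unsigned int usecs)
{
	q->coal_usecs = usecs;
}

/* Pattern A: grab every dim_lock, update, then release everything
 * (the shape lockdep objected to in the driver). */
static void update_all_hold_all(unsigned int usecs)
{
	int i;

	for (i = 0; i < NUM_QUEUES; i++)
		pthread_mutex_lock(&rq[i].dim_lock);
	for (i = 0; i < NUM_QUEUES; i++)
		update_queue(&rq[i], usecs);
	for (i = 0; i < NUM_QUEUES; i++)
		pthread_mutex_unlock(&rq[i].dim_lock);
}

/* Pattern B: one queue at a time, so at most one lock is ever held. */
static void update_all_one_by_one(unsigned int usecs)
{
	int i;

	for (i = 0; i < NUM_QUEUES; i++) {
		pthread_mutex_lock(&rq[i].dim_lock);
		update_queue(&rq[i], usecs);
		pthread_mutex_unlock(&rq[i].dim_lock);
	}
}

int main(void)
{
	int i;

	for (i = 0; i < NUM_QUEUES; i++)
		pthread_mutex_init(&rq[i].dim_lock, NULL);

	update_all_hold_all(8);
	update_all_one_by_one(16);
	printf("queue 0 coal_usecs = %u\n", rq[0].coal_usecs);
	return 0;
}

The second pattern never nests dim_lock acquisitions, so there is nothing for lockdep to warn about; whether it fits virtnet_send_rx_notf_coal_cmds() depends on whether the update really needs to be atomic across all queues.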
Re: [PATCH net-next 0/7] virtnet_net: prepare for af-xdp
On Mon, 27 May 2024 11:38:49 +0800, Jason Wang wrote:
> On Thu, May 23, 2024 at 10:27 AM Xuan Zhuo wrote:
> >
> > Any comments for this.
> >
> > Thanks.
>
> Will have a look.
>
> Btw, is Michael happy with moving files into a dedicated directory?

Based on our previous discussions, I believe that he is okay.

Thanks

> Thanks
>
> > On Wed, 8 May 2024 16:05:07 +0800, Xuan Zhuo wrote:
> > > This patch set prepares for supporting af-xdp zerocopy.
> > > There is no feature change in this patch set.
> > > I just want to reduce the patch num of the final patch set,
> > > so I split the patch set.
> > >
> > > #1-#3 add independent directory for virtio-net
> > > #4-#7 do some refactor, the sub-functions will be used by the subsequent commits
> > >
> > > Thanks.
> > >
> > > Xuan Zhuo (7):
> > >   virtio_net: independent directory
> > >   virtio_net: move core structures to virtio_net.h
> > >   virtio_net: add prefix virtnet to all struct inside virtio_net.h
> > >   virtio_net: separate virtnet_rx_resize()
> > >   virtio_net: separate virtnet_tx_resize()
> > >   virtio_net: separate receive_mergeable
> > >   virtio_net: separate receive_buf
> > >
> > >  MAINTAINERS                                 |   2 +-
> > >  drivers/net/Kconfig                         |   9 +-
> > >  drivers/net/Makefile                        |   2 +-
> > >  drivers/net/virtio/Kconfig                  |  12 +
> > >  drivers/net/virtio/Makefile                 |   8 +
> > >  drivers/net/virtio/virtnet.h                | 246
> > >  .../{virtio_net.c => virtio/virtnet_main.c} | 534 ++
> > >  7 files changed, 452 insertions(+), 361 deletions(-)
> > >  create mode 100644 drivers/net/virtio/Kconfig
> > >  create mode 100644 drivers/net/virtio/Makefile
> > >  create mode 100644 drivers/net/virtio/virtnet.h
> > >  rename drivers/net/{virtio_net.c => virtio/virtnet_main.c} (94%)
> > >
> > > --
> > > 2.32.0.3.g01195cf9f
Re: [PATCH net v2 2/2] Revert "virtio_net: Add a lock for per queue RX coalesce"
On Mon, 27 May 2024 12:42:43 +0200, Paolo Abeni wrote:
> On Thu, 2024-05-23 at 15:46 +0800, Heng Qi wrote:
> > [...]
> >
> > However, dropping the lock is harmless:
> > 1. If dim is enabled, modifications made by dim worker to coalescing
> >    params may cause the user's query results to be dirty data.
>
> It looks like the above can confuse the user-space/admin?

Maybe, but we don't seem to guarantee this -- the global query interface (.get_coalesce) cannot guarantee correct results when DIM and .get_per_queue_coalesce are present:

1. DIM has been around for a long time (it will modify the per-queue parameters), but many NICs only have interfaces for querying global parameters.

2. Some NICs provide the .get_per_queue_coalesce interface, but it is not synchronized with DIM.

So I think this is acceptable.

> Have you considered instead re-factoring virtnet_send_rx_notf_coal_cmds() to avoid acquiring all the mutexes in sequence?

Perhaps one way is to not traverse and update the parameters of each queue in the global settings interface.

Thanks.

> Thanks!
>
> Paolo