On Fri, Oct 21, 2022 at 11:26 AM Xuan Zhuo <[email protected]> wrote:
>
> On Fri, 21 Oct 2022 10:42:37 +0800, Jason Wang <[email protected]> wrote:
> > On Wed, Oct 19, 2022 at 5:23 PM Xuan Zhuo <[email protected]> wrote:
> > >
> > > On Wed, 19 Oct 2022 17:11:21 +0800, Jason Wang <[email protected]> wrote:
> > > > On Wed, Oct 19, 2022 at 4:19 PM Xuan Zhuo <[email protected]> wrote:
> > > > >
> > > > > On Wed, 19 Oct 2022 16:13:07 +0800, Xuan Zhuo <[email protected]> wrote:
> > > > > > On Wed, 19 Oct 2022 16:01:42 +0800, Jason Wang <[email protected]> wrote:
> > > > > > > On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <[email protected]> wrote:
> > > > > > > > > On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <[email protected]> wrote:
> > > > > > > > > > > Adding Stefan.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hello everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > # Background
> > > > > > > > > > > >
> > > > > > > > > > > > Nowadays, there is a common scenario to accelerate communication
> > > > > > > > > > > > between different VMs and containers, including lightweight
> > > > > > > > > > > > virtual-machine-based containers. One way to achieve this is to
> > > > > > > > > > > > colocate them on the same host. However, the performance of inter-VM
> > > > > > > > > > > > communication through the network stack is not optimal and may also
> > > > > > > > > > > > waste extra CPU cycles.
> > > > > > > > > > > > This scenario has been discussed many times, but there is still no
> > > > > > > > > > > > generic solution available [1] [2] [3].
> > > > > > > > > > > >
> > > > > > > > > > > > With a pci-ivshmem + SMC (Shared Memory Communications [4]) based
> > > > > > > > > > > > PoC [5], we found that by changing the communication channel between
> > > > > > > > > > > > VMs from TCP to SMC with shared memory, we can achieve superior
> > > > > > > > > > > > performance for a common socket-based application [5]:
> > > > > > > > > > > >   - latency reduced by about 50%
> > > > > > > > > > > >   - throughput increased by about 300%
> > > > > > > > > > > >   - CPU consumption reduced by about 50%
> > > > > > > > > > > >
> > > > > > > > > > > > Since there is no particularly suitable shared memory management
> > > > > > > > > > > > solution that matches the needs of SMC (see ## Comparison with
> > > > > > > > > > > > existing technology), and virtio is the standard for communication
> > > > > > > > > > > > in the virtualization world, we want to implement a virtio-ism
> > > > > > > > > > > > device based on virtio, which can support on-demand memory sharing
> > > > > > > > > > > > across VMs, containers or VM-container. To match the needs of SMC,
> > > > > > > > > > > > the virtio-ism device needs to support:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Dynamic provision: shared memory regions are dynamically
> > > > > > > > > > > >    allocated and provisioned.
> > > > > > > > > > > > 2. Multi-region management: the shared memory is divided into
> > > > > > > > > > > >    regions, and a peer may allocate one or more regions from the
> > > > > > > > > > > >    same shared memory device.
> > > > > > > > > > > > 3. Permission control: the permission of each region can be set
> > > > > > > > > > > >    separately.
> > > > > > > > > > >
> > > > > > > > > > > Looks like virtio-ROCE
> > > > > > > > > > >
> > > > > > > > > > > https://lore.kernel.org/all/[email protected]/T/
> > > > > > > > > > >
> > > > > > > > > > > and virtio-vhost-user can satisfy the requirement?
> > > > > > > > > > > >
> > > > > > > > > > > > # Virtio ism device
> > > > > > > > > > > >
> > > > > > > > > > > > ISM devices provide the ability to share memory between different
> > > > > > > > > > > > guests on a host. A guest's memory obtained from the ism device can
> > > > > > > > > > > > be shared with multiple peers at the same time. This sharing
> > > > > > > > > > > > relationship can be dynamically created and released.
> > > > > > > > > > > >
> > > > > > > > > > > > The shared memory obtained from the device is divided into multiple
> > > > > > > > > > > > ism regions for sharing. The ISM device provides a mechanism to
> > > > > > > > > > > > notify other ism region referrers of content update events.
> > > > > > > > > > > >
> > > > > > > > > > > > # Usage (SMC as example)
> > > > > > > > > > > >
> > > > > > > > > > > > Here is one possible use case:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. SMC calls the interface ism_alloc_region() of the ism driver,
> > > > > > > > > > > >    which returns the location of a memory region in the PCI space
> > > > > > > > > > > >    and a token.
> > > > > > > > > > > > 2. The ism driver mmaps the memory region and returns it to SMC
> > > > > > > > > > > >    with the token.
> > > > > > > > > > > > 3. SMC passes the token to the connected peer.
> > > > > > > > > > > > 4. The peer calls the ism driver interface ism_attach_region(token)
> > > > > > > > > > > >    to get the location in the PCI space of the shared memory.
> > > > > > > > > > > >
> > > > > > > > > > > > # About hot plugging of the ism device
> > > > > > > > > > > >
> > > > > > > > > > > > Hot plugging of devices is a heavyweight, failure-prone,
> > > > > > > > > > > > time-consuming, and less scalable operation. So we don't plan to
> > > > > > > > > > > > support it for now.
> > > > > > > > > > > >
> > > > > > > > > > > > # Comparison with existing technology
> > > > > > > > > > > >
> > > > > > > > > > > > ## ivshmem or ivshmem 2.0 of Qemu
> > > > > > > > > > > >
> > > > > > > > > > > > 1. ivshmem 1.0 is a large piece of memory that can be seen by all
> > > > > > > > > > > >    VMs that use this device, so the security is not sufficient.
> > > > > > > > > > > >
> > > > > > > > > > > > 2. ivshmem 2.0 is a shared memory belonging to a VM that can be
> > > > > > > > > > > >    read-only by all other VMs that use the ivshmem 2.0 shared memory
> > > > > > > > > > > >    device, which also does not meet our needs in terms of security.
> > > > > > > > > > > >
> > > > > > > > > > > > ## vhost-pci and virtiovhostuser
> > > > > > > > > > > >
> > > > > > > > > > > > Does not support dynamic allocation and is therefore not suitable
> > > > > > > > > > > > for SMC.
> > > > > > > > > > >
> > > > > > > > > > > I think this is an implementation issue; we can support the VHOST
> > > > > > > > > > > IOTLB message, then the regions could be added/removed on demand.
> > > > > > > > > >
> > > > > > > > > > 1. After the attacker connects with the victim, if the attacker does
> > > > > > > > > > not dereference memory, the memory will be occupied under
> > > > > > > > > > virtiovhostuser.
> > > > > > > > > > In the case of ism devices, the victim can directly release the
> > > > > > > > > > reference, and the maliciously referenced region only occupies the
> > > > > > > > > > attacker's resources.
> > > > > > > > >
> > > > > > > > > Let's define the security boundary here, e.g. do we trust the device
> > > > > > > > > or not? If yes, in the case of virtiovhostuser, can we simply do
> > > > > > > > > VHOST_IOTLB_UNMAP so that we can safely release the memory from the
> > > > > > > > > attacker?
> > > > > > > > > >
> > > > > > > > > > 2. The ism device of a VM can be shared with multiple (1000+) VMs at
> > > > > > > > > > the same time, which is a challenge for virtiovhostuser.
> > > > > > > > >
> > > > > > > > > Please elaborate more on the challenges; what makes virtiovhostuser
> > > > > > > > > different?
> > > > > > > >
> > > > > > > > As I understand it (please point out any mistakes), one vvu device
> > > > > > > > corresponds to one VM. If we share memory with 1000 VMs, do we have
> > > > > > > > 1000 vvu devices?
> > > > > > >
> > > > > > > There could be some misunderstanding here. With 1000 VMs, you still
> > > > > > > need 1000 virtio-ism devices, I think.
> > > > > >
> > > > > > No, just use one virtio-ism device.
> > > > >
> > > > > For example, if the hardware memory of a virtio-ism device is 1G and an
> > > > > ism region is 1M, there are about 1000 ism regions, and these ism
> > > > > regions can be shared with different VMs.
> > > >
> > > > Right, this is what I've understood.
> > > >
> > > > What I want to say is that this might be achieved with virtio-vhost-user
> > > > as well. But it may require some changes to the protocol, which I'm not
> > > > sure is worth bothering with. And I've started to think about the
> > > > possibility of building this on top of virtio-vhost-user (I don't see
> > > > any blocker so far).
> > > Yes, it is theoretically possible to implement this based on
> > > virtio-vhost-user. But implementing it without depending on
> > > virtio-vhost-user is also very simple, because the physical memory it
> > > shares does not come from a VM but from the host.
> > >
> > > So I think we have reached an agreement on the relationship between ism
> > > and virtio-vhost-user: ism is used to provide shared memory to the upper
> > > layer, and this device should be necessary to add (of course, we will
> > > listen to other people's opinions). And how is its backend memory shared
> > > with other VMs? This is our second question.
> >
> > I'm not sure I get the question, but we're sharing memory, not a backend?
>
> In the design of traditional devices such as virtio-net, a piece of memory
> is allocated by guest A and then handed over to the backend for use.
> virtio-vhost-user allows another guest B to access guest A's memory.
If you meant the RFC patch posted, yes. But actually, virtio-vhost-user
could be used to implement, e.g., the host handing over memory for a guest
to use?

> Our approach is that the memory is allocated by the backend. On
> alloc/attach, we just insert the memory into the guest's memory space
> using memory_region_add_subregion(). That's why we don't use vhost-user in
> our implementation.
>
> On the other hand, we are also looking in the other direction. If the
> memory is allocated by one VM in the guest, then we have to use the
> vhost-user protocol.

Probably not? It works just as if all the regions were pre-allocated in the
case of ISM. Similarly, if we use virtio-vhost-user, we just need a new
IOTLB message to allocate memory (or reuse IOTLB_UPDATE).

> 1. The advantage of this is that it is more convenient for resource
>    management.
>
> 2. Using the vhost-user protocol in the backend implementation would be
>    more complicated than our current solution.
>
> 3. If the peer is malicious, then we have to unmap the memory mapping of
>    the peer. (This has been discussed in another email, and it should be
>    possible.)

This only works if the peer's VMM is trusted.

Thanks

> Thanks.
> > > >
> > > > Thanks
> > > > >
> > > > > And it is dynamic. After an ism region is shared with a VM, it can
> > > > > be shared with other VMs.
> > > > >
> > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > > 3. The sharing relationship of ism is dynamically increased,
> > > > > > > > > > while virtiovhostuser determines the sharing relationship at
> > > > > > > > > > startup.
> > > > > > > > >
> > > > > > > > > Not necessarily with the IOTLB API?
> > > > > > > > Unlike virtio-vhost-user, which shares the memory of one VM with
> > > > > > > > another VM, we provide the same memory on the host to two VMs. So
> > > > > > > > the implementation of this part will be much simpler. This is why
> > > > > > > > we gave up virtio-vhost-user at the beginning.
> > > > > > >
> > > > > > > Ok, just to make sure we're on the same page: at the spec level,
> > > > > > > virtio-vhost-user doesn't (and can't) limit the backend to being
> > > > > > > implemented in another VM. So it should be OK to use it for sharing
> > > > > > > memory between a guest and the host.
> > > > > > >
> > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > 4. For security issues, the device under virtiovhostuser may
> > > > > > > > > > mmap more memory, while ism only maps one region to other
> > > > > > > > > > devices.
> > > > > > > > >
> > > > > > > > > With VHOST_IOTLB_MAP, the map could be done per region.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > # Design
> > > > > > > > > > > >
> > > > > > > > > > > > This is a structure diagram of ism sharing between two VMs.
> > > > > > > > > > > > |--------------------------------------------------------------------|
> > > > > > > > > > > > | |---------------------------|    |---------------------------|     |
> > > > > > > > > > > > | | Guest                     |    | Guest                     |     |
> > > > > > > > > > > > | |                           |    |                           |     |
> > > > > > > > > > > > | |  ----------               |    |  ----------               |     |
> > > > > > > > > > > > | |  | driver | [M1] [M2] [M3]|    |  | driver | [M2] [M3]     |     |
> > > > > > > > > > > > | |  ----------   |   |   |   |    |  ----------   |   |       |     |
> > > > > > > > > > > > | |   |cq|      map map map   |    |   |cq|      map map       |     |
> > > > > > > > > > > > | |    |          |   |   |   |    |    |          |   |       |     |
> > > > > > > > > > > > | |    |        ---------------    |    |        ---------------     |
> > > > > > > > > > > > | |----|--------| device memory|   |----|--------| device memory|    |
> > > > > > > > > > > > | |    |        ---------------    |    |        ---------------     |
> > > > > > > > > > > > | |    |         Qemu          |   |    |         Qemu          |    |
> > > > > > > > > > > > | |----+----------------------|    |----+----------------------|    |
> > > > > > > > > > > > |      |                                |                            |
> > > > > > > > > > > > |   |--+--------------------------------+---------------------|     |
> > > > > > > > > > > > |   |         --------------------------                      |     |
> > > > > > > > > > > > |   |         | M1 |  | M2 |  | M3 |                          |     |
> > > > > > > > > > > > |   |         --------------------------                      |     |
> > > > > > > > > > > > |   |                                             HOST        |     |
> > > > > > > > > > > > |   |------------------------------------------------------- -|     |
> > > > > > > > > > > > |--------------------------------------------------------------------|
> > > > > > > > > > > >
> > > > > > > > > > > > # POC code
> > > > > > > > > > > >
> > > > > > > > > > > > Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > > > > > > > > > > Qemu: https://github.com/fengidri/qemu/commits/ism
> > > > > > > > > > > >
> > > > > > > > > > > > If there are any problems, please point them out.
> > > > > > > > > > > >
> > > > > > > > > > > > Hope to hear from you, thank you.
> > > > > > > > > > > > [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > > > > > > > > > > [2] https://dl.acm.org/doi/10.1145/2847562
> > > > > > > > > > > > [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > > > > > > > > > > [4] https://lwn.net/Articles/711071/
> > > > > > > > > > > > [5] https://lore.kernel.org/netdev/[email protected]/T/
> > > > > > > > > > > >
> > > > > > > > > > > > Xuan Zhuo (2):
> > > > > > > > > > > >   Reserve device id for ISM device
> > > > > > > > > > > >   virtio-ism: introduce new device virtio-ism
> > > > > > > > > > > >
> > > > > > > > > > > >  content.tex    |   3 +
> > > > > > > > > > > >  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > > > > >  2 files changed, 343 insertions(+)
> > > > > > > > > > > >  create mode 100644 virtio-ism.tex
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > >
> > > > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > > > To unsubscribe, e-mail: [email protected]
> > > > > > > > > > > > For additional commands, e-mail: [email protected]
