On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin <epere...@redhat.com> wrote:
>
> On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <epere...@redhat.com> wrote:
> >
> > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasow...@redhat.com> wrote:
> > >
> > > On 2022/5/20 03:12, Eugenio Pérez wrote:
> > > > The control virtqueue is used by the networking device to accept various
> > > > commands from the driver. It is a must to support multiqueue and other
> > > > configurations.
> > > >
> > > > Shadow VirtQueue (SVQ) already makes migration of virtqueue states possible,
> > > > effectively intercepting them so qemu can track what regions of memory are
> > > > dirtied by device actions and need migration. However, this does not cover
> > > > the networking device state seen by the driver, which is changed by CVQ
> > > > messages such as MAC address updates.
> > > >
> > > > To solve that, this series uses the proposed SVQ infrastructure to intercept
> > > > the networking control messages used by the device. This way, qemu is able
> > > > to update the VirtIONet device model and to migrate it.
> > > >
> > > > However, intercepting all queues would slow device data forwarding. To solve
> > > > that, only the CVQ must be intercepted all the time. This is achieved using
> > > > the ASID infrastructure, which allows different translations for different
> > > > virtqueues. The most up-to-date kernel part of ASID is proposed at [1].
> > > >
> > > > You can run qemu in two modes after applying this series: intercepting only
> > > > the cvq with x-cvq-svq=on, or intercepting all the virtqueues by adding
> > > > x-svq=on to the cmdline:
> > > >
> > > > -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-cvq-svq=on,x-svq=on
> > > >
> > > > The first three patches enable updating the virtio-net device model for each
> > > > CVQ message acknowledged by the device.
> > > >
> > > > Patches 5 to 9 enable an individual SVQ to copy the buffers to QEMU's VA.
> > > > This simplifies the memory mapping, instead of mapping all the guest's
> > > > memory as in the data virtqueues.
> > > >
> > > > Patch 10 allows injecting control messages into the device. This allows
> > > > setting the device state both at QEMU startup and at the live migration
> > > > destination. In the future, this may also be used to emulate _F_ANNOUNCE.
> > > >
> > > > Patch 11 updates kernel headers, but it assigns random numbers to the needed
> > > > ioctls because they are not accepted in the kernel yet.
> > > >
> > > > Patches 12-16 enable setting the features of the net device model on the
> > > > vdpa device at device start.
> > > >
> > > > The last ones enable the separate ASID and SVQ.
> > > >
> > > > Comments are welcome.
> > >
> > > As discussed, I think we need to split this huge series into smaller ones:
> > >
> > > 1) shadow CVQ only, this makes rx-filter-event work
> > > 2) ASID support for CVQ
> > >
> > > And for 1) we need to consider whether or not it could be simplified.
> > >
> > > Or do it in reverse order, since if we do 1) first, we may have security issues.
> > >
> >
> > I'm ok with both, but I also think 2) before 1) might make more sense.
> > There is no way to only shadow CVQ otherwise ATM.
> >
>
> On second thought, that order is kind of harder.
>
> If we only map CVQ buffers, we need to either:
> a. Copy them to controlled buffers
> b. Track properly when to unmap them
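[Editor's note: a minimal, standalone sketch of what alternative a. above amounts to, i.e. bouncing a guest CVQ command into a buffer QEMU fully controls. The struct mirrors the virtio-net control header from the virtio spec, but cvq_bounce_out(), the fixed size cap, and the example values are hypothetical and are not the code this series proposes.]

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Mirrors struct virtio_net_ctrl_hdr from the virtio spec. */
struct virtio_net_ctrl_hdr {
    uint8_t class;   /* e.g. VIRTIO_NET_CTRL_MAC */
    uint8_t cmd;     /* e.g. VIRTIO_NET_CTRL_MAC_ADDR_SET */
} __attribute__((packed));

#define CVQ_BOUNCE_MAX 4096  /* arbitrary cap on a single command */

/*
 * Copy a guest out buffer into memory QEMU fully controls. Returns the
 * number of bytes bounced, or -1 if the buffer lacks a complete header or
 * exceeds the cap. Note the difficulty mentioned in the thread: there is
 * no single "payload length" field, so this sketch simply trusts (and
 * caps) the length the guest reported for the descriptor chain.
 */
static ssize_t cvq_bounce_out(const void *guest_buf, size_t guest_len,
                              uint8_t bounce[CVQ_BOUNCE_MAX])
{
    if (guest_len < sizeof(struct virtio_net_ctrl_hdr) ||
        guest_len > CVQ_BOUNCE_MAX) {
        return -1;
    }
    memcpy(bounce, guest_buf, guest_len);
    return guest_len;
}

int main(void)
{
    /* Fake guest buffer: ctrl header (class 1 = MAC, cmd 1 = ADDR_SET)
     * followed by a 6-byte MAC address payload. */
    const uint8_t guest[] = { 1, 1, 0x52, 0x54, 0x00, 0x12, 0x34, 0x56 };
    uint8_t bounce[CVQ_BOUNCE_MAX];
    ssize_t n = cvq_bounce_out(guest, sizeof(guest), bounce);

    printf("bounced %zd bytes, class=%u cmd=%u\n",
           n, (unsigned)bounce[0], (unsigned)bounce[1]);
    return 0;
}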
Just to make sure we're on the same page: I meant we can start with e.g.
having a dedicated ASID for CVQ but still using CVQ passthrough. Then do
other stuff on top.

> Alternative a. has the same problems exposed in this RFC: It's hard
> (and unneeded in the final version) to know the size to copy.
> Alternative b. also requires things not needed in the final version,
> like counting the number of times each page is mapped and unmapped.
>
> So I'll go with the first alternative, which is also the proposed order
> of the RFC. What security issues do you expect beyond the comments in
> this series?

If we shadow CVQ without ASID, the guest may guess the IOVA of the CVQ
and try to peek/modify it?

Thanks

>
> Thanks!
>
> > Can we do as with the previous base SVQ patches? They were merged
> > although there is still no way to enable SVQ.
> >
> > Thanks!
> >
> > > Thoughts?
> > >
> > > Thanks
> > >
> > > >
> > > > TODO:
> > > > * Fall back on regular CVQ if QEMU cannot isolate it in its own ASID for
> > > >   any reason, blocking migration. This is tricky, since it can leave the
> > > >   VM unable to migrate anymore, so some way of blocking it must be used.
> > > > * Review failure paths; some have TODO notes, others don't.
> > > >
> > > > Changes from rfc v7:
> > > > * Don't map all guest space in ASID 1 but copy all the buffers. No need
> > > >   for more memory listeners.
> > > > * Move net backend start callback to SVQ.
> > > > * Wait for device CVQ commands used by the device at SVQ start, avoiding
> > > >   races.
> > > > * Changed ioctls, but they're provisional anyway.
> > > > * Reorder commits so refactoring and code-adding ones are closer to usage.
> > > > * Usual cleaning: better tracing, doc, patch messages, ...
> > > >
> > > > Changes from rfc v6:
> > > > * Fix bad iotlb update order when batching was enabled.
> > > > * Add reference counting to iova_tree so cleaning is simpler.
> > > >
> > > > Changes from rfc v5:
> > > > * Fix bad calculation of the cvq end group when MQ is not acked by the
> > > >   guest.
> > > >
> > > > Changes from rfc v4:
> > > > * Add missing tracing.
> > > > * Add multiqueue support.
> > > > * Use the already sent version for replacing g_memdup.
> > > > * Care with memory management.
> > > >
> > > > Changes from rfc v3:
> > > > * Fix bad returning of descriptors to the SVQ list.
> > > >
> > > > Changes from rfc v2:
> > > > * Fix use-after-free.
> > > >
> > > > Changes from rfc v1:
> > > > * Rebase to latest master.
> > > > * Configure ASID instead of assuming cvq asid != data vqs asid.
> > > > * Update device model so (MAC) state can be migrated too.
> > > >
> > > > [1] https://lkml.kernel.org/kvm/20220224212314.1326-1-gda...@xilinx.com/
> > > >
> > > > Eugenio Pérez (21):
> > > >   virtio-net: Expose ctrl virtqueue logic
> > > >   vhost: Add custom used buffer callback
> > > >   vdpa: control virtqueue support on shadow virtqueue
> > > >   virtio: Make virtqueue_alloc_element non-static
> > > >   vhost: Add vhost_iova_tree_find
> > > >   vdpa: Add map/unmap operation callback to SVQ
> > > >   vhost: move descriptor translation to vhost_svq_vring_write_descs
> > > >   vhost: Add SVQElement
> > > >   vhost: Add svq copy desc mode
> > > >   vhost: Add vhost_svq_inject
> > > >   vhost: Update kernel headers
> > > >   vdpa: delay set_vring_ready after DRIVER_OK
> > > >   vhost: Add ShadowVirtQueueStart operation
> > > >   vhost: Make possible to check for device exclusive vq group
> > > >   vhost: add vhost_svq_poll
> > > >   vdpa: Add vhost_vdpa_start_control_svq
> > > >   vdpa: Add asid attribute to vdpa device
> > > >   vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
> > > >   vhost: Add reference counting to vhost_iova_tree
> > > >   vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > >   vdpa: Add x-cvq-svq
> > > >
> > > >  qapi/net.json | 13 +-
> > > >  hw/virtio/vhost-iova-tree.h | 7 +-
> > > >  hw/virtio/vhost-shadow-virtqueue.h | 61 ++-
> > > >  include/hw/virtio/vhost-vdpa.h | 3 +
> > > >  include/hw/virtio/vhost.h | 3 +
> > > >  include/hw/virtio/virtio-net.h | 4 +
> > > >  include/hw/virtio/virtio.h | 1 +
> > > >  include/standard-headers/linux/vhost_types.h | 11 +-
> > > >  linux-headers/linux/vhost.h | 25 +-
> > > >  hw/net/vhost_net.c | 5 +-
> > > >  hw/net/virtio-net.c | 84 +++--
> > > >  hw/virtio/vhost-iova-tree.c | 35 +-
> > > >  hw/virtio/vhost-shadow-virtqueue.c | 378 ++++++++++++++++---
> > > >  hw/virtio/vhost-vdpa.c | 206 +++++++++-
> > > >  hw/virtio/virtio.c | 2 +-
> > > >  net/vhost-vdpa.c | 294 ++++++++++++++-
> > > >  hw/virtio/trace-events | 10 +-
> > > >  17 files changed, 1012 insertions(+), 130 deletions(-)
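[Editor's note: as a reference for the dedicated-ASID-for-CVQ idea discussed earlier in the thread, the sketch below shows how a CVQ's virtqueue group could be attached to its own ASID through the vhost-vdpa uAPI. It uses the ioctl names that eventually landed in <linux/vhost.h>; at the time of this thread, patch 11 of the series still carried provisional numbers. The device path, the vring index of the CVQ, and the "data vqs in ASID 0 / CVQ in ASID 1" split are example assumptions, and feature negotiation plus most device setup is omitted.]

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>

int main(void)
{
    int fd = open("/dev/vhost-vdpa-0", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Become the owner of the vhost device before issuing other ioctls. */
    if (ioctl(fd, VHOST_SET_OWNER, NULL) < 0) {
        perror("VHOST_SET_OWNER");
        close(fd);
        return 1;
    }

    /* Ask which group the CVQ belongs to; vring index 2 assumes a
     * single-queue-pair device where the CVQ is the third vring. */
    struct vhost_vring_state group = { .index = 2 };
    if (ioctl(fd, VHOST_VDPA_GET_VRING_GROUP, &group) < 0) {
        perror("VHOST_VDPA_GET_VRING_GROUP");
        close(fd);
        return 1;
    }

    /* Attach that group to ASID 1, leaving the data vqs in ASID 0, so the
     * CVQ gets its own IOVA translations (and, later, can be shadowed
     * without exposing its buffers through the guest-visible mappings). */
    struct vhost_vring_state asid = { .index = group.num, .num = 1 };
    if (ioctl(fd, VHOST_VDPA_SET_GROUP_ASID, &asid) < 0) {
        perror("VHOST_VDPA_SET_GROUP_ASID");
        close(fd);
        return 1;
    }

    printf("CVQ group %u now uses ASID 1\n", group.num);
    close(fd);
    return 0;
}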