Series applied to dpdk-next-virtio. Thanks.
--yliu On Thu, Oct 05, 2017 at 10:36:08AM +0200, Maxime Coquelin wrote: > This v3 lists the feature in the release note, and fixes the bug in > is_vring_iotlb_update() reported by Yuanhan. > > The purpose of this series is to add support for > VIRTIO_F_IOMMU_PLATFORM feature, by implementing device IOTLB in the > vhost-user backend. It improves the guest safety by enabling the > possibility to isolate the Virtio device. > > It makes possible to use Virtio PMD in guest with using VFIO driver > without enable_unsafe_noiommu_mode parameter set, so that the DPDK > application on guest can only access memory its has been allowed to, > and preventing malicious/buggy DPDK application in guest to make > vhost-user backend write random guest memory. Note that Virtio-net > Kernel driver also support IOMMU. > > The series depends on Qemu's "vhost-user: Specify and implement > device IOTLB support" [0], available upstream and which will be part > of Qemu v2.10 release. > > Performance-wise, even if this RFC has still room for optimizations, > no performance degradation is noticed with static mappings (i.e. DPDK > on guest) with PVP benchmark: > Traffic Generator: Moongen (lua-trafficgen) > Acceptable Loss: 0.005% > Validation run time: 1 min > Guest DPDK version/commit: v17.05 > QEMU version/commit: master (6db174aed1fd) > Virtio features: default > CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz > NIC: 2 x X710 > Page size: 1G host/1G guest > Results (bidirectional, total of the two flows): > - base: 18.8Mpps > - base + IOTLB series, IOMMU OFF: 18.8Mpps > - base + IOTLB series, IOMMU ON: 18.8Mpps (14.5Mpps w/o PATCH 21/21) > > This is explained because IOTLB misses, which are very costly, only > happen at startup time. Indeed, once used, the buffers are not > invalidated, so if the IOTLB cache is large enough, there will be only > cache hits. Also, the use of 1G huge pages improves the IOTLB cache > searching time by reducing the number of entries. > > With 2M hugepages, a performance degradation is seen with IOMMU on: > Traffic Generator: Moongen (lua-trafficgen) > Acceptable Loss: 0.005% > Validation run time: 1 min > Guest DPDK version/commit: v17.05 > QEMU version/commit: master (6db174aed1fd) > Virtio features: default > CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz > NIC: 2 x X710 > Page size: 2M host/2M guest > Results (bidirectional, total of the two flows): > - base: 18.8Mpps > - base + IOTLB series, IOMMU OFF: 18.8Mpps > - base + IOTLB series, IOMMU ON: 13.5Mpps (12.4Mpps wo PATCH 21/21) > > A possible improvement would be to merge contiguous IOTLB entries sharing > the same permissions. A very rough patch implementing this idea fixes > the performance degradation (18.8Mpps), but the required work to clean > it would delay this series after v17.11. > > With dynamic mappings (i.e. Virtio-net kernel driver), this is another > story. The performance is so poor it makes it almost unusable. Indeed, > since the Kernel driver unmaps the buffers as soon as they are handled, > almost all descriptors buffers addresses translations result in an IOTLB > miss. There is not much that can be done on DPDK side, except maybe > batching IOTLB miss requests no to break bursts, but it would require > a big rework. In Qemu, we may consider enabling IOMMU MAP notifications, > so that DPDK receives the IOTLB updates without having to send IOTLB miss > request. > > Regarding the design choices: > - I initially intended to use userspace RCU library[1] for the cache > implementation, but it would have added an external dependency, and the > lib is not available in all distros. Qemu for example got rid of this > dependency by copying some of the userspace RCU lib parts into Qemu tree, > but this is not possible with DPDK due to licensing issues (RCU lib is > LGPL v2). Thanks to Jason advice, I implemented the cache using rd/wr > locks. > - I initially implemented a per-device IOTLB cache, but the concurrent > acccesses on the IOTLB lock had huge impact on performance (~-40% in > bidirectionnal, expect even worse with multiqueue). I move to a per- > virtqueue IOTLB design, which prevents this concurrency. > - The slave IOTLB miss request supports reply-ack feature in spec, but > this version doesn't block or busy-wait for the corresponding update so > that other queues sharing the same lcore can be processed in the meantime. > > For those who would like to test the series, I made it available on > gitlab[2] (vhost_user_iotlb_v1 tag). The guest kernel command line requires > the intel_iommu=on parameter, and the guest should be started with and > iommu device attached to the virtio-net device. For example: > > ./qemu-system-x86_64 \ > -enable-kvm -m 4096 -smp 2 \ > -M q35,kernel-irqchip=split \ > -cpu host \ > -device intel-iommu,device-iotlb=on,intremap \ > -device ioh3420,id=root.1,chassis=1 \ > -chardev socket,id=char0,path=/tmp/vhost-user1 \ > -netdev type=vhost-user,id=hn2,chardev=char0 \ > -device > virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=$MAC,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on > \ > ... > > [0]: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg00520.html > [1]: http://liburcu.org/ > [2]: https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_user_iotlb_v2 > > Changes since v1: > ================= > - Add feature support in release note (Yuanhan) > - Fix return value in is_vring_iotlb_update() (Yuanhan) > > Changes since v1: > ================= > - Implement random cache entry eviction instead of removing all entries > when cache is full (Yuanhan) > - Adds bounds check on vring_idx (Remy) > - Initialize slave_req_fd to -1 (Tiwei) > - Remove virtio-net device lock (Tiwei) > - Simplify vhost_user_iotlb_cache_remove() (Tiwei) > - Squash iotlb lock usage optimization patch > - Inline noiommu part of vhost_iova_to_vva to remove performance > regressions with IOMMU=off > - Reworked iotlb files copyrights > > Changes since RFC: > ================== > - Fix memory leak in error patch reported by Jens > - Rework wait for IOTLB update by stopping the burst to let other > queues to be processed, if any. It implies the introduction an > iotlb_pending_list, so that iotlb miss requests aren't sent multiple > times for a same address. > - Optimize iotlb lock usage to recover to same as IOMMU off performance > - Fix device locking issue in rte_vhost_dequeue_burst() error path > - Change virtio_dev_rx error handling for consistency with mergeable rx, > and to ease returning in case of IOTLB misses. > - Fix checkpatch warnings reported by checkpa...@dpdk.org > > Maxime Coquelin (19): > Revert "vhost: workaround MQ fails to startup" > vhost: make error handling consistent in rx path > vhost: prepare send_vhost_message() to slave requests > vhost: add support to slave requests channel > vhost: declare missing IOMMU-related definitions for old kernels > vhost: add iotlb helper functions > vhost: iotlb: add pending miss request list and helpers > vhost-user: add support to IOTLB miss slave requests > vhost: initialize vrings IOTLB caches > vhost-user: handle IOTLB update and invalidate requests > vhost: introduce guest IOVA to backend VA helper > vhost: use the guest IOVA to host VA helper > vhost: enable rings at the right time > vhost: don't dereference invalid dev pointer after its reallocation > vhost: postpone rings addresses translation > vhost-user: translate ring addresses when IOMMU enabled > vhost-user: iommu: postpone device creation until ring are mapped > vhost: iommu: Invalidate vring in case of matching IOTLB invalidate > vhost: enable IOMMU support > > doc/guides/rel_notes/release_17_11.rst | 4 + > lib/librte_vhost/Makefile | 4 +- > lib/librte_vhost/iotlb.c | 352 > +++++++++++++++++++++++++++++++++ > lib/librte_vhost/iotlb.h | 76 +++++++ > lib/librte_vhost/vhost.c | 115 +++++++++-- > lib/librte_vhost/vhost.h | 61 +++++- > lib/librte_vhost/vhost_user.c | 312 +++++++++++++++++++++++++---- > lib/librte_vhost/vhost_user.h | 20 +- > lib/librte_vhost/virtio_net.c | 96 +++++++-- > 9 files changed, 962 insertions(+), 78 deletions(-) > create mode 100644 lib/librte_vhost/iotlb.c > create mode 100644 lib/librte_vhost/iotlb.h > > -- > 2.13.6