Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coque...@redhat.com>
> Sent: Tuesday, December 22, 2020 4:49 PM
> To: Xia, Chenbo <chenbo....@intel.com>; Thomas Monjalon <tho...@monjalon.net>;
> David Marchand <david.march...@redhat.com>
> Cc: dev <dev@dpdk.org>; Stephen Hemminger <step...@networkplumber.org>;
> Liang, Cunming <cunming.li...@intel.com>; Lu, Xiuchun <xiuchun...@intel.com>;
> Li, Miao <miao...@intel.com>; Wu, Jingjing <jingjing...@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 0/8] Introduce emudev library and iavf emudev
> driver
>
> Hi Chenbo,
>
> Thanks for the detailed reply.
>
> On 12/22/20 4:09 AM, Xia, Chenbo wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >> Sent: Monday, December 21, 2020 8:02 PM
> >> To: Xia, Chenbo <chenbo....@intel.com>; Thomas Monjalon <tho...@monjalon.net>;
> >> David Marchand <david.march...@redhat.com>
> >> Cc: dev <dev@dpdk.org>; Stephen Hemminger <step...@networkplumber.org>;
> >> Liang, Cunming <cunming.li...@intel.com>; Lu, Xiuchun <xiuchun...@intel.com>;
> >> Li, Miao <miao...@intel.com>; Wu, Jingjing <jingjing...@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH 0/8] Introduce emudev library and iavf emudev
> >> driver
> >>
> >>
> >> On 12/21/20 10:52 AM, Maxime Coquelin wrote:
> >>> Hi Chenbo,
> >>>
> >>> On 12/19/20 7:11 AM, Xia, Chenbo wrote:
> >>>> Hi David,
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: David Marchand <david.march...@redhat.com>
> >>>>> Sent: Friday, December 18, 2020 5:54 PM
> >>>>> To: Xia, Chenbo <chenbo....@intel.com>
> >>>>> Cc: dev <dev@dpdk.org>; Thomas Monjalon <tho...@monjalon.net>; Stephen
> >>>>> Hemminger <step...@networkplumber.org>; Liang, Cunming
> >>>>> <cunming.li...@intel.com>; Lu, Xiuchun <xiuchun...@intel.com>; Li, Miao
> >>>>> <miao...@intel.com>; Wu, Jingjing <jingjing...@intel.com>
> >>>>> Subject: Re: [PATCH 0/8] Introduce emudev library and iavf emudev driver
> >>>>>
> >>>>> On Fri, Dec 18, 2020 at 9:02 AM Chenbo Xia <chenbo....@intel.com> wrote:
> >>>>>>
> >>>>>> This series introduces a new device abstraction called emudev for
> >>>>>> emulated devices. A new library (librte_emudev) is implemented. The
> >>>>>> first emudev driver is also introduced, which emulates the Intel
> >>>>>> Adaptive Virtual Function (iavf) as a software network device.
> >>>>>>
> >>>>>> This series has a dependency on the librte_vfio_user patch series:
> >>>>>> http://patchwork.dpdk.org/cover/85389/
> >>>>>>
> >>>>>> Background & Motivation
> >>>>>> -----------------------
> >>>>>> The disaggregated/multi-process QEMU is using VFIO-over-socket/vfio-user
> >>>>>> as the main transport mechanism to disaggregate IO services from QEMU.
> >>>>>> Therefore, librte_vfio_user is introduced in DPDK to accommodate
> >>>>>> emulated devices for high performance I/O. Although the vfio-user
> >>>>>> library provides the possibility of emulating devices in DPDK, DPDK
> >>>>>> does not have a device abstraction for emulated devices. A good device
> >>>>>> abstraction will be useful for applications and for high performance
> >>>>>> data path drivers. With this consideration, the emudev library is
> >>>>>> designed and implemented. It also makes it possible to keep a modular
> >>>>>> design for emulated devices by implementing the data path related logic
> >>>>>> in a standalone driver (e.g., an ethdev driver) and keeping the
> >>>>>> unrelated logic in the emudev driver.
> >>>>>
> >>>>> Since you mention performance, how does it compare to vhost-user/virtio?
> >>>>
> >>>> I think it depends on the device specification (i.e., how complex its
> >>>> data path handling is). A first try on the iavf spec shows better
> >>>> performance than virtio in our simple tests.
> >>>
> >>> That's interesting! How big is the performance difference? And how do
> >>> we explain it?
> >>>
> >>> If there are improvements that could be done in the Virtio
> >>> specification, it would be great to know and work on their
> >>> implementations. It worries me a bit that everyone is coming with
> >>> their own device emulation every release, making things like
> >>> live-migration difficult to achieve in the future.
> >>
> >> I did a quick review of the IAVF emudev driver to understand what factors
> >> other than the ring layout could explain a performance gain.
> >>
> >> My understanding is that part of the performance gain may come from the
> >> following things that are supported/implemented in the Vhost-user backend
> >> and not in the IAVF driver:
> >> 1. Memory hotplug. It seems the datapath is not safe against memory
> >> hotplug in the VM, which causes the memory tables to be updated
> >> asynchronously from the datapath. In order to support it in the Vhost-user
> >> library, we had to introduce locks to ensure the datapath isn't
> >> accessing the shared memory while it is being remapped.
> >
> > I think it now uses a similar approach to vhost-user.
> >
> > First, in the vfio-user patch series, we introduce a callback lock_dp to
> > lock the data path when messages like DMA MAP/UNMAP come. It will lock the
> > datapath in our data path driver.
> >
> > Note that the data path handling is in our data path driver:
> > http://patchwork.dpdk.org/cover/85500/
> >
> > For a modular design, the iavf_emu driver emulates the device while the
> > iavf back-end driver does the data path handling.
>
> My analysis was actually based on the data path driver series.
>
> My point was that iavfbe_recv_pkts() and iavfbe_xmit_pkts() are not safe
> against asynchronous changes like memory table updates.
>
> As far as I can see, the access_lock isn't taken by these functions, so
> if, for example, a memory table update happens while these functions are
> executing, it could lead to undefined behaviour. The only thing checked
> there is whether the queue is enabled when entering the function, but
> this value can change right after having been checked.
>
> For example, in the Vhost-user lib, we protected rte_vhost_dequeue_burst()
> and rte_vhost_enqueue_burst() with a spinlock. Note that this spinlock
> is per-virtqueue in order to avoid contention between the different
> queues.
>

Oops, I didn't realize the data path driver missed that. And yes, the data
path driver should do it like you said: take the per-queue access_lock for
the whole burst and re-check the queue state under it, roughly like the
sketch below.
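Something along these lines, just to make sure we mean the same thing. The
iavfbe_rx_queue layout and field names below are placeholders I made up;
only access_lock and the enabled check mirror what you describe, and the
real change would of course go into iavfbe_recv_pkts()/iavfbe_xmit_pkts():

#include <stdint.h>
#include <stdbool.h>

#include <rte_common.h>
#include <rte_spinlock.h>
#include <rte_mbuf.h>

/* Hypothetical per-queue state, for illustration only. */
struct iavfbe_rx_queue {
        rte_spinlock_t access_lock; /* also taken by lock_dp on DMA MAP/UNMAP */
        bool enabled;               /* set/cleared by the control path */
        /* descriptor ring, memory table reference, ... */
};

static uint16_t
iavfbe_recv_pkts_sketch(struct iavfbe_rx_queue *rxq,
                        struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
{
        uint16_t nb_rx = 0;

        /*
         * Hold the per-queue lock for the whole burst so that asynchronous
         * events (memory table update, queue disable) cannot change the
         * mappings being dereferenced. Per-queue granularity avoids
         * contention between queues, as for virtqueues in vhost-user.
         */
        rte_spinlock_lock(&rxq->access_lock);

        /* Check the queue state under the lock, not before taking it. */
        if (!rxq->enabled)
                goto out;

        /* ... translate guest addresses, fill rx_pkts[0..nb_pkts) ... */
        RTE_SET_USED(rx_pkts);
        RTE_SET_USED(nb_pkts);

out:
        rte_spinlock_unlock(&rxq->access_lock);
        return nb_rx;
}

The lock_dp callback on the control path would then take the same per-queue
locks before the memory table is updated, so the burst functions never see a
half-updated table.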
> >>
> >> 2. Live-migration. This feature does not seem to be supported in the
> >> driver, as I could not find a dirty page tracking mechanism. On the
> >> Vhost-user side, supporting it implies adding more branch conditions in
> >> the hot path, even when it is not used.
> >
> > Yes, we don't support this in this version. And yes, when we support this
> > feature, it will introduce complexity in the data path.
> >
> >>
> >> 3. No IOMMU support. Same here, this is supported in the Vhost-user
> >> library, and adding its support in the IAVF driver would introduce some
> >> more branches in the hot path even when not used.
> >
> > Yes, vIOMMU is not fully supported in the vfio-user spec for now, and I
> > also agree that when we have to support it, it will slow down the data
> > path.
> >
> >>
> >> 4. Out-of-bound memory access checks. While
> >> rte_iavf_emu_get_dma_vaddr() provides a way to ensure the full requested
> >> length is mapped, the data path does not use it. It does not even ensure
> >
> > In fact, it uses it 😊. I think you may have missed our data path driver.
> > Here's a use: http://patchwork.dpdk.org/patch/85504/
>
> Sorry, I was not clear. What I meant is that
> rte_iavf_emu_get_dma_vaddr() is indeed used, but its outputs aren't
> checked properly.
>
> >> the translated address is non-NULL. It makes it trivial for a malicious
> >> guest to make the hypervisor's vSwitch crash by simply passing random
> >> values as the buffer's address and length. Fixing it is trivial, but it
> >> will add several more checks and loops (if a buffer spans two pages) in
> >> the hot path.
> >
> > I don't quite understand this one. First, rte_iavf_emu_get_dma_vaddr() is
> > the only way to translate an address, and I think this function ensures
> > the input address is valid. Looking at the vhost side, vhost_iova_to_vva()
> > does similar things when vIOMMU is not used. Am I missing something? Just
> > correct me if I am wrong.
>
> I think rte_iavf_emu_get_dma_vaddr() is good; the issue is the way it is
> used in iavfbe_recv_pkts() and iavfbe_xmit_pkts().
>
> First, the returned address (desc_addr) is not checked. So if the start
> address of the buffer is out of the guest memory ranges, the host app will
> crash with a segmentation fault. It can happen in case of a bug in the
> guest driver, or with a maliciously crafted descriptor.
>
> Then, the rte_iavf_emu_get_dma_vaddr() len parameter is a pointer. The
> caller has to pass the length of the buffer it wants to map, and on return
> rte_iavf_emu_get_dma_vaddr() updates it with the contiguous length that is
> actually mapped.
>
> The caller needs to check whether all the requested length is mapped, and
> loop until it is all mapped:
> http://code.dpdk.org/dpdk/latest/source/lib/librte_vhost/virtio_net.c#L484
>
> This is required because a buffer contiguous in IOVA (guest physical
> address space in this case) is not necessarily contiguous in host
> virtual address space. Also, this is required to be safe against
> untrusted guests, as a malicious guest could pass ~0ULL in the
> descriptor buffer len of the Tx path to crash the host application or
> overwrite its memory.
>
> Currently, in the driver, the descriptor buf len is passed to the DMA map
> function, but its value is not checked afterwards, so it only ensures the
> first byte is mapped. For Tx, a 1-byte length is requested at DMA map, so
> also only the first byte of the buffer is ensured to be mapped (if
> desc_addr is not NULL, which is not checked).
>

Correct! I also missed that rte_iavf_emu_get_dma_vaddr() is not properly
used in the data path driver. The kind of check-and-loop you describe is
roughly what I sketch below.
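This is only a rough sketch to confirm my understanding: I am writing the
rte_iavf_emu_get_dma_vaddr() prototype and the rte_iavf_emu_mem handle from
memory, so they may not match the series exactly, and iavfbe_copy_from_guest()
is a made-up helper name. The point is checking the returned address and
looping over the contiguously mapped chunks, like virtio_net.c does around
vhost_iova_to_vva():

#include <stdint.h>

#include <rte_memcpy.h>

struct rte_iavf_emu_mem;

/* Assumed prototype (may not match the series exactly): translate guest DMA
 * address 'iova' to a host virtual address and update *len with the length
 * that is contiguously mapped starting at that address. */
void *rte_iavf_emu_get_dma_vaddr(struct rte_iavf_emu_mem *mem,
                                 uint64_t iova, uint64_t *len);

/* Copy 'desc_len' bytes of a guest buffer at guest address 'desc_iova' into
 * 'dst'. 'dst' must have room for 'desc_len' bytes, so the caller caps
 * 'desc_len' (e.g. against the mbuf data room) beforehand, which also guards
 * against a guest passing ~0ULL as the buffer length. Returns 0 on success,
 * -1 if any part of the buffer is not mapped. */
static int
iavfbe_copy_from_guest(struct rte_iavf_emu_mem *mem, uint64_t desc_iova,
                       uint64_t desc_len, void *dst)
{
        uint8_t *out = dst;

        while (desc_len != 0) {
                uint64_t len = desc_len;
                void *hva = rte_iavf_emu_get_dma_vaddr(mem, desc_iova, &len);

                /* The chunk may fall outside the guest memory table (guest
                 * driver bug or malicious descriptor): fail instead of
                 * dereferencing an unchecked pointer. */
                if (hva == NULL || len == 0)
                        return -1;

                /* Only 'len' bytes are guaranteed contiguous in host VA:
                 * consume that much and loop for the remainder, which may
                 * live in another memory region. */
                rte_memcpy(out, hva, len);
                out += len;
                desc_iova += len;
                desc_len -= len;
        }

        return 0;
}

Looping like this also covers the case where a buffer spans two guest memory
regions, which a single translation plus length check would miss.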
For the lock issue and this, @Wu, Jingjing, could we fix these in V2?

Thanks for the good catches!
Chenbo

> >>
> >> Other than that, there is for sure a performance gain due to all the
> >> features Virtio-net supports that we have to check and handle in the
> >> hot path, like indirect descriptors or mergeable buffers for example.
> >
> > I think iavf has similar features, like indirect descriptors and
> > mergeable buffers, to receive/transmit large packets. But I believe there
> > will be some feature differences between iavf and virtio/vhost.
>
> Sure, but I meant it might not be necessary to implement all those
> features (if they are optional in the spec) in your driver, just that in
> the case of Vhost we have a legacy to support.
>
> > I think you are correct that for this version, it's not fair to compare
> > virtio/vhost with the iavf back-end, because the iavf back-end does not
> > yet support some features and, besides, we have not optimized its data
> > path. We expect the performance of the iavf back-end to be at the same
> > level as virtio 1.1, and hopefully better because of the ring layout.
> > But let's see when we can do a complete performance analysis 😊.
>
> Thanks!
> Maxime
>
> > Thanks!
> > Chenbo
> >
> >>
> >> Best regards,
> >> Maxime
> >>
> >>> Regards,
> >>> Maxime
> >>>
> >>>> Thanks!
> >>>> Chenbo
> >>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> David Marchand
> >>>>
> >