Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <tho...@monjalon.net>
> Sent: Tuesday, June 15, 2021 3:48 PM
> To: Xia, Chenbo <chenbo....@intel.com>
> Cc: dev@dpdk.org; Liang, Cunming <cunming.li...@intel.com>; Wu, Jingjing
> <jingjing...@intel.com>; Burakov, Anatoly <anatoly.bura...@intel.com>; Yigit,
> Ferruh <ferruh.yi...@intel.com>; m...@ashroe.eu; nhor...@tuxdriver.com;
> Richardson, Bruce <bruce.richard...@intel.com>; david.march...@redhat.com;
> step...@networkplumber.org; Ananyev, Konstantin
> <konstantin.anan...@intel.com>;
> j...@nvidia.com; pa...@nvidia.com; xuemi...@nvidia.com
> Subject: Re: [dpdk-dev] [RFC v3 0/6] Add mdev (Mediated device) support in
> DPDK
>
> 15/06/2021 04:49, Xia, Chenbo:
> > From: Thomas Monjalon <tho...@monjalon.net>
> > > 01/06/2021 05:06, Chenbo Xia:
> > > > Hi everyone,
> > > >
> > > > This is a draft implementation of the mdev (Mediated device [1])
> > > > support in DPDK PCI bus driver. Mdev is a way to virtualize devices
> > > > in Linux kernel. Based on the device-api (mdev_type/device_api),
> > > > there could be different types of mdev devices (e.g. vfio-pci).
> > >
> > > Please could you illustrate with an usage of mdev in DPDK?
> > > What does it enable which is not possible today?
> >
> > The main purpose is for DPDK to drive mdev-based devices, which is not
> > possible today.
> >
> > I'd take PCI devices for an example. Currently DPDK can only drive devices
> > of physical pci bus under /sys/bus/pci and kernel exposes the pci devices
> > to APP in that way.
> >
> > But there are PCI devices using vfio-mdev as a software framework to expose
> > Mdev to APP under /sys/bus/mdev. Devices could choose this way of virtualizing
> > itself to let multiple APPs share one physical device. For example, Intel
> > Scalable IOV technology is known to use vfio-mdev as SW framework for Scalable
> > IOV enabled devices (and Intel net/crypto/raw devices support this tech).
> > For those mdev-based devices, DPDK needs support on the bus layer to
> > scan/plug/probe/.. them, which is the main effort this patchset does.
> > There are also other devices using the vfio-mdev framework, AFAIK, Nvidia's
> > GPU is the first one using mdev and Intel's GPU virtualization also uses it.
>
> Yes mdev was designed for virtualization I think.
> The use of mdev for Scalable IOV without virtualization
> may be seen as an abuse by Linux maintainers,
> as they currently seem to prefer the auxiliary bus (which is a real bus).
>
> Mellanox got a push back when trying to use mdev for the same purpose
> (Scalable Function, also called Sub-Function) in the kernel.
> The Linux community decided to use the auxiliary bus.
>
> Any other feedback on the choice mdev vs aux?

OK. Thanks for the info. Much appreciated. I will investigate the choice a bit
and come back to you later.

> Is there any kernel code supporting this mdev model for Intel devices?

For now there is only the Intel GPU. But I think you care more about devices
that DPDK could drive: a DMA device (named ioat in DPDK, under raw/ioat) is on
its way upstream (https://www.spinics.net/lists/kvm/msg244417.html)

Thanks,
Chenbo

>
> > > > In this patchset, the PCI bus driver is extended to support scanning
> > > > and probing the mdev devices whose device-api is "vfio-pci".
> > > >
> > > >                     +---------+
> > > >                     | PCI bus |
> > > >                     +----+----+
> > > >                          |
> > > >         +--------+-------+-------+--------+
> > > >         |        |               |        |
> > > >  Physical PCI devices ... Mediated PCI devices ...
> > > >
> > > > The first four patches in this patchset are mainly preparation of mdev
> > > > bus support. The left two patches are the key implementation of mdev
> > > > bus.
> > > >
> > > > The implementation of mdev bus in DPDK has several options:
> > > >
> > > > 1: Embed mdev bus in current pci bus
> > > >
> > > >    This patchset takes this option for an example. Mdev has several
> > > >    device types: pci/platform/amba/ccw/ap. DPDK currently only cares
> > > >    pci devices in all mdev device types so we could embed the mdev bus
> > > >    into current pci bus. Then pci bus with mdev support will scan/plug/
> > > >    unplug/.. not only normal pci devices but also mediated pci devices.
> > >
> > > I think it is a different bus.
> > > It would be cleaner to not touch the PCI bus.
> > > Having a separate bus will allow an easy way to identify a device
> > > with the new generic devargs syntax, example:
> > >   bus=mdev,uuid=XXX
> > > or more complex:
> > >   bus=mdev,uuid=XXX/class=crypto/driver=qat,foo=bar
> >
> > OK. Agree on cleaner to not touch PCI bus. And there may also be a 'type=pci'
> > as mdev has several types in its definition (pci/ap/platform/ccw/...).
> >
> > > > 2: A new mdev bus that scans mediated pci devices and probes mdev driver to
> > > >    plug-in pci devices to pci bus
> > > >
> > > >    If we took this option, a new mdev bus will be implemented to scan
> > > >    mediated pci devices and a new mdev driver for pci devices will be
> > > >    implemented in pci bus to plug-in mediated pci devices to pci bus.
> > > >
> > > >    Our RFC v1 takes this option:
> > > >    http://patchwork.dpdk.org/project/dpdk/cover/20190403071844.21126-1-tiwei....@intel.com/
> > > >
> > > > Note that: for either option 1 or 2, device drivers do not know the
> > > > implementation difference but only use structs/functions exposed by
> > > > pci bus. Mediated pci devices are different from normal pci devices
> > > > on: 1. Mediated pci devices use UUID as address but normal ones use BDF.
> > > > 2. Mediated pci devices may have some capabilities that normal pci
> > > > devices do not have. For example, mediated pci devices could have
> > > > regions that have sparse mmap capability, which allows a region to have
> > > > multiple mmap areas. Another example is mediated pci devices may have
> > > > regions/part of regions not mmaped but need to access them. Above
> > > > difference will change the current ABI (i.e., struct rte_pci_device).
> > > > Please check 5th and 6th patch for details.
> > > >
> > > > 3. A brand new mdev bus that does everything
> > > >
> > > >    This option will implement a new and standalone mdev bus. This option
> > > >    does not need any changes in current pci bus but only needs some shared
> > > >    code (linux vfio part) in pci bus. Drivers of devices that support mdev
> > > >    will register itself as a mdev driver and do not rely on pci bus anymore.
> > > >    This option, IMHO, will make the code clean. The only potential problem
> > > >    may be code duplication, which could be solved by making code of linux
> > > >    vfio part of pci bus common and shared.
> > >
> > > Yes I prefer this third option.
> > > We can find an elegant way of sharing some VFIO code between buses.
> >
> > Yes, I have not thought about the details of the code sharing but will try to make
> > it elegant.
>
> Great, thanks.
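
To illustrate the address difference mentioned in the cover letter (mediated
PCI devices are addressed by UUID, normal ones by BDF), below is a minimal
sketch in C of how a bus could tell the two apart from a device string. This is
not code from the patchset: `classify_addr`, `enum addr_kind` and the helper
functions are hypothetical names for illustration only, and the sketch assumes
the full `DDDD:BB:DD.F` BDF form and the canonical 8-4-4-4-12 UUID text form.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical address kinds for illustration; not part of the DPDK API. */
enum addr_kind { ADDR_INVALID, ADDR_PCI_BDF, ADDR_MDEV_UUID };

/* Return 1 if the first n characters of s are all hexadecimal digits. */
static int all_hex(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!isxdigit((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Canonical UUID text: 8-4-4-4-12 hex digits, 36 characters in total. */
static int is_uuid(const char *s)
{
    static const size_t group[] = { 8, 4, 4, 4, 12 };
    size_t pos = 0;

    if (strlen(s) != 36)
        return 0;
    for (size_t g = 0; g < 5; g++) {
        if (!all_hex(s + pos, group[g]))
            return 0;
        pos += group[g];
        /* Every group except the last is followed by a dash. */
        if (g < 4 && s[pos++] != '-')
            return 0;
    }
    return 1;
}

/* Full PCI address: DDDD:BB:DD.F (domain:bus:device.function), function 0-7. */
static int is_bdf(const char *s)
{
    return strlen(s) == 12 &&
           all_hex(s, 4) && s[4] == ':' &&
           all_hex(s + 5, 2) && s[7] == ':' &&
           all_hex(s + 8, 2) && s[10] == '.' &&
           s[11] >= '0' && s[11] <= '7';
}

/* Decide whether a device string looks like an mdev UUID or a PCI BDF. */
enum addr_kind classify_addr(const char *s)
{
    assert(s != NULL);
    if (is_uuid(s))
        return ADDR_MDEV_UUID;
    if (is_bdf(s))
        return ADDR_PCI_BDF;
    return ADDR_INVALID;
}
```

With a separate mdev bus (option 3), a check like this would only be needed
where a single entry point accepts both kinds of address; each bus would
otherwise parse just its own format.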