15/06/2021 04:49, Xia, Chenbo:
> From: Thomas Monjalon <tho...@monjalon.net>
> > 01/06/2021 05:06, Chenbo Xia:
> > > Hi everyone,
> > >
> > > This is a draft implementation of the mdev (Mediated device [1])
> > > support in DPDK PCI bus driver. Mdev is a way to virtualize devices
> > > in Linux kernel. Based on the device-api (mdev_type/device_api),
> > > there could be different types of mdev devices (e.g. vfio-pci).
> > 
> > Could you please illustrate with a usage of mdev in DPDK?
> > What does it enable which is not possible today?
> 
> The main purpose is for DPDK to drive mdev-based devices, which is not
> possible today.
> 
> I'll take PCI devices as an example. Currently DPDK can only drive devices
> on the physical pci bus under /sys/bus/pci, which is how the kernel exposes
> pci devices to applications.
> 
> But there are PCI devices that use vfio-mdev as a software framework to expose
> mdevs to applications under /sys/bus/mdev. A device can virtualize itself this
> way to let multiple applications share one physical device. For example, Intel
> Scalable IOV technology is known to use vfio-mdev as the SW framework for
> Scalable IOV enabled devices (and Intel net/crypto/raw devices support this
> tech). For those mdev-based devices, DPDK needs support in the bus layer to
> scan/plug/probe/.. them, which is the main effort of this patchset. Other
> devices also use the vfio-mdev framework: AFAIK, Nvidia's GPU was the first
> to use mdev, and Intel's GPU virtualization also uses it.

Yes, mdev was designed for virtualization, I think.
Using mdev for Scalable IOV without virtualization
may be seen as an abuse by Linux maintainers,
as they currently seem to prefer the auxiliary bus (which is a real bus).

Mellanox got pushback when trying to use mdev for the same purpose
(Scalable Function, also called Sub-Function) in the kernel.
The Linux community decided to use the auxiliary bus instead.

Any other feedback on the choice mdev vs aux?
Is there any kernel code supporting this mdev model for Intel devices?

> > > In this patchset, the PCI bus driver is extended to support scanning
> > > and probing the mdev devices whose device-api is "vfio-pci".
> > >
> > >                      +---------+
> > >                      | PCI bus |
> > >                      +----+----+
> > >                           |
> > >          +--------+-------+-------+--------+
> > >          |        |               |        |
> > >   Physical PCI devices ...   Mediated PCI devices ...
> > >
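As a rough illustration of the "device-api is vfio-pci" check mentioned above,
here is a minimal sketch in C. It assumes the usual vfio-mdev sysfs layout
(<root>/<uuid>/mdev_type/device_api, with root normally /sys/bus/mdev/devices).
The function names mdev_device_api_path() and mdev_is_vfio_pci() are
hypothetical and are not taken from the patchset.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed default location of mdev devices in sysfs. */
#define MDEV_SYSFS_DEVICES "/sys/bus/mdev/devices"

/*
 * Build the path of the device_api attribute for one mdev (hypothetical
 * helper). Returns 0 on success, -1 if the path does not fit in buf.
 */
int
mdev_device_api_path(const char *root, const char *uuid, char *buf, size_t len)
{
	int n;

	assert(root != NULL && uuid != NULL && buf != NULL);
	n = snprintf(buf, len, "%s/%s/mdev_type/device_api", root, uuid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Return 1 if the mdev reports device_api "vfio-pci", 0 if it reports
 * something else, -1 if the attribute cannot be read.
 */
int
mdev_is_vfio_pci(const char *root, const char *uuid)
{
	char path[512];
	char api[64];
	FILE *f;

	if (mdev_device_api_path(root, uuid, path, sizeof(path)) < 0)
		return -1;
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(api, sizeof(api), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	api[strcspn(api, "\n")] = '\0';	/* strip the trailing newline */
	return strcmp(api, "vfio-pci") == 0;
}
```

A bus scan could call mdev_is_vfio_pci(MDEV_SYSFS_DEVICES, uuid) for each
directory entry and skip devices for which the result is not 1.
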
> > > The first four patches in this patchset are mainly preparation for mdev
> > > bus support. The remaining two patches are the key implementation of
> > > the mdev bus.
> > >
> > > The implementation of mdev bus in DPDK has several options:
> > >
> > > 1: Embed mdev bus in current pci bus
> > >
> > >    This patchset takes this option as an example. Mdev has several
> > >    device types: pci/platform/amba/ccw/ap. DPDK currently only cares
> > >    about pci devices among all mdev device types, so we could embed the
> > >    mdev bus into the current pci bus. The pci bus with mdev support will
> > >    then scan/plug/unplug/.. not only normal pci devices but also mediated
> > >    pci devices.
> > 
> > I think it is a different bus.
> > It would be cleaner to not touch the PCI bus.
> > Having a separate bus will allow an easy way to identify a device
> > with the new generic devargs syntax, example:
> >     bus=mdev,uuid=XXX
> > or more complex:
> >     bus=mdev,uuid=XXX/class=crypto/driver=qat,foo=bar
> 
> OK. Agreed that it is cleaner not to touch the PCI bus. There may also be a
> 'type=pci', as mdev has several types in its definition
> (pci/ap/platform/ccw/...).
> 
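To make the layered devargs example above concrete, here is a minimal sketch
in C of looking up one key in one layer of such a string. It assumes only what
the examples show: layers separated by '/', key=value pairs separated by ','.
devargs_get() is a hypothetical helper, not the DPDK devargs API, and the real
parser may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of `key` from the `layer`-th '/'-separated layer of
 * `devargs` into `out` (layer 0 is the bus layer in the examples).
 * Returns 0 on success, -1 if the layer or key is absent or the value
 * does not fit in `out`.
 */
int
devargs_get(const char *devargs, unsigned int layer, const char *key,
	    char *out, size_t outlen)
{
	const char *p = devargs;
	size_t klen;

	assert(devargs != NULL && key != NULL && out != NULL);
	klen = strlen(key);

	/* Skip to the requested layer. */
	while (layer-- > 0) {
		p = strchr(p, '/');
		if (p == NULL)
			return -1;
		p++;
	}

	/* Walk the key=value pairs of this layer only. */
	while (*p != '\0' && *p != '/') {
		size_t plen = strcspn(p, ",/");	/* length of this pair */

		if (plen > klen && strncmp(p, key, klen) == 0 &&
		    p[klen] == '=') {
			size_t vlen = plen - klen - 1;

			if (vlen >= outlen)
				return -1;
			memcpy(out, p + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		p += plen;
		if (*p == ',')
			p++;
	}
	return -1;
}
```

With the second example string, layer 0 yields bus and uuid, layer 1 yields
class, and layer 2 yields driver and foo. A 'type' key in layer 0 would be
looked up the same way.
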
> > > 2: A new mdev bus that scans mediated pci devices and probes mdev driver
> > >    to plug-in pci devices to pci bus
> > >
> > >    If we took this option, a new mdev bus will be implemented to scan
> > >    mediated pci devices and a new mdev driver for pci devices will be
> > >    implemented in pci bus to plug-in mediated pci devices to pci bus.
> > >
> > >    Our RFC v1 takes this option:
> > >    http://patchwork.dpdk.org/project/dpdk/cover/20190403071844.21126-1-tiwei....@intel.com/
> > >
> > >    Note that for either option 1 or 2, device drivers do not see the
> > >    implementation difference; they only use structs/functions exposed by
> > >    the pci bus. Mediated pci devices differ from normal pci devices in
> > >    two ways:
> > >    1. Mediated pci devices use a UUID as address, but normal ones use BDF.
> > >    2. Mediated pci devices may have capabilities that normal pci devices
> > >    do not have. For example, mediated pci devices could have regions with
> > >    the sparse mmap capability, which allows a region to have multiple
> > >    mmap areas. Another example is that mediated pci devices may have
> > >    regions, or parts of regions, that are not mmaped but still need to be
> > >    accessed. These differences will change the current ABI (i.e.,
> > >    struct rte_pci_device). Please check the 5th and 6th patches for
> > >    details.
> > >
> > > 3: A brand new mdev bus that does everything
> > >
> > >    This option will implement a new, standalone mdev bus. It does not
> > >    need any changes in the current pci bus, only some shared code (the
> > >    linux vfio part) from the pci bus. Drivers of devices that support
> > >    mdev will register themselves as mdev drivers and no longer rely on
> > >    the pci bus. This option, IMHO, will make the code clean. The only
> > >    potential problem may be code duplication, which could be solved by
> > >    making the linux vfio code of the pci bus common and shared.
> > 
> > Yes I prefer this third option.
> > We can find an elegant way of sharing some VFIO code between buses.
> 
> Yes, I have not thought about the details of the code sharing but will try
> to make it elegant.

Great, thanks.

