On Fri, 2015-10-16 at 11:36 +0300, Michael S. Tsirkin wrote:
> On Wed, Oct 14, 2015 at 05:51:17PM +0200, Knut Omang wrote:
> > Add a small intro + minimal documentation for how to
> > implement SR/IOV support for an emulated device.
>
> I worry that we won't keep this up to date as
> code changes. Could some or all of this go into
> comments in relevant headers?

Hopefully the documented part is not going to change that much - it is
not exact code anyway, just pseudo code to aid in developing new
devices using it. Marcel's idea to write a small doc seemed like a good
idea for now. We could always remove it once a full example driver has
been written.

> > Signed-off-by: Knut Omang <knut.om...@oracle.com>
> > ---
> >  docs/pcie_sriov.txt | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 115 insertions(+)
> >  create mode 100644 docs/pcie_sriov.txt
> >
> > diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
> > new file mode 100644
> > index 0000000..f5e891e
> > --- /dev/null
> > +++ b/docs/pcie_sriov.txt
> > @@ -0,0 +1,115 @@
> > +PCI SR/IOV EMULATION SUPPORT
> > +============================
> > +
> > +Description
> > +===========
> > +SR/IOV (Single Root I/O Virtualization) is an optional extended capability
> > +of a PCI Express device. It allows a single physical function (PF) to appear as multiple
> > +virtual functions (VFs) for the main purpose of eliminating software
> > +overhead in I/O from virtual machines.
> > +
> > +Qemu now implements the basic common functionality to enable an emulated device
> > +to support SR/IOV. No fully implemented device exists in Qemu yet, but a
> > +proof-of-concept hack of the Intel igb can be found here:
> > +
> > +git://github.com/knuto/qemu.git sriov_patches_v5
>
> That branch does not seem to be there.
> I don't think we should put such short-lived links into
> the repository.

Sorry, I just forgot to push it; it's there now.
I'll make sure it stays valid for as long as the reference in the doc is
there. Hopefully this is a temporary measure that can be cleaned up once
a working example has been implemented. But feel free to just skip this
patch; the most important thing is to get the generic SR/IOV code in
there for others to use.

Thanks,
Knut

> > +
> > +Implementation
> > +==============
> > +Implementing emulation of an SR/IOV capable device typically consists of
> > +implementing support for two types of device classes: the "normal" physical device
> > +(PF) and the virtual device (VF). From Qemu's perspective, the VFs are just
> > +like other devices, except that some of their properties are derived from
> > +the PF.
> > +
> > +A virtual function is different from a physical function in that the BAR
> > +space for all VFs is defined by the BAR registers in the PF's SR/IOV
> > +capability. All VFs have the same BARs and BAR sizes.
> > +
> > +The address of an access to these virtual BARs is then computed as
> > +
> > +   <VF BAR start> + <VF number> * <BAR sz> + <offset>
> > +
> > +From our emulation perspective this means that there is a separate call for
> > +setting up a BAR for a VF.
> > +
> > +1) To enable SR/IOV support in the PF, it must be a PCI Express device so
> > +   you would need to add a PCI Express capability in the normal PCI
> > +   capability list. You might also want to add an ARI (Alternative
> > +   Routing-ID Interpretation) capability to indicate that your device
> > +   supports functions beyond its "own" function space (0-7),
> > +   which is necessary to support more than 7 functions, or
> > +   if functions extend beyond offset 7 because they are placed at an
> > +   offset > 1 or have stride > 1.
> > +
> > +   ...
> > +   #include "hw/pci/pcie.h"
> > +   #include "hw/pci/pcie_sriov.h"
> > +
> > +   pci_your_pf_dev_realize( ... )
> > +   {
> > +      ...
> > +      int ret = pcie_endpoint_cap_init(d, 0x70);
> > +      ...
> > +      pcie_ari_init(d, 0x100, 1);
> > +      ...
> > +
> > +      /* Add and initialize the SR/IOV capability */
> > +      pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
> > +                       vf_devid, initial_vfs, total_vfs,
> > +                       fun_offset, stride);
> > +
> > +      /* Set up individual VF BARs (parameters as for normal BARs) */
> > +      pcie_sriov_pf_init_vf_bar( ... );
> > +      ...
> > +   }
> > +
> > +   For cleanup, you simply call:
> > +
> > +      pcie_sriov_pf_exit(device);
> > +
> > +   which will delete all the virtual functions and associated resources.
> > +
> > +2) Similarly in the implementation of the virtual function, you need to
> > +   make it a PCI Express device and add a similar set of capabilities
> > +   except for the SR/IOV capability. Then you need to set up the VF BARs as
> > +   subregions of the PF's SR/IOV VF BARs by calling
> > +   pcie_sriov_vf_register_bar() instead of the normal pci_register_bar() call:
> > +
> > +   pci_your_vf_dev_realize( ... )
> > +   {
> > +      ...
> > +      int ret = pcie_endpoint_cap_init(d, 0x60);
> > +      ...
> > +      pcie_ari_init(d, 0x100, 1);
> > +      ...
> > +      memory_region_init(mr, ... );
> > +      pcie_sriov_vf_register_bar(d, bar_nr, mr);
> > +      ...
> > +   }
> > +
> > +Testing on Linux guest
> > +======================
> > +The easiest way is if your device driver supports sysfs-based SR/IOV
> > +enabling. Support for this was added in kernel v3.8, so not all drivers
> > +support it yet.
> > +
> > +To enable 4 VFs for a device at 01:00.0:
> > +
> > +   modprobe yourdriver
> > +   echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
> > +
> > +You should now see 4 VFs with lspci.
> > +To turn SR/IOV off again - the standard requires you to turn it off before you can enable
> > +another VF count, and the emulation enforces this:
> > +
> > +   echo 0 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
> > +
> > +Older drivers typically provide a max_vfs module parameter
> > +to enable it at load time:
> > +
> > +   modprobe yourdriver max_vfs=4
> > +
> > +To disable the VFs again, simply unload the driver:
> > +
> > +   rmmod yourdriver
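
As a rough illustration of the VF BAR address formula quoted in the
Implementation section of the patch, here is a minimal C sketch. The
function name, the parameter names and the zero-based VF numbering are
assumptions made for illustration only; none of them come from the patch
or from QEMU's API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU.
 *
 * vf_bar_start: base address programmed into the VF BAR register of the
 *               PF's SR/IOV capability
 * vf_index:     VF number, assumed here to be zero-based
 * bar_size:     size of this BAR for a single VF (identical for all VFs)
 * offset:       offset of the access within that VF's BAR
 */
static uint64_t vf_bar_access_addr(uint64_t vf_bar_start, uint32_t vf_index,
                                   uint64_t bar_size, uint64_t offset)
{
    /* An access has to stay inside the slice belonging to this VF. */
    assert(offset < bar_size);

    /* <VF BAR start> + <VF number> * <BAR sz> + <offset> */
    return vf_bar_start + (uint64_t)vf_index * bar_size + offset;
}
```

With a VF BAR start of 0xfe000000 and a 16 KiB BAR, an access at offset
0x10 in the third VF (index 2) would land at 0xfe008010 under these
assumptions.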