For distros, out-of-tree kernel modules are painful. From my POV, it would be preferable to try and find a solution upstream, even if it is going to be difficult and require a lot of negotiation and work.
On Thu, 2019-10-31 at 18:03 +0100, Thomas Monjalon wrote:
> We don't get enough attention on this topic.
> Let me rephrase the issue and the proposals with more people Cc'ed.
>
> We are talking about SR-IOV VFs in VMs
> with a PF managed on the host by DPDK.
> The PF driver is either a (1) bifurcated (Mellanox case),
> or (2) bound to UIO with igb_uio, or (3) bound to VFIO.
>
> In case 1, the PF is still managed by a kernel driver, so no issue.
>
> In case 2, the PF is managed by UIO.
> There is no SR-IOV support in upstream UIO,
> but the out-of-tree module igb_uio works.
> However we would like to drop this legacy module from DPDK.
> Some (most) Linux distributions do not package igb_uio anyway.
> The other issue is that igb_uio is using physical addressing,
> which is not acceptable with OCTEON TX2 for performance reason.
>
> In case 3, the PF is managed by VFIO. This is the case we want to fix.
> VFIO does not allow to create VFs.
> The workaround is to create VFs before binding the PF to VFIO.
> But since Linux 4.19, VFIO forbids any SR-IOV VF management.
> There is a security concern about allowing userspace to manage SR-IOV
> VF messages and taking the responsibility for VFs in the guest.
>
> It is desired to allow the system admin deciding the security levels,
> by adding a flag in VFIO "let me manage VFs, I know what I am doing".
> Reference of "recent" discussion:
> https://lkml.org/lkml/2018/3/6/855
>
> For now, there is no upstream solution merged.
>
> This patch is proposing a solution using an out-of-tree module.
> In this case, the admin will decide explicitly to bind the PF to vfio_pf.
> Unfortunately this solution won't work in environments which
> forbid any out-of-tree module.
> Another concern is that it looks like DPDK-only solution.
>
> We have an issue but we do not want to propose a half-solution
> which would harm other projects and users.
> So the question is:
> Do we accept this patch as a temporary solution?
> Or can we get an agreement soon for an upstream kernel solution?
>
> Thanks for reading and giving your (clear) opinion.
>
>
> 06/09/2019 15:27, Jerin Jacob Kollanukkaran:
> > From: Thomas Monjalon <tho...@monjalon.net>
> > > 06/09/2019 11:12, vattun...@marvell.com:
> > > > From: Vamsi Attunuru <vattun...@marvell.com>
> > > >
> > > > The DPDK use case such as VF representer or OVS offload etc would
> > > > call for PF and VF PCIe devices to bind vfio-pci module to enable
> > > > IOMMU protection.
> > > >
> > > > In addition to vSwitch use case, unlike, other PCI class of devices,
> > > > Network class of PCIe devices would have additional responsibility
> > > > on the PF devices such as promiscuous mode support etc.
> > > >
> > > > The above use cases demand VFIO needs bound to PF and its VF devices.
> > > > This is use case is not supported in Linux kernel, due to a security
> > > > issue where it is possible to have DoS in case if VF attached to
> > > > guest over vfio-pci and netdev kernel driver runs on it and which
> > > > something VF representer would like to enable it.
> > > >
> > > > Since we can not differentiate, the vfio-pci bounded VF devices runs
> > > > DPDK application or netdev driver in guest, we can not introduce any
> > > > scheme to fix DoS case and therefore not have proper support of this
> > > > in the upstream kernel.
> > > >
> > > > The igb_uio enables such PF and VF binding support for non-iommu
> > > > devices to make VF representer or OVS offload run on non-iommu
> > > > devices with DoS vulnerability for netdev driver as VF.
> > > >
> > > > This kernel module, facilitate to enable SRIOV on PF devices,
> > > > therefore, to run both PF and VF devices in VFIO mode knowing its
> > > > impacts like igb_uio driver functions of non-iommu devices.
> > > >
> > > > Signed-off-by: Vamsi Attunuru <vattun...@marvell.com>
> > > > Signed-off-by: Jerin Jacob <jer...@marvell.com>
> > >
> > > Sorry I fail to properly understand the explanation above.
> > > Please try to split in shorter sentences.
> > >
> > > About the request to add an out-of-tree Linux kernel driver, I
> > > guess Jerin is well aware that we don't want such anymore.
> >
> > Yes. I am aware of it. I don't like the out of tree modules either.
> > But, This case,
> > I suggested Vamsi to have out of tree module.
> >
> > Let me describe the issue and let us discuss how to tackle
> > the problem:
> >
> > # Linux kernel wont allow VFIO PF to have SRIOV enable.
> >
> > Patches and on going discussion are here:
> > https://patchwork.kernel.org/patch/10522381/
> > https://lwn.net/Articles/748526/
> >
> > Based on my understanding the reason for NOT allowing the
> > VFIO PF to have SRIOV enable is genuine from kernel point of
> > View but not from DPDK point of view.
> >
> > Here is the sequence to describe the problem
> > 1) Consider Linux kernel allowed VFIO PCI SRIOV enable
> > 2) PF bound to vfio-pci
> > 3) using SRIOV infrastructure of vfio-pci PF driver,
> > VFs are created
> > 4) DPDK application bound to PF and VF, No issue here.
> > 5) Assume DPDK application bound to PF and VF bound
> > To netdev kernel driver. Now, there is a genuine concern
> > From kernel point of view that, DPDK PF can intercept,
> > VF mailbox message or so and deny the Kernel request
> > Or what if DPDK PF application crashes?
> >
> > To avoid the case (5), (3) is not allowed in stock kernel.
> > Which makes sense IMO.
> >
> > Now, From DPDK PoV, step 5 is valid as we have
> > Rte_flow's VF action etc used to enable such case.
> > Where, user can program the PF's rte_flow to steer
> > Some traffic to VF, where VF can be, DPDK application or
> > Linux kernel netdev driver.
> >
> > This patch enables the step (3) to enable step (5) from DPDK
> > PoV. i.e DPDK needs to allow PF to bind to DPDK with VFs.
> >
> > Why this issue now:
> > - igb_uio kernel driver is used as enabling step (3)
> > See store_max_vfs() kernel/linux/igb_uio/igb_uio.c
> > This is fine for non-iommu device, IOMMU devices
> > needs VFIO.
> > - We would like support VFIO for IOMMU protection
> > And enable step (5) as DPDK supports form the spec level.
> > i.e need to fix feature disparity between iommu vs
> > non-iommu based devices.
> >
> > Note:
> > We may not need a brand new kernel module, we could move
> > this logic to igb_uio if maintenance is concern.
> >

-- 
Kind regards,
Luca Boccassi