On Thu, Oct 01, 2015 at 07:50:37AM -0700, Stephen Hemminger wrote:
> On Thu, 1 Oct 2015 11:33:06 +0300
> "Michael S. Tsirkin" <mst at redhat.com> wrote:
>
> > On Wed, Sep 30, 2015 at 03:28:58PM -0700, Stephen Hemminger wrote:
> > > This driver allows using PCI device with Message Signalled Interrupt
> > > from userspace. The API is similar to the igb_uio driver used by the DPDK.
> > > Via ioctl it provides a mechanism to map MSI-X interrupts into event
> > > file descriptors similar to VFIO.
> > >
> > > VFIO is a better choice if IOMMU is available, but often userspace drivers
> > > have to work in environments where IOMMU support (real or emulated) is
> > > not available. All UIO drivers that support DMA are not secure against
> > > rogue userspace applications programming DMA hardware to access
> > > private memory; this driver is no less secure than existing code.
> > >
> > > Signed-off-by: Stephen Hemminger <stephen at networkplumber.org>
> >
> > I don't think copying the igb_uio interface is a good idea.
> > What DPDK is doing with igb_uio (and indeed uio_pci_generic)
> > is abusing the sysfs BAR access to provide unlimited
> > access to hardware.
> >
> > MSI messages are memory writes so any generic device capable
> > of MSI is capable of corrupting kernel memory.
> > This means that a bug in userspace will lead to kernel memory corruption
> > and crashes. This is something distributions can't support.
> >
> > uio_pci_generic is already abused like that, mostly
> > because when I wrote it, I didn't add enough protections
> > against using it with DMA capable devices,
> > and we can't go back and break working userspace.
> > But at least it does not bind to VFs which all of
> > them are capable of DMA.
> >
> > The result of merging this driver will be userspace abusing the
> > sysfs BAR access with VFs as well, and we do not want that.
> >
> >
> > Just forwarding events is not enough to make a valid driver.
> > What is missing is a way to access the device in a safe way.
> >
> > On a more positive note:
> >
> > What would be a reasonable interface? One that does the following
> > in kernel:
> >
> > 1. initializes device rings (can be in pinned userspace memory,
> >    but can not be writeable by userspace), brings up interface link
> > 2. pins userspace memory (unless using e.g. hugetlbfs)
> > 3. gets request, make sure it's valid and belongs to
> >    the correct task, put it in the ring
> > 4. in the reverse direction, notify userspace when buffers
> >    are available in the ring
> > 5. notify userspace about MSI (what this driver does)
> >
> > What userspace can be allowed to do:
> >
> >    format requests (e.g. transmit, receive) in userspace
> >    read ring contents
> >
> > What userspace can't be allowed to do:
> >
> >    access BAR
> >    write rings
> >
> >
> > This means that the driver can not be a generic one,
> > and there will be a system call overhead when you
> > write the ring, but that's the price you have to
> > pay for ability to run on systems without an IOMMU.
>
> I think I understand what you are proposing, but it really doesn't
> fit into the high speed userspace networking model.

I'm aware that the model currently does everything, including
bringing up the link, in userspace.
But there's really no justification for this.
Only data path things should be in userspace.

A userspace bug should not be able to do things like overwriting the
on-device EEPROM.

> 1. Device rings are device specific, can't be in a generic driver.

So that's more work, and it is not going to happen if people
can get by with insecure hacks.

> 2. DPDK uses huge memory.

Hugetlbfs? Don't see why this is an issue. Might make things simpler.

> 3. Performance requires all ring requests be done in pure userspace,
> (ie no syscalls)

Make only the TX ring writeable then. At least you won't be able
to corrupt the kernel memory.

> 4. Ditto, can't have kernel to userspace notification per packet

RX ring can be read-only, so userspace can read it directly.

--
MST
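
For illustration only, here is a rough sketch of the check described in
step 3 of the quoted proposal: before a request is placed in the ring, make
sure its buffer lies entirely inside the memory that was pinned for the
submitting task. This is not code from any driver discussed in this thread.
All names (pinned_region, tx_desc, tx_ring, request_is_valid, ring_post) and
the ring layout are hypothetical; a real driver would use the device's own
descriptor format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; none of these names come from the thread. */
struct pinned_region {
    uint64_t base;  /* start of the memory pinned for this task */
    uint64_t len;   /* size of the pinned region in bytes */
};

struct tx_desc {
    uint64_t addr;  /* buffer address the device would DMA from */
    uint32_t len;   /* buffer length in bytes */
};

#define RING_SIZE 8u  /* power of two, so free-running indices wrap cleanly */

struct tx_ring {
    struct tx_desc desc[RING_SIZE];
    uint32_t head;  /* next slot to fill */
    uint32_t tail;  /* next slot the device consumes */
};

/* True only if [addr, addr + len) lies entirely inside the pinned region. */
static bool request_is_valid(const struct pinned_region *r,
                             const struct tx_desc *d)
{
    if (d->len == 0 || d->addr < r->base)
        return false;
    uint64_t off = d->addr - r->base;
    /* Compare against the remaining space to avoid overflow in addr + len. */
    return off < r->len && d->len <= r->len - off;
}

/* Validate, then enqueue. Returns 0 on success, -1 if rejected or ring full. */
static int ring_post(struct tx_ring *ring, const struct pinned_region *r,
                     const struct tx_desc *d)
{
    if (!request_is_valid(r, d))
        return -1;
    if (ring->head - ring->tail == RING_SIZE)
        return -1;
    ring->desc[ring->head % RING_SIZE] = *d;
    ring->head++;
    return 0;
}
```

In this sketch a descriptor pointing outside the pinned region never reaches
the ring, so the device is never asked to DMA to or from memory that does
not belong to the task.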