Hi all,

I've been digging into what it would take to run DPDK as an unprivileged user and I have some findings that I thought were worthy of discussion. The assumptions here are that I'm using a very recent Linux kernel (4.8.15 to be specific) and I'm using vfio with my IOMMU enabled. I'm only interested in making it possible to run as an unprivileged user in this type of environment.
There are a few key things that DPDK needs to do in order to run as an unprivileged user:

1) Allocate hugepages
2) Map device resources
3) Map hugepage virtual addresses to DMA addresses

For #1 and #2, DPDK works just fine today. You simply chown the relevant resources in sysfs to the desired user and everything is happy.

The problem is #3. This currently relies on looking up the mappings in /proc/self/pagemap, but the ability to get physical addresses from /proc/self/pagemap as an unprivileged user was removed from the kernel in the 4.x timeframe due to the Rowhammer vulnerability. At this time, it is not possible to run DPDK as an unprivileged user on a 4.x Linux kernel.

There is a way to make this work though, which I'll outline now. Unfortunately, I think it is going to require some very significant changes to the initialization flow in the EAL.

One bit of background before I go into how to fix this - there are three types of memory addresses: virtual addresses, physical addresses, and DMA addresses. Sometimes DMA addresses are called bus addresses or I/O addresses, but I'll call them DMA addresses because I think that's the clearest name. In a system without an IOMMU, DMA addresses and physical addresses are equivalent, but in a system with an IOMMU the user can choose any arbitrary DMA address to map to a given physical address. For security reasons (Rowhammer), it is no longer considered safe to expose physical addresses to userspace, but it is perfectly fine to expose DMA addresses when an IOMMU is present.

DPDK today begins by allocating all of the required hugepages, then finds the physical addresses for those hugepages using /proc/self/pagemap, sorts the hugepages by physical address, and remaps the pages to contiguous virtual addresses. Later on, if vfio is enabled, it asks vfio to pin the hugepages and to set their DMA addresses in the IOMMU to be the physical addresses discovered earlier.
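
To make the pagemap lookup concrete, here is a minimal sketch of what a virtual-to-physical lookup through /proc/self/pagemap looks like. This is not the EAL's code; the function names are hypothetical and chosen for illustration. It relies only on the documented pagemap layout: one 64-bit entry per virtual page, with bit 63 meaning "page present" and bits 0-54 holding the page frame number (PFN).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Documented /proc/<pid>/pagemap entry layout. */
#define PAGEMAP_PFN_MASK ((UINT64_C(1) << 55) - 1)  /* bits 0-54: PFN */
#define PAGEMAP_PRESENT  (UINT64_C(1) << 63)        /* bit 63: present in RAM */

/* Hypothetical helper: nonzero if the entry describes a page present in RAM. */
int pagemap_entry_present(uint64_t entry)
{
    return (entry & PAGEMAP_PRESENT) != 0;
}

/* Hypothetical helper: extract the PFN field from an entry. */
uint64_t pagemap_entry_pfn(uint64_t entry)
{
    return entry & PAGEMAP_PFN_MASK;
}

/*
 * Hypothetical helper: read the raw pagemap entry for the page containing
 * virt. Returns 0 on success and -1 on any error.
 */
int pagemap_read_entry(const void *virt, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;

    /* The file is indexed by virtual page number, 8 bytes per entry. */
    off_t offset = (off_t)((uintptr_t)virt / (uintptr_t)page_size) *
                   (off_t)sizeof(uint64_t);
    ssize_t n = pread(fd, entry, sizeof(*entry), offset);
    close(fd);

    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

On kernels that hide physical addresses, a caller without CAP_SYS_ADMIN can still see the present bit set for a touched page, but pagemap_entry_pfn() comes back as 0, so multiplying the PFN by the page size yields a physical address of 0.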
Of course, running as an unprivileged user means all of the physical addresses in /proc/self/pagemap are just 0, so this doesn't end up working. Further, there is no real reason to choose the physical address as the DMA address in the IOMMU - it would be better to just count up starting at 0. Also, because the pages are pinned after the virtual to physical mapping is looked up, there is a window where a page could be moved. Hugepage mappings can be moved on more recent kernels (at least 4.x), and the reliability of hugepages having static mappings decreases with every kernel release. Note that this probably means that using uio on recent kernels is subtly broken and cannot be supported going forward, because there is no uio mechanism to pin the memory.

The first open question I have is whether DPDK should allow uio at all on recent (4.x) kernels. My current understanding is that there is no way to pin memory and hugepages can now be moved around, so uio would be unsafe. What does the community think here?

My second question is whether the user should be allowed to mix uio and vfio usage simultaneously. For vfio, the "physical addresses" are really DMA addresses and are best chosen arbitrarily so that they appear sequential relative to their virtual addresses. For uio, they are true physical addresses and are not chosen at all. It seems that these two things are in conflict and that it will be difficult, ugly, and maybe impossible to resolve the simultaneous use of both.

Once we agree on the above two things, we can try to talk through some solutions in the code.

Thanks,
Ben
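
P.S. To make the "count up starting at 0" idea concrete, here is a rough sketch under stated assumptions, not a proposal for the actual EAL code. The struct dma_seg type and both function names are hypothetical. The only real interface used is the VFIO type1 VFIO_IOMMU_MAP_DMA ioctl and its struct vfio_iommu_type1_dma_map from <linux/vfio.h>, and container_fd is assumed to be an already configured VFIO container.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical descriptor for one virtually contiguous hugepage region. */
struct dma_seg {
    void    *vaddr;  /* start of the region in our address space */
    uint64_t len;    /* length in bytes */
    uint64_t iova;   /* DMA address, filled in by assign_sequential_iova() */
};

/*
 * Hand out DMA addresses by counting up from 0, in segment order, instead
 * of looking up physical addresses. Returns the total size assigned.
 */
uint64_t assign_sequential_iova(struct dma_seg *segs, size_t n)
{
    uint64_t next = 0;

    for (size_t i = 0; i < n; i++) {
        segs[i].iova = next;
        next += segs[i].len;
    }
    return next;
}

/*
 * Ask vfio to pin each segment and program the IOMMU with the chosen DMA
 * address. Returns 0 on success, -1 on the first failing ioctl.
 */
int vfio_map_segs(int container_fd, const struct dma_seg *segs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct vfio_iommu_type1_dma_map map;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = (uint64_t)(uintptr_t)segs[i].vaddr;
        map.iova  = segs[i].iova;
        map.size  = segs[i].len;

        if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0)
            return -1;
    }
    return 0;
}
```

With this ordering the DMA addresses never depend on /proc/self/pagemap, and the pages are pinned by the same call that assigns the address, so there is no window between lookup and pinning.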