On Mon, Jun 3, 2019 at 6:44 PM Walker, Benjamin <benjamin.wal...@intel.com> wrote:
> On Mon, 2019-06-03 at 12:48 +0200, David Marchand wrote:
> > Hello,
> >
> > On Thu, May 30, 2019 at 7:48 PM Ben Walker <benjamin.wal...@intel.com> wrote:
> > > In SPDK, not all drivers are registered with DPDK at start up time.
> > > Previously, that meant DPDK always chose to set itself up in IOVA_PA
> > > mode. Instead, when the correct iova choice is unclear based on the
> > > devices and drivers known to DPDK at start up time, use other heuristics
> > > (such as whether /proc/self/pagemap is accessible) to make a better
> > > choice.
> > >
> > > This enables SPDK to run as an unprivileged user again without requiring
> > > users to explicitly set the iova mode on the command line.
> > >
> >
> > Interesting, I got a bz on something similar the day you sent this
> > patchset ;-)
> >
> > - When a dpdk process is started, either it has access to physical addresses
> > or not, and this won't change for the rest of its life.
> > Your fix on defaulting to VA based on a rte_eal_using_phys_addrs() check makes
> > sense to me.
> > It is the most encountered situation when running ovs as non root on recent
> > kernels.
> >
> > - However, I fail to see the need for all of this detection code wrt drivers
> > and devices.
> >
> > On one side of the equation, when dpdk starts, it checks physical address
> > availability.
> > On the other side of the equation, we have the drivers that will be invoked
> > when probing devices (either at dpdk init, or when hotplugging a device).
> >
> > At this point, the probing call should check the driver requirement wrt to the
> > kernel driver the device is attached to.
> > If this requirement is not fulfilled, then the probing fails.
> >
> > - This leaves the --iova-va forcing option.
> > Why do we need it?
> > If we don't have access to physical addresses, no choice but run in VA mode.
> > If we have access to physical addresses, the only case would be that you want
> > to downgrade from PA to VA.
> > But well, your process can still access it, not sure what the benefit is.
>
> All of the complexity here, at least as far as I understand it, stems from
> supporting hot insert of devices. This is very important to SPDK because storage
> devices get hot inserted all the time, so we very much appreciate that DPDK has
> put in so much effort in this area and continues to accept our patches to
> improve it. I know hot insert is not nearly as important for network devices.
>
> When DPDK starts up, it needs to select whether to use virtual addresses or
> physical addresses in its memory maps. It can do that by answering the following
> questions:
>
> 1. Does the system only have buses that support an IOMMU?
> 2. Is the IOMMU sufficiently fast for the use case?
> 3. Will all of the devices that will be used with DPDK throughout the
>    application's lifetime work with an IOMMU?
>
> If these three things are true, then the best choice is to use virtual addresses
> in the memory translations. However, if any of the above are not true it needs
> to fall back to physical addresses.
>
> #1 is checked by simply asking all of the buses, which are known up front. #2 is
> just assumed to be true. But #3 is not possible to check fully because of hot
> insert.
>
> The code currently approximates the #3 check by looking at the devices present
> at initialization time. If a device exists that's bound to vfio-pci, and no
> other devices exist that are bound to a uio driver, and DPDK has a registered
> driver that's actually going to load against the vfio-pci devices, then it will
> elect to use virtual addresses. This is purely a heuristic - it's not a
> definitive answer because the user could later hot insert a device that gets
> bound to uio.
>
> The user, of course, knows the answer to which addressing scheme to use
> typically.
> For example, these checks assume #2 is true, but there may be
> hardware implementations where it is not and the user wants to force physical
> addresses. Or the user may know that they are going to hot insert a device at
> run time that doesn't work with the IOMMU. That's why it's important to maintain
> the ability for the user to override the default heuristic's decision via the
> command line.
>
> My patch series is simply improving the heuristic in a few ways. First,
> previously each bus when queried would return either virtual or physical
> addresses as its choice. However, often the bus just does not have enough
> information to formulate any preference at all (and PCI was defaulting to
> physical addresses in this case). Instead, I made it so that the bus can return
> that it doesn't care, which pushes the decision up to a higher level. That
> higher level then makes the decision by checking whether it can access
> /proc/self/pagemap. Second, I narrowed the uio check such that physical
> addresses will only be selected if a device bound to uio exists and there is a
> driver registered to use it. Previously if any device was bound to uio it would
> select physical addresses, even if DPDK never ended up loading against that
> device.
>
> I think these two things make the heuristic choose the right thing more often,
> but it still won't always get it right so the command line option needs to
> remain.
>

After some exchanges offlist and on irc, and after taking some time to look at
the code, here are my conclusions.

Copying bus driver maintainers/connoisseurs.

We have cases where we prefer using VA even if PA is available (for fslmc,
where translating from iova as PA to VA is more costly).

I worked on Ben's patches and summarised the situation as two main issues with
the current code:
- physical address availability is not taken into account early enough in EAL
  init, and we end up with the memory subsystem complaining later, which is not
  that user friendly.
  A side effect is that the init could have fallen back to using VA in most
  cases if there were no strong requirement on PA.
- the pci bus driver looks at all devices on the system, without considering
  the pci white/blacklist or whether dpdk has a driver that supports the device.

I prepared a new series that I will send shortly.
I am currently considering the backport potential for it.

Thoughts?
Else, reviews are welcome.

Thanks.

--
David Marchand
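
For readers following the thread, the selection logic discussed above can be
sketched roughly as below. This is only an illustration of the idea, not the
DPDK code: the names IovaMode and select_iova_mode are hypothetical, the real
implementation is in C, and the rule "PA when physical addresses are
accessible and no bus has a preference" is an assumption made for the sketch.

```python
from enum import Enum
from typing import Iterable, Optional


class IovaMode(Enum):
    """Hypothetical stand-in for the per-bus answer discussed in the thread."""
    DC = "dc"  # the bus has no preference ("don't care")
    PA = "pa"  # IOVA as physical addresses
    VA = "va"  # IOVA as virtual addresses


def select_iova_mode(bus_prefs: Iterable[IovaMode],
                     phys_addrs_available: bool,
                     forced: Optional[IovaMode] = None) -> IovaMode:
    """Pick an IOVA mode from bus preferences, PA availability and an override.

    phys_addrs_available models the check of whether the process can read
    physical addresses (for example via /proc/self/pagemap).
    """
    if forced is not None and forced is not IovaMode.DC:
        # The command line override always wins over the heuristic.
        mode = forced
    else:
        prefs = set(bus_prefs)
        if IovaMode.PA in prefs:
            # Any bus that needs PA forces PA for the whole process.
            mode = IovaMode.PA
        elif IovaMode.VA in prefs:
            mode = IovaMode.VA
        else:
            # Every bus answered "don't care": decide at the higher level.
            # Assumption for this sketch: use PA only if it is accessible.
            mode = IovaMode.PA if phys_addrs_available else IovaMode.VA

    if mode is IovaMode.PA and not phys_addrs_available:
        # Fail early instead of letting the memory subsystem complain later.
        raise RuntimeError("PA mode required but physical addresses are "
                           "not accessible")
    return mode
```

With this sketch, an unprivileged process whose buses all answer "don't care"
ends up in VA mode without any command line option, while a bus that requires
PA in the same situation produces an early error rather than a late failure.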