On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler <shah...@mellanox.com> wrote:
> This series is a continuation of RFC [1].
>
> The DPDK APIs expose 3 different modes to work with memory used for DMA:
>
> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
> This memory is allocated by the DPDK libraries, included in the DPDK
> memory system (memseg lists) and automatically DMA mapped by the DPDK
> layers.
>
> 2. Use memory allocated by the user and register to the DPDK memory
> systems. This is also referred as external memory. Upon registration of
> the external memory, the DPDK layers will DMA map it to all needed
> devices.
>
> 3. Use memory allocated by the user and not registered to the DPDK memory
> system. This is for users who want to have tight control on this
> memory. The user will need to explicitly call DMA map function in order
> to register such memory to the different devices.
>
> The scope of the patch focus on #3 above.
>

Why can't we have case 2 cover case 3?

> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).
>

As you say, VFIO is common and, when allowing DMA to be programmed from
user space, the right thing to do. I'm assuming there is IOMMU hardware
and that this is what Mellanox and NXP rely on in one way or another.
Having each driver do things its own way will result in a system that is
harder to validate. If there is IOMMU hardware, the same mechanism should
always be used, leaving the IOMMU hardware-specific implementation to deal
with the details. If a NIC is IOMMU-able, that should not be supported by
vendor-specific drivers but through a generic solution like VFIO, which
will validate that a device has such a capability and perform the required
actions for that case. VFIO and IOMMU should be modified as needed to
support this requirement instead of leaving vendor drivers to implement
their own solutions.
In any case, I think this support should be in a different patchset than
the private user space mappings.

> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> New map and unmap ops were added to the rte_bus structure. Implementation
> of those was done currently only on the PCI bus. The implementation takes
> the driver map and unmap implementation as bypass to the VFIO mapping.
> That is, in case of no specific map/unmap from the PCI driver,
> VFIO mapping, if possible, will be used.
>
> Application use with those APIs is quite simple:
> * allocate memory
> * take a device, and query its rte_device.
> * call the bus map function for this device.
>
> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
> APIs, leaving the PCI device APIs as the preferred option for the user.
>
> [1] https://patches.dpdk.org/patch/47796/
>
> Shahaf Shuler (6):
>   vfio: allow DMA map of memory for the default vfio fd
>   vfio: don't fail to DMA map if memory is already mapped
>   bus: introduce DMA memory mapping for external memory
>   net/mlx5: refactor external memory registration
>   net/mlx5: support PCI device DMA map and unmap
>   doc: deprecate VFIO DMA map APIs
>
>  doc/guides/prog_guide/env_abstraction_layer.rst |   2 +-
>  doc/guides/rel_notes/deprecation.rst            |   4 +
>  drivers/bus/pci/pci_common.c                    |  78 +++++++
>  drivers/bus/pci/rte_bus_pci.h                   |  14 ++
>  drivers/net/mlx5/mlx5.c                         |   2 +
>  drivers/net/mlx5/mlx5_mr.c                      | 232 ++++++++++++++++---
>  drivers/net/mlx5/mlx5_rxtx.h                    |   5 +
>  lib/librte_eal/common/eal_common_bus.c          |  22 ++
>  lib/librte_eal/common/include/rte_bus.h         |  57 +++++
>  lib/librte_eal/common/include/rte_vfio.h        |  12 +-
>  lib/librte_eal/linuxapp/eal/eal_vfio.c          |  26 ++-
>  lib/librte_eal/rte_eal_version.map              |   2 +
>  12 files changed, 418 insertions(+), 38 deletions(-)
>
> --
> 2.12.0
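
For reference, the dispatch order described in the cover letter (use the PCI driver's own map op if it provides one, otherwise fall back to VFIO if possible) can be sketched roughly as below. This is only an illustration under stated assumptions: every name here (sketch_device, sketch_driver, sketch_bus_dma_map, sketch_vfio_dma_map, sketch_vendor_dma_map) is hypothetical and is not taken from the patches or from the DPDK headers, and the "mapping" functions just record which path was chosen.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the DPDK definitions. */
struct sketch_device;

typedef int (*sketch_dma_map_t)(struct sketch_device *dev, void *addr,
                                uint64_t iova, size_t len);

struct sketch_driver {
    sketch_dma_map_t dma_map;   /* NULL when the driver has no private op */
};

enum sketch_path { PATH_NONE, PATH_DRIVER, PATH_VFIO };

struct sketch_device {
    const struct sketch_driver *driver;
    int vfio_capable;           /* nonzero if a VFIO mapping is possible */
    enum sketch_path last_path; /* which path served the last request */
};

/* Stand-in for a vendor-specific mapping (assumption: always succeeds). */
int sketch_vendor_dma_map(struct sketch_device *dev, void *addr,
                          uint64_t iova, size_t len)
{
    (void)addr; (void)iova; (void)len;
    dev->last_path = PATH_DRIVER;
    return 0;
}

/* Stand-in for the generic VFIO mapping (assumption: always succeeds). */
int sketch_vfio_dma_map(struct sketch_device *dev, void *addr,
                        uint64_t iova, size_t len)
{
    (void)addr; (void)iova; (void)len;
    dev->last_path = PATH_VFIO;
    return 0;
}

/*
 * Bus-level entry point: the driver op, when present, bypasses VFIO;
 * otherwise VFIO is used if the device allows it.
 */
int sketch_bus_dma_map(struct sketch_device *dev, void *addr,
                       uint64_t iova, size_t len)
{
    assert(dev != NULL);
    if (addr == NULL || len == 0)
        return -EINVAL;

    dev->last_path = PATH_NONE;
    if (dev->driver != NULL && dev->driver->dma_map != NULL)
        return dev->driver->dma_map(dev, addr, iova, len);
    if (dev->vfio_capable)
        return sketch_vfio_dma_map(dev, addr, iova, len);
    return -ENOTSUP;
}
```

With this ordering, a device whose driver sets dma_map never reaches the VFIO path, which is the per-vendor behaviour questioned above; a device without a driver op and without VFIO gets -ENOTSUP.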