On Tuesday 04 July 2017 03:40 PM, Thomas Monjalon wrote:
> Hi Santosh,
> Let's try to make this proposal clearer in order to have some reviews.
>
> 08/06/2017 13:05, Santosh Shukla:
>> Q) Why do we need such infrastructure?
>>
>> A) Some NPU hardware like OCTEONTX follows push model to get the packet
>> from the pktio device. Where packet allocation and freeing done
>> by the HW. Since HW can operate only on IOVA with help of SMMU/IOMMU,
> Some readers may not know IOVA: IO Virtual address.
> Some explanations:
> https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt
> http://vfio.blogspot.fr/2014/08/iommu-groups-inside-and-out.html
>
> It must be said that SMMU is equivalent to IOMMU for ARM:
>
> https://developer.arm.com/products/system-ip/system-controllers/system-memory-management-unit
>
>> when packet receives from the Ethernet device, it is the IOVA address
>> (which is PA in existing scheme).
> You mean that we are currently using only Physical Address (PA)?

Yes. The DPDK default approach is iova=pa. Refer to [1]; for a more
recent example see [2].

[1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n709
[2] http://dpdk.org/browse/dpdk/tree/drivers/bus/fslmc/fslmc_vfio.c#n231

>> Mapping IOVA as PA is expensive on those HW, where every packet
>> needs to be converted to VA from PA/IOVA.
> Please, could you explain how and where addresses are converted currently?

The HW (IOMMU/SMMU) does it. Take the VFIO case for example: the user
can program vfio's dma_map.iova as either _pa or _va.
The APIs below do the address translation in DPDK:
rte_mem_virt2phy
rte_malloc_virt2phy
rte_mempool_virt2phy

>
>> This patchset proposes the method to autodetect the preferred
>> IOVA mode for a device. Summary of IOVA scheme:
>> - If all the devices are iommu capable and support IOMMU
>>   capable driver then selects IOVA_VA.
>> - If any of the devices are non-iommu then use default IOVA
>>   scheme ie. IOVA_PA.
>> - If no device found then IOVA scheme would be
>>   IOVA_DC (Don't care).
> I think you should better describe these modes and how they behave.

Aren't they self-explanatory? Meaning:
0) If I program my DMA device (an IOMMU-backed DMA device, of course)
   as iova = va, then expect the DMA address (iova) to be a _va.
1) If I program my DMA device (no IOMMU, e.g. the vfio-noiommu or
   igb_uio case) as iova = pa, then expect a _pa.
2) If I program my DMA device (IOMMU-backed) as iova = pa, then expect
   the DMA address to be a _pa.

The approach described above is tested and works on both x86 and arm64.

The default scheme for iova mapping is iova=pa, and the framework
allows the user to explicitly override the scheme via --iova-mode=<>.

Thanks.

>> To achieve that, two global APIs introduced:
>> - rte_bus_get_iommu_class
>> - rte_pci_get_iommu_class
>>
>> Return values for those APIs are:
>> enum rte_iova_mod {
>> 	RTE_IOVA_DC, /* Don't care */
>> 	RTE_IOVA_PA,
>> 	RTE_IOVA_VA
>> }
>>
>> Those are the bus policy for selecting IOVA mode.
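
To make the selection policy quoted above easier to review, here is a
rough sketch of it in C. It is only an illustration under stated
assumptions, not the code in the patchset: struct dev_info, its fields
and detect_iova_mode() are hypothetical names, and the enum is a local
copy of the three values listed above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the three modes listed in the cover letter. */
enum iova_mode {
	IOVA_DC, /* don't care: no device found */
	IOVA_PA, /* iova = physical address (default scheme) */
	IOVA_VA  /* iova = virtual address */
};

/* Hypothetical per-device summary, not a DPDK structure. */
struct dev_info {
	bool iommu_capable; /* device sits behind an IOMMU/SMMU */
	bool drv_iommu_ok;  /* bound driver is IOMMU capable */
};

/*
 * Apply the three rules from the summary:
 *  - no device found           -> IOVA_DC
 *  - any non-iommu device      -> IOVA_PA
 *  - all devices iommu capable -> IOVA_VA
 */
static enum iova_mode
detect_iova_mode(const struct dev_info *devs, size_t n)
{
	size_t i;

	if (n == 0)
		return IOVA_DC;

	for (i = 0; i < n; i++) {
		if (!devs[i].iommu_capable || !devs[i].drv_iommu_ok)
			return IOVA_PA;
	}
	return IOVA_VA;
}
```

With two IOMMU-backed devices this returns IOVA_VA; adding a single
igb_uio style device to the same list makes it fall back to IOVA_PA.
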
>> In case user want to override bus IOVA mapping then added an EAL option
>> "--iova-mode=<string>". User to pass string format 'pa' --> IOVA_PA,
>> 'va' --> IOVA_VA.
>>
>> To support new eal option, adding global API:
>> - rte_eal_iova_mode
>>
>> Patch Summary:
>> 1) 1st - 2nd patch: Adds infrastructure in linuxapp and bsdapp
>>    layer.
>> 2) 3rd patch: Introduces global bus api named rte_bus_get_iommu_class.
>> 3) 4th patch: Add new eal option called --iova-mode=<mode-string>.
>> 4) 5th - 6th patch: Logic to detect iova scheme.
>> 5) 9th patch: Check IOVA mode before programming vfio dma_map.iova.
>>    Default scheme is IOVA_PA.
>> 6) 10th-12th patch: Check for IOVA_VA mode in below APIs
>>    - rte_mem_virt2phy
>>    - rte_mempool_virt2phy
>>    - rte_malloc_virt2phy
>>    If set then return paddr=vaddr, else return value from default
>>    implementation.
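
For readers of the summary above, a minimal sketch of the two pieces
that touch the user-visible behaviour: mapping the --iova-mode string
to a mode, and the IOVA_VA check in the virt2phy style helpers. All
names here (parse_iova_mode, virt2iova, pa_lookup_t, demo_pa_lookup,
DEMO_FAKE_PA) are hypothetical and exist only for illustration; the
real lookup is whatever the default implementation already does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/*
 * Map the --iova-mode=<string> argument to a mode: 'pa' -> IOVA_PA,
 * 'va' -> IOVA_VA. Returns 0 on success, -1 on an unknown string.
 */
static int
parse_iova_mode(const char *arg, enum iova_mode *mode)
{
	if (arg == NULL || mode == NULL)
		return -1;
	if (strcmp(arg, "pa") == 0) {
		*mode = IOVA_PA;
		return 0;
	}
	if (strcmp(arg, "va") == 0) {
		*mode = IOVA_VA;
		return 0;
	}
	return -1;
}

/* Shape of the existing (default) VA -> PA lookup. */
typedef uint64_t (*pa_lookup_t)(const void *vaddr);

/* Placeholder lookup so the sketch is self-contained; returns a fixed
 * made-up address and does no real translation. */
#define DEMO_FAKE_PA UINT64_C(0x1000)
static uint64_t
demo_pa_lookup(const void *vaddr)
{
	(void)vaddr;
	return DEMO_FAKE_PA;
}

/*
 * In IOVA_VA mode return paddr = vaddr; otherwise return the value
 * from the default implementation.
 */
static uint64_t
virt2iova(enum iova_mode mode, const void *vaddr, pa_lookup_t default_lookup)
{
	if (mode == IOVA_VA)
		return (uint64_t)(uintptr_t)vaddr;
	return default_lookup(vaddr);
}
```

Calling virt2iova(IOVA_VA, buf, demo_pa_lookup) gives back the address
of buf unchanged, while IOVA_PA (or IOVA_DC) goes through the lookup.
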