On 21/04/17 05:16, gowrishankar muthukrishnan wrote:
> On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
>> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>>>> Alexey Kardashevskiy <a...@ozlabs.ru> wrote on 20/04/2017 09:24:02:
>>>>
>>>>> From: Alexey Kardashevskiy <a...@ozlabs.ru>
>>>>> To: dev@dpdk.org
>>>>> Cc: Alexey Kardashevskiy <a...@ozlabs.ru>, j...@zurich.ibm.com,
>>>>> Gowrishankar Muthukrishnan <gowrishanka...@in.ibm.com>
>>>>> Date: 20/04/2017 09:24
>>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>>>>> addresses for DMA map
>>>>>
>>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address of
>>>>> the just created DMA window. It happens to start from zero because the
>>>>> default window is removed (leaving no windows) and the new window starts
>>>>> from zero. However this is not guaranteed and the new window may start
>>>>> from another address; this adds an error check.
>>>>>
>>>>> Another issue is that the IOVA passed to VFIO_IOMMU_MAP_DMA should be a
>>>>> PCI bus address, while in this case a physical address of a user page is
>>>>> used. This changes the IOVA to start from zero in the hope that the rest
>>>>> of DPDK expects this.
>>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
>>>> phys_addr of the memory segment it got from /proc/self/pagemap, cf.
>>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
>>>> actual iova, which basically makes the whole virtual-to-physical mapping
>>>> with pagemap unnecessary; I believe that should be the case for VFIO
>>>> anyway. Pagemap should only be needed when using pci_uio.
>>>
>>> Ah, ok, makes sense now. But it sure needs a big fat comment there, as it
>>> is not obvious why a host RAM address is used there while the DMA window
>>> start is not guaranteed.
>>
>> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64 both
>> have exactly the same value, in my setup it is 3fffb33c0000 which is a
>> userspace address - at least ms[i].phys_addr must be a physical address.
>
> This patch breaks i40e_dev_init() on my server.
>
> EAL: PCI device 0004:01:00.0 on NUMA socket 1
> EAL: probe driver: 8086:1583 net_i40e
> EAL: using IOMMU type 7 (sPAPR)
> eth_i40e_dev_init(): Failed to init adminq: -32
> EAL: Releasing pci mapped resource for 0004:01:00.0
> EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
> EAL: Requested device 0004:01:00.0 cannot be used
> EAL: PCI device 0004:01:00.1 on NUMA socket 1
> EAL: probe driver: 8086:1583 net_i40e
> EAL: using IOMMU type 7 (sPAPR)
> eth_i40e_dev_init(): Failed to init adminq: -32
> EAL: Releasing pci mapped resource for 0004:01:00.1
> EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
> EAL: Requested device 0004:01:00.1 cannot be used
> EAL: No probed ethernet devices
>
> I have two memsegs, each of 1G size. Their mapped PA and VA are also
> different.
>
> (gdb) p /x ms[0]
> $3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000,
>       addr_64 = 0x3effaf000000}, len = 0x40000000, hugepage_sz = 0x1000000,
>       socket_id = 0x1, nchannel = 0x0, nrank = 0x0}
> (gdb) p /x ms[1]
> $4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000,
>       addr_64 = 0x3efbaf000000}, len = 0x40000000, hugepage_sz = 0x1000000,
>       socket_id = 0x0, nchannel = 0x0, nrank = 0x0}
>
> Could you please recheck this. Maybe, if the new DMA window does not start
> from bus address 0, only then reset dma_map.iova by this offset?
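
Regarding keeping a non-zero window start: just to make sure I read the
suggestion right, something like the below is what you mean? This is only a
rough, untested sketch against the patched vfio_spapr_dma_map(), and it still
leaves the question of how the PMDs would learn the IOVA when it is not equal
to phys_addr:

/* Untested sketch of the suggestion above: keep the window start returned
 * by VFIO_IOMMU_SPAPR_TCE_CREATE as an offset instead of failing, and shift
 * every IOVA by it so the mapping stays 1:1 relative to the window start.
 */
io_offset = create.start_addr;  /* may be non-zero, do not error out */

for (i = 0; i < RTE_MAX_MEMSEG; i++) {
	struct vfio_iommu_type1_dma_map dma_map;

	if (ms[i].addr == NULL)
		break;

	memset(&dma_map, 0, sizeof(dma_map));
	dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
	dma_map.vaddr = ms[i].addr_64;
	dma_map.size = ms[i].len;
	/* 1:1 PA==IOVA, shifted by the window start (zero in the common case) */
	dma_map.iova = io_offset + ms[i].phys_addr;
	dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
			VFIO_DMA_MAP_FLAG_WRITE;

	if (ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map)) {
		RTE_LOG(ERR, EAL, "  cannot map memory for DMA, "
				"error %i (%s)\n", errno, strerror(errno));
		return -1;
	}
}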
As we figured out, it is the --no-huge effect.

Another thing - as I read the code, the window size comes from
rte_eal_get_physmem_size(). On my 512GB machine, DPDK allocates only a 16GB
window, so it is far away from the 1:1 mapping which is believed to be the
DPDK expectation. Looking now for a better version of
rte_eal_get_physmem_size()... a rough sketch of what I have in mind is below.

And another problem - after a few unsuccessful starts of app/testpmd, all
huge pages are gone:

aik@stratton2:~$ cat /proc/meminfo
MemTotal:       535527296 kB
MemFree:        516662272 kB
MemAvailable:   515501696 kB
...
HugePages_Total:    1024
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB

How is that possible? What is pinning these pages so that they are not freed
when the testpmd process exits?
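
Coming back to the window size - the shape I have in mind is roughly the
below. This is an untested sketch and the helper name is made up; the idea is
to take the highest physical address covered by any memseg and round it up to
a power of two, so that a 1:1 PA==IOVA mapping always fits into the window:

/* Untested sketch, not an existing DPDK API: size the DMA window from the
 * highest physical address covered by the memsegs rather than from
 * rte_eal_get_physmem_size(), so that a 1:1 PA==IOVA mapping always fits.
 */
static uint64_t
spapr_dma_win_size(void)
{
	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
	uint64_t max_phys = 0;
	int i;

	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
		uint64_t end;

		if (ms[i].addr == NULL)
			break;

		end = ms[i].phys_addr + ms[i].len;
		if (end > max_phys)
			max_phys = end;
	}

	/* the TCE window size needs to be a power of two, hence the round-up */
	return rte_align64pow2(max_phys);
}

create.window_size would then be set from this instead of
rte_eal_get_physmem_size().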