"Burakov, Anatoly" <[email protected]> wrote on 08/08/2017 11:43:43 AM:
> From: "Burakov, Anatoly" <[email protected]> > To: Jonas Pfefferle1 <[email protected]> > Cc: "[email protected]" <[email protected]>, "[email protected]" <[email protected]> > Date: 08/08/2017 11:43 AM > Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size > > > From: Jonas Pfefferle1 [mailto:[email protected]] > > Sent: Tuesday, August 8, 2017 10:30 AM > > To: Burakov, Anatoly <[email protected]> > > Cc: [email protected]; [email protected] > > Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size > > > > "Burakov, Anatoly" <[email protected]> wrote on 08/08/2017 > > 11:15:24 AM: > > > > > From: "Burakov, Anatoly" <[email protected]> > > > To: Jonas Pfefferle <[email protected]> > > > Cc: "[email protected]" <[email protected]>, "[email protected]" <[email protected]> > > > Date: 08/08/2017 11:18 AM > > > Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size > > > > > > From: Jonas Pfefferle [mailto:[email protected]] > > > > Sent: Tuesday, August 8, 2017 9:41 AM > > > > To: Burakov, Anatoly <[email protected]> > > > > Cc: [email protected]; [email protected]; Jonas Pfefferle <[email protected]> > > > > Subject: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size > > > > > > > > DMA window size needs to be big enough to span all memory segment's > > > > physical addresses. We do not need multiple levels of IOMMU tables > > > > as we already span ~70TB of physical memory with 16MB hugepages. > > > > > > > > Signed-off-by: Jonas Pfefferle <[email protected]> > > > > --- > > > > v2: > > > > * roundup to next power 2 function without loop. > > > > > > > > v3: > > > > * Replace roundup_next_pow2 with rte_align64pow2 > > > > > > > > lib/librte_eal/linuxapp/eal/eal_vfio.c | 13 ++++++++++--- > > > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c > > > > b/lib/librte_eal/linuxapp/eal/eal_vfio.c > > > > index 946df7e..550c41c 100644 > > > > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c > > > > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c > > > > @@ -759,10 +759,12 @@ vfio_spapr_dma_map(int vfio_container_fd) > > > > return -1; > > > > } > > > > > > > > - /* calculate window size based on number of hugepages configured > > > > */ > > > > - create.window_size = rte_eal_get_physmem_size(); > > > > + /* physicaly pages are sorted descending i.e. ms[0].phys_addr is max > > > > */ > > > > > > Do we always expect that to be the case in the future? Maybe it > > > would be safer to walk the memsegs list. > > > > > > Thanks, > > > Anatoly > > > > I had this loop in before but removed it in favor of simplicity. > > If we believe that the ordering is going to change in the future > > I'm happy to bring back the loop. Is there other code which is > > relying on the fact that the memsegs are sorted by their physical > > addresses? > > I don't think there is. In any case, I think making assumptions > about particulars of memseg organization is not a very good practice. > > I seem to recall us doing similar things in other places, so maybe > down the line we could introduce a new API (or internal-only) > function to get a memseg with min/max address. For now I think a > loop will do. Ok. Makes sense to me. Let me resubmit a new version with the loop. 
> > > >
> > > > +	/* create DMA window from 0 to max(phys_addr + len) */
> > > > +	/* sPAPR requires window size to be a power of 2 */
> > > > +	create.window_size = rte_align64pow2(ms[0].phys_addr + ms[0].len);
> > > >  	create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> > > > -	create.levels = 2;
> > > > +	create.levels = 1;
> > > >
> > > >  	ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE,
> > > >  		&create);
> > > >  	if (ret) {
> > > > @@ -771,6 +773,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
> > > >  		return -1;
> > > >  	}
> > > >
> > > > +	if (create.start_addr != 0) {
> > > > +		RTE_LOG(ERR, EAL, " DMA window start address != 0\n");
> > > > +		return -1;
> > > > +	}
> > > > +
> > > >  	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> > > >  	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> > > >  		struct vfio_iommu_type1_dma_map dma_map;
> > > > --
> > > > 2.7.4
> > > >

