On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote:
> On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote:
> > Partial unmapping is not supported for VFIO IOMMU type1
> > by the kernel. Though the kernel returns zero, the unmapped size
> > reported will not be the same as expected. So check the
> > returned unmap size and return an error.
> > 
> > For the case of DMA map/unmap triggered by heap allocations,
> > maintain the granularity of the memseg page size so that heap
> > expansion and contraction do not hit this issue.
> 
> This is quite unfortunate, because there was a different bug that had to do
> with the kernel having a very limited number of mappings available [1], as a
> result of which the page concatenation code was added.
> 
> It should therefore be documented that the dma_entry_limit parameter should
> be adjusted if the user runs out of DMA entries.
> 
> [1] https://lore.kernel.org/lkml/155414977872.12780.13728555131525362206.stgit@gimli.home/T/

Ack, I'll document it in guides/linux_gsg/linux_drivers.rst in the VFIO section.
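
(For the doc note, I'm thinking of something along these lines: when a large
number of per-page mappings is created, the kernel's per-container limit may
be hit, in which case the "dma_entry_limit" module parameter of
vfio_iommu_type1 needs to be raised, e.g. via
"modprobe vfio_iommu_type1 dma_entry_limit=<N>" or the equivalent
modprobe.d option. Exact wording TBD.)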
> 
> > 
> > For user-requested DMA map/unmap, disallow partial unmapping
> > for VFIO type1.
> > 
> > Fixes: 73a639085938 ("vfio: allow to map other memory regions")
> > Cc: anatoly.bura...@intel.com
> > Cc: sta...@dpdk.org
> > 
> > Signed-off-by: Nithin Dabilpuram <ndabilpu...@marvell.com>
> > ---
> >   lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------
> >   lib/librte_eal/linux/eal_vfio.h |  1 +
> >   2 files changed, 29 insertions(+), 6 deletions(-)
> > 
> > diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c
> > index d26e164..ef95259 100644
> > --- a/lib/librte_eal/linux/eal_vfio.c
> > +++ b/lib/librte_eal/linux/eal_vfio.c
> > @@ -69,6 +69,7 @@ static const struct vfio_iommu_type iommu_types[] = {
> >     {
> >             .type_id = RTE_VFIO_TYPE1,
> >             .name = "Type 1",
> > +           .partial_unmap = false,
> >             .dma_map_func = &vfio_type1_dma_map,
> >             .dma_user_map_func = &vfio_type1_dma_mem_map
> >     },
> > @@ -76,6 +77,7 @@ static const struct vfio_iommu_type iommu_types[] = {
> >     {
> >             .type_id = RTE_VFIO_SPAPR,
> >             .name = "sPAPR",
> > +           .partial_unmap = true,
> >             .dma_map_func = &vfio_spapr_dma_map,
> >             .dma_user_map_func = &vfio_spapr_dma_mem_map
> >     },
> > @@ -83,6 +85,7 @@ static const struct vfio_iommu_type iommu_types[] = {
> >     {
> >             .type_id = RTE_VFIO_NOIOMMU,
> >             .name = "No-IOMMU",
> > +           .partial_unmap = true,
> >             .dma_map_func = &vfio_noiommu_dma_map,
> >             .dma_user_map_func = &vfio_noiommu_dma_mem_map
> >     },
> > @@ -525,12 +528,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
> >     /* for IOVA as VA mode, no need to care for IOVA addresses */
> >     if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) {
> >             uint64_t vfio_va = (uint64_t)(uintptr_t)addr;
> > -           if (type == RTE_MEM_EVENT_ALLOC)
> > -                   vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
> > -                                   len, 1);
> > -           else
> > -                   vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
> > -                                   len, 0);
> > +           uint64_t page_sz = msl->page_sz;
> > +
> > +           /* Maintain granularity of DMA map/unmap to memseg size */
> > +           for (; cur_len < len; cur_len += page_sz) {
> > +                   if (type == RTE_MEM_EVENT_ALLOC)
> > +                           vfio_dma_mem_map(default_vfio_cfg, vfio_va,
> > +                                            vfio_va, page_sz, 1);
> > +                   else
> > +                           vfio_dma_mem_map(default_vfio_cfg, vfio_va,
> > +                                            vfio_va, page_sz, 0);
> > +                   vfio_va += page_sz;
> > +           }
> > +
> 
> You'd also have to revert d1c7c0cdf7bac5eb40d3a2a690453aefeee5887b because
> currently the PA path will opportunistically concatenate contiguous
> segments into a single mapping too.

Ack, I'll change it for IOVA as PA mode as well. I missed that.
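
For the PA path, something roughly like the below is what I have in mind
(just a sketch of going back to per-memseg mapping, reusing the existing
msl/cur_len variables of vfio_mem_event_callback and the public
rte_mem_virt2memseg() helper; invalid-IOVA handling omitted):

		struct rte_memseg *ms;

		/* map/unmap one memseg at a time instead of concatenating
		 * IOVA-contiguous segments into a single DMA mapping
		 */
		ms = rte_mem_virt2memseg(addr, msl);
		while (cur_len < len) {
			int map = (type == RTE_MEM_EVENT_ALLOC) ? 1 : 0;

			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
					 ms->iova, ms->len, map);
			cur_len += ms->len;
			++ms;
		}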
> 
> >             return;
> >     }
> > @@ -1383,6 +1393,12 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
> >                     RTE_LOG(ERR, EAL, "  cannot clear DMA remapping, error %i (%s)\n",
> >                                     errno, strerror(errno));
> >                     return -1;
> > +           } else if (dma_unmap.size != len) {
> > +                   RTE_LOG(ERR, EAL, "  unexpected size %"PRIu64" of DMA "
> > +                           "remapping cleared instead of %"PRIu64"\n",
> > +                           (uint64_t)dma_unmap.size, len);
> > +                   rte_errno = EIO;
> > +                   return -1;
> >             }
> >     }
> > @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
> >             /* we're partially unmapping a previously mapped region, so we
> >              * need to split entry into two.
> >              */
> > +           if (!vfio_cfg->vfio_iommu_type->partial_unmap) {
> > +                   RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n");
> > +                   rte_errno = ENOTSUP;
> > +                   ret = -1;
> > +                   goto out;
> > +           }
> 
> How would we ever arrive here if we never map more than one page's worth of
> memory at a time anyway? I don't think this is needed.

container_dma_unmap() is called by the user via rte_vfio_container_dma_unmap(),
and when the user maps a region we don't split it, since we don't know anything
about that memory. So if the user maps multiple pages and then tries to unmap
only part of them, we should fail.
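
In other words, with this patch, on VFIO type1 something like the below now
fails cleanly instead of silently leaving part of the IOMMU mapping in place
(just a usage sketch; container_fd, vaddr, iova and pg_sz are placeholders):

	/* user maps two pages worth of memory as a single user mem map */
	rte_vfio_container_dma_map(container_fd, vaddr, iova, 2 * pg_sz);

	/* partial unmap of only the first page is now rejected for type1:
	 * returns -1 and sets rte_errno to ENOTSUP
	 */
	ret = rte_vfio_container_dma_unmap(container_fd, vaddr, iova, pg_sz);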

> 
> >             if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) {
> >                     RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n");
> >                     rte_errno = ENOMEM;
> > diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h
> > index cb2d35f..6ebaca6 100644
> > --- a/lib/librte_eal/linux/eal_vfio.h
> > +++ b/lib/librte_eal/linux/eal_vfio.h
> > @@ -113,6 +113,7 @@ typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova,
> >   struct vfio_iommu_type {
> >     int type_id;
> >     const char *name;
> > +   bool partial_unmap;
> >     vfio_dma_user_func_t dma_user_map_func;
> >     vfio_dma_func_t dma_map_func;
> >   };
> > 
> 
> 
> -- 
> Thanks,
> Anatoly
