> -----Original Message-----
> From: Jason Gunthorpe <j...@ziepe.ca>
> Sent: Wednesday, November 04, 2020 4:07 PM
> To: Xiong, Jianxin <jianxin.xi...@intel.com>
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledf...@redhat.com>; Leon Romanovsky <l...@kernel.org>; Sumit Semwal <sumit.sem...@linaro.org>; Christian Koenig <christian.koe...@amd.com>; Vetter, Daniel <daniel.vet...@intel.com>
> Subject: Re: [PATCH v7 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
> 
> On Wed, Nov 04, 2020 at 02:06:34PM -0800, Jianxin Xiong wrote:
> > +	umem = ib_umem_dmabuf_get(&dev->ib_dev, offset, length, fd, access_flags,
> > +				  &mlx5_ib_dmabuf_attach_ops);
> > +   if (IS_ERR(umem)) {
> > +           mlx5_ib_dbg(dev, "umem get failed (%ld)\n", PTR_ERR(umem));
> > +           return ERR_PTR(PTR_ERR(umem));
> > +   }
> > +
> > +   mr = alloc_mr_from_cache(pd, umem, virt_addr, access_flags);
> 
> It is very subtle, but this calls mlx5_umem_find_best_pgsz(), which calls
> ib_umem_find_best_pgsz(), which goes over the SGL to determine the page
> size to use.
> 
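(Side note on that call chain: mlx5_umem_find_best_pgsz() is roughly a thin
wrapper that turns the mkc's log_page_size field into a page-size bitmap for
the core helper -- a simplified sketch, not the verbatim mlx5_ib.h macro:)

	/* Simplified sketch: build the bitmap of page sizes the device
	 * supports from the mkey context field, then defer to the core
	 * helper.  See mlx5_ib.h for the real definition.
	 */
	#define mlx5_umem_find_best_pgsz(umem, typ, log_pgsz_fld, pgsz_shift, iova) \
		ib_umem_find_best_pgsz(umem,                                        \
				       __mlx5_log_page_size_to_bitmap(              \
					       __mlx5_bit_sz(typ, log_pgsz_fld),    \
					       pgsz_shift),                         \
				       iova)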

When this is called here, the umem sglist is still NULL, because
dma_buf_map_attachment() is not called until a page fault occurs. In patch
1/5, ib_umem_find_best_pgsz() has been modified to always return PAGE_SIZE
for a dma-buf based MR.
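
Roughly, that change has the following shape (a sketch only -- the exact
condition and field name are in patch 1/5; 'is_dmabuf' here is assumed):

	/* Sketch: a dma-buf umem has no SGL mapped at registration time,
	 * so there is nothing to walk and the page size is pinned to the
	 * system page size.  'is_dmabuf' is an assumed field name.
	 */
	unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
					     unsigned long pgsz_bitmap,
					     unsigned long virt)
	{
		if (umem->is_dmabuf)
			return PAGE_SIZE;

		/* ... existing SGL walk ... */
	}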

> As part of this it does validation of the IOVA vs first page offset vs
> first page dma address. These little details come into play if the IOVA
> and offset are not PAGE_SIZE aligned, which is very possible if the
> dma-buf exporter or system PAGE_SIZE is over 4k.
> 
> In other words, the dma_address of the first SGE must be the page-aligned
> starting point of the MR. Since the 'skip' approach is used when breaking
> the SGL into blocks, ib_umem_find_best_pgsz() sees an invalid page size.
> 
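As a concrete illustration of that failure (made-up numbers, userspace toy,
not kernel code -- the real check accumulates an XOR mask over the whole SGL):

	#include <stdio.h>

	/*
	 * Any low bit where the IOVA and the DMA address disagree caps the
	 * usable page size at that bit.  With the 'skip' approach the first
	 * SGE still carries the buffer-start dma_address, so low bits differ
	 * and the cap drops below the 4k minimum -- the "invalid page size"
	 * described above.
	 */
	int main(void)
	{
		unsigned long iova        = 0x10001200UL;   /* MR start, not page aligned */
		unsigned long sliced_dma  = 0x8000001200UL; /* SGE edited to the MR start  */
		unsigned long skipped_dma = 0x8000000000UL; /* SGE left at buffer start    */

		/* the lowest set bit of the XOR is the page-size cap */
		printf("sliced  mask: %#lx (cap 0x10000000, fine)\n",
		       iova ^ sliced_dma);
		printf("skipped mask: %#lx (cap 0x200, below 4k -> invalid)\n",
		       iova ^ skipped_dma);
		return 0;
	}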
> Slicing it has to be done in a way that gives a properly formed SGL.
> 
> My suggestion is to just change the SGL in place. Iterate to the starting
> SGE in the SGL and assign it to the sg table, modify it to have an offset
> dma_address and a reduced length.
> 
> Count the number of SGEs that span the remaining range and use that as the
> new nmap.
> 
> Edit the last SGE to have a reduced length.

Do you still think modifying the SGL in place is needed, given the above
explanation? I do see some benefits of doing so -- it hides the discrepancy
between the sgl and addr/length from the device drivers and avoids special
handling in the code that uses the sgl.

> 
> Upon unmap undo the edits so the exporter doesn't see the mangled SGL.
> 
> It would be saner if the exporter could do this, but oh well.
> 
> Approximately like this:
> 
> 	struct ib_umem *umem = &umem_p->umem;
> 	struct scatterlist *sg;
> 	/* cur walks the cumulative, untrimmed dma length, i.e. the byte
> 	 * offset of each SGE within the mapped buffer
> 	 */
> 	unsigned long cur = 0;
> 	int npages = ib_umem_num_pages(umem); /* PAGE_SIZE blocks spanned */
> 	int i;
> 
> 	for_each_sg(umem_p->umem.sg_head.sgl, sg, umem_p->umem.nmap, i) {
> 		/* snapshot before any trim so the position math below
> 		 * keeps using original buffer offsets
> 		 */
> 		unsigned long dma_len = sg_dma_len(sg);
> 
> 		/* first SGE reaching the page-aligned start of the MR:
> 		 * save the original values, then trim the head
> 		 */
> 		if (!umem_p->first_sg &&
> 		    cur + dma_len > ALIGN_DOWN(umem->address, PAGE_SIZE)) {
> 			unsigned long offset;
> 
> 			umem_p->first_sg = sg;
> 			umem_p->first_dma_address = sg->dma_address;
> 			umem_p->first_dma_length = dma_len;
> 			umem_p->first_length = sg->length;
> 			offset = ALIGN_DOWN(umem->address, PAGE_SIZE) - cur;
> 			sg->dma_address += offset;
> 			sg_dma_len(sg) -= offset;
> 			sg->length -= offset;
> 		}
> 		/* SGE reaching the page-aligned end of the MR: save the
> 		 * original values, trim the tail and stop (<= so an SGL
> 		 * ending exactly at the aligned end succeeds, trim = 0)
> 		 */
> 		if (ALIGN(umem->address + umem->length, PAGE_SIZE) <=
> 		    cur + dma_len) {
> 			unsigned long trim;
> 
> 			umem_p->last_sg = sg;
> 			umem_p->last_dma_length = sg_dma_len(sg);
> 			umem_p->last_length = sg->length;
> 			trim = cur + dma_len -
> 			       ALIGN(umem->address + umem->length, PAGE_SIZE);
> 			sg_dma_len(sg) -= trim;
> 			sg->length -= trim;
> 			return npages;
> 		}
> 		cur += dma_len;
> 	}
> 	/* It is essential that the length of the SGL exactly matches the
> 	 * page-aligned span of the MR
> 	 */
> 	return -EINVAL;
> 
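A sketch of the unmap-side undo you mention above, using the fields saved in
the loop (the helper name and the umem_p type are assumed, not from the patch):

	/* Hypothetical helper: restore the two edited SGEs on unmap so
	 * the exporter never sees the mangled SGL.
	 */
	static void ib_umem_dmabuf_unsplit_sgl(struct ib_umem_dmabuf *umem_p)
	{
		/* restore the tail first: if one SGE is both first and
		 * last, the head restore below then rewrites the fully
		 * original values
		 */
		if (umem_p->last_sg) {
			sg_dma_len(umem_p->last_sg) = umem_p->last_dma_length;
			umem_p->last_sg->length = umem_p->last_length;
			umem_p->last_sg = NULL;
		}
		if (umem_p->first_sg) {
			umem_p->first_sg->dma_address = umem_p->first_dma_address;
			sg_dma_len(umem_p->first_sg) = umem_p->first_dma_length;
			umem_p->first_sg->length = umem_p->first_length;
			umem_p->first_sg = NULL;
		}
	}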
> Further, this really only works if the umem->page_size is locked to 4k,
> because this doesn't have code to resize the MKEY or change the underlying
> page size when the SGL changes.

Yes, now it's locked to 4K. 

> 
> So, I'd say put something like the above in the core code to validate and
> properly form the umem->sgl.
> 
> Then modify alloc_mr_from_cache to use only PAGE_SIZE:
> 
> 	if (umem->is_dma_buf)
> 		page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
> 	else
> 		page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size,
> 						     0, iova);
> 

This is already addressed by the ib_umem_find_best_pgsz() change in patch 1/5.

Thanks,
Jianxin

> Jason