On 6/8/26 15:55, Bobby Eshleman wrote:
> 
> On Sun, Jun 7, 2026 at 11:42 PM Christian König <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On 6/5/26 20:44, Bobby Eshleman wrote:
>     > On Fri, Jun 05, 2026 at 11:30:07AM +0200, Christian König wrote:
>     >> On 6/4/26 02:42, Bobby Eshleman wrote:
>     >>> From: Bobby Eshleman <[email protected] 
> <mailto:[email protected]>>
>     >>>
>     >>> get_sg_table() emitted one PAGE_SIZE sg entry per page even when the
>     >>> underlying folio was larger.
>     >>>
>     >>> Instead, walk folios[] and emit one sg entry per folio. When folios
>     >>> represent large pages (as is for MFD_HUGETLB), each sg entry is a 
> large
>     >>> page. Normal PAGE_SIZE sg tables are unchanged.
>     >>>
>     >>> Required by net/core/devmem to support rx-buf-size > PAGE_SIZE with
>     >>> udmabuf.
>     >>
>     >> That doesn't explain why this is required.
>     >
>     > Sure, can definitely add. Devmem currently requires dmabuf sg entries to
>     > be length and size aligned when it allocates niovs for NIC page pools.
>     > Though udmabuf is not violating any dmabuf contract by emitting
>     > PAGE_SIZE entries and the above restriction is probably more a
>     > shortfalling of devmem, by emitting a single entry per folio this patch
>     > allows udmabuf to be used by devmem for large pages.
>     >
>     >>
>     >> Please note that accessing the pages/folio of an sg-table returned by 
> DMA-buf is illegal and strictly forbidden!
>     >>
>     >> Regards,
>     >> Christian.
>     >
>     > It seems both devmem and io_uring zcrx at least introspect through to
>     > the sg-table to build NIC page pools (not accessing the memory itself,
>     > however). Is there a better way?
> 
>     That's an absolute NO-GO! We need to stop that immediately.
> 
>     Touching the underlying struct page of an DMA-buf exported sg-table is 
> strictly forbidden.
> 
>     We even have code to wrap the sg_table and hide the struct pages on debug 
> builds to catch those issues, see function dma_buf_wrap_sg_table().
> 
>     My last status is that the NIC page pools are build directly from the DMA 
> addresses exposed by the sg_table.
> 
>     Was there any change I'm not aware of?
> 
>     Regards,
>     Christian.
> 
> 
> Oh no change, your mental model is still current.
> They just go through each sg and use sg_dma_address() on each.

Ah, thanks! That was a near heart attack :D

Yeah that is perfectly correct, question is do you then still really need this 
udmabuf change? I mean the DMA API usually merges together contiguous DMA 
addresses.

Regards,
Christian.

> 
> Best,
> Bobby
> 
> 
>     >
>     > Best,
>     > Bobby
>     >
>     >>
>     >>> Signed-off-by: Bobby Eshleman <[email protected] 
> <mailto:[email protected]>>
>     >>> ---
>     >>>  drivers/dma-buf/udmabuf.c | 47 
> ++++++++++++++++++++++++++++++++++++++++++-----
>     >>>  1 file changed, 42 insertions(+), 5 deletions(-)
>     >>>
>     >>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
>     >>> index 94b8ecb892bb..f28dd3788ada 100644
>     >>> --- a/drivers/dma-buf/udmabuf.c
>     >>> +++ b/drivers/dma-buf/udmabuf.c
>     >>> @@ -141,26 +141,63 @@ static void vunmap_udmabuf(struct dma_buf *buf, 
> struct iosys_map *map)
>     >>>         vm_unmap_ram(map->vaddr, ubuf->pagecount);
>     >>>  }
>     >>>
>     >>> +/* Return the number of contiguous pages backed by the folio at @i.
>     >>> + * A udmabuf may map only part of a folio, or reference the same 
> folio
>     >>> + * in multiple non-contiguous runs, so folio_nr_pages() can't be 
> used.
>     >>> + */
>     >>> +static pgoff_t udmabuf_folio_nr_pages(struct udmabuf *ubuf, pgoff_t 
> i)
>     >>> +{
>     >>> +       struct folio *f = ubuf->folios[i];
>     >>> +       pgoff_t j;
>     >>> +
>     >>> +       for (j = 1; i + j < ubuf->pagecount; j++) {
>     >>> +               if (ubuf->folios[i + j] != f)
>     >>> +                       break;
>     >>> +               /* Same folio, but not a sequential offset within it. 
> */
>     >>> +               if (ubuf->offsets[i + j] != ubuf->offsets[i] + j * 
> PAGE_SIZE)
>     >>> +                       break;
>     >>> +       }
>     >>> +       return j;
>     >>> +}
>     >>> +
>     >>> +/* Count the contiguous folio runs in @ubuf, one sg entry per run. */
>     >>> +static unsigned int udmabuf_sg_nents(struct udmabuf *ubuf)
>     >>> +{
>     >>> +       unsigned int nents = 0;
>     >>> +       pgoff_t i;
>     >>> +
>     >>> +       for (i = 0; i < ubuf->pagecount; i += 
> udmabuf_folio_nr_pages(ubuf, i))
>     >>> +               nents++;
>     >>> +       return nents;
>     >>> +}
>     >>> +
>     >>>  static struct sg_table *get_sg_table(struct device *dev, struct 
> dma_buf *buf,
>     >>>                                      enum dma_data_direction 
> direction)
>     >>>  {
>     >>>         struct udmabuf *ubuf = buf->priv;
>     >>> -       struct sg_table *sg;
>     >>>         struct scatterlist *sgl;
>     >>> -       unsigned int i = 0;
>     >>> +       struct sg_table *sg;
>     >>> +       pgoff_t i, run;
>     >>> +       unsigned int nents;
>     >>>         int ret;
>     >>>
>     >>> +       nents = udmabuf_sg_nents(ubuf);
>     >>> +
>     >>>         sg = kzalloc_obj(*sg);
>     >>>         if (!sg)
>     >>>                 return ERR_PTR(-ENOMEM);
>     >>>
>     >>> -       ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
>     >>> +       ret = sg_alloc_table(sg, nents, GFP_KERNEL);
>     >>>         if (ret < 0)
>     >>>                 goto err_alloc;
>     >>>
>     >>> -       for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
>     >>> -               sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
>     >>> +       sgl = sg->sgl;
>     >>> +       for (i = 0; i < ubuf->pagecount; i += run) {
>     >>> +               run = udmabuf_folio_nr_pages(ubuf, i);
>     >>> +               sg_set_folio(sgl, ubuf->folios[i], run << PAGE_SHIFT,
>     >>>                              ubuf->offsets[i]);
>     >>> +               sgl = sg_next(sgl);
>     >>> +       }
>     >>>
>     >>>         ret = dma_map_sgtable(dev, sg, direction, 0);
>     >>>         if (ret < 0)
>     >>>
>     >>> --
>     >>> 2.53.0-Meta
>     >>>
>     >>
> 


Reply via email to