On 6/8/26 15:55, Bobby Eshleman wrote: > > On Sun, Jun 7, 2026 at 11:42 PM Christian König <[email protected] > <mailto:[email protected]>> wrote: > > On 6/5/26 20:44, Bobby Eshleman wrote: > > On Fri, Jun 05, 2026 at 11:30:07AM +0200, Christian König wrote: > >> On 6/4/26 02:42, Bobby Eshleman wrote: > >>> From: Bobby Eshleman <[email protected] > <mailto:[email protected]>> > >>> > >>> get_sg_table() emitted one PAGE_SIZE sg entry per page even when the > >>> underlying folio was larger. > >>> > >>> Instead, walk folios[] and emit one sg entry per folio. When folios > >>> represent large pages (as is for MFD_HUGETLB), each sg entry is a > large > >>> page. Normal PAGE_SIZE sg tables are unchanged. > >>> > >>> Required by net/core/devmem to support rx-buf-size > PAGE_SIZE with > >>> udmabuf. > >> > >> That doesn't explain why this is required. > > > > Sure, can definitely add. Devmem currently requires dmabuf sg entries to > > be length and size aligned when it allocates niovs for NIC page pools. > > Though udmabuf is not violating any dmabuf contract by emitting > > PAGE_SIZE entries and the above restriction is probably more a > > shortfalling of devmem, by emitting a single entry per folio this patch > > allows udmabuf to be used by devmem for large pages. > > > >> > >> Please note that accessing the pages/folio of an sg-table returned by > DMA-buf is illegal and strictly forbidden! > >> > >> Regards, > >> Christian. > > > > It seems both devmem and io_uring zcrx at least introspect through to > > the sg-table to build NIC page pools (not accessing the memory itself, > > however). Is there a better way? > > That's an absolute NO-GO! We need to stop that immediately. > > Touching the underlying struct page of an DMA-buf exported sg-table is > strictly forbidden. > > We even have code to wrap the sg_table and hide the struct pages on debug > builds to catch those issues, see function dma_buf_wrap_sg_table(). > > My last status is that the NIC page pools are build directly from the DMA > addresses exposed by the sg_table. > > Was there any change I'm not aware of? > > Regards, > Christian. > > > Oh no change, your mental model is still current. > They just go through each sg and use sg_dma_address() on each.
Ah, thanks! That was a near heart attack :D Yeah that is perfectly correct, question is do you then still really need this udmabuf change? I mean the DMA API usually merges together contiguous DMA addresses. Regards, Christian. > > Best, > Bobby > > > > > > Best, > > Bobby > > > >> > >>> Signed-off-by: Bobby Eshleman <[email protected] > <mailto:[email protected]>> > >>> --- > >>> drivers/dma-buf/udmabuf.c | 47 > ++++++++++++++++++++++++++++++++++++++++++----- > >>> 1 file changed, 42 insertions(+), 5 deletions(-) > >>> > >>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c > >>> index 94b8ecb892bb..f28dd3788ada 100644 > >>> --- a/drivers/dma-buf/udmabuf.c > >>> +++ b/drivers/dma-buf/udmabuf.c > >>> @@ -141,26 +141,63 @@ static void vunmap_udmabuf(struct dma_buf *buf, > struct iosys_map *map) > >>> vm_unmap_ram(map->vaddr, ubuf->pagecount); > >>> } > >>> > >>> +/* Return the number of contiguous pages backed by the folio at @i. > >>> + * A udmabuf may map only part of a folio, or reference the same > folio > >>> + * in multiple non-contiguous runs, so folio_nr_pages() can't be > used. > >>> + */ > >>> +static pgoff_t udmabuf_folio_nr_pages(struct udmabuf *ubuf, pgoff_t > i) > >>> +{ > >>> + struct folio *f = ubuf->folios[i]; > >>> + pgoff_t j; > >>> + > >>> + for (j = 1; i + j < ubuf->pagecount; j++) { > >>> + if (ubuf->folios[i + j] != f) > >>> + break; > >>> + /* Same folio, but not a sequential offset within it. > */ > >>> + if (ubuf->offsets[i + j] != ubuf->offsets[i] + j * > PAGE_SIZE) > >>> + break; > >>> + } > >>> + return j; > >>> +} > >>> + > >>> +/* Count the contiguous folio runs in @ubuf, one sg entry per run. */ > >>> +static unsigned int udmabuf_sg_nents(struct udmabuf *ubuf) > >>> +{ > >>> + unsigned int nents = 0; > >>> + pgoff_t i; > >>> + > >>> + for (i = 0; i < ubuf->pagecount; i += > udmabuf_folio_nr_pages(ubuf, i)) > >>> + nents++; > >>> + return nents; > >>> +} > >>> + > >>> static struct sg_table *get_sg_table(struct device *dev, struct > dma_buf *buf, > >>> enum dma_data_direction > direction) > >>> { > >>> struct udmabuf *ubuf = buf->priv; > >>> - struct sg_table *sg; > >>> struct scatterlist *sgl; > >>> - unsigned int i = 0; > >>> + struct sg_table *sg; > >>> + pgoff_t i, run; > >>> + unsigned int nents; > >>> int ret; > >>> > >>> + nents = udmabuf_sg_nents(ubuf); > >>> + > >>> sg = kzalloc_obj(*sg); > >>> if (!sg) > >>> return ERR_PTR(-ENOMEM); > >>> > >>> - ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL); > >>> + ret = sg_alloc_table(sg, nents, GFP_KERNEL); > >>> if (ret < 0) > >>> goto err_alloc; > >>> > >>> - for_each_sg(sg->sgl, sgl, ubuf->pagecount, i) > >>> - sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE, > >>> + sgl = sg->sgl; > >>> + for (i = 0; i < ubuf->pagecount; i += run) { > >>> + run = udmabuf_folio_nr_pages(ubuf, i); > >>> + sg_set_folio(sgl, ubuf->folios[i], run << PAGE_SHIFT, > >>> ubuf->offsets[i]); > >>> + sgl = sg_next(sgl); > >>> + } > >>> > >>> ret = dma_map_sgtable(dev, sg, direction, 0); > >>> if (ret < 0) > >>> > >>> -- > >>> 2.53.0-Meta > >>> > >> >

