On 6/5/26 20:44, Bobby Eshleman wrote:
> On Fri, Jun 05, 2026 at 11:30:07AM +0200, Christian König wrote:
>> On 6/4/26 02:42, Bobby Eshleman wrote:
>>> From: Bobby Eshleman <[email protected]>
>>>
>>> get_sg_table() emitted one PAGE_SIZE sg entry per page even when the
>>> underlying folio was larger.
>>>
>>> Instead, walk folios[] and emit one sg entry per folio. When folios
>>> represent large pages (as is for MFD_HUGETLB), each sg entry is a large
>>> page. Normal PAGE_SIZE sg tables are unchanged.
>>>
>>> Required by net/core/devmem to support rx-buf-size > PAGE_SIZE with
>>> udmabuf.
>>
>> That doesn't explain why this is required.
> 
> Sure, can definitely add. Devmem currently requires dmabuf sg entries to
> be length and size aligned when it allocates niovs for NIC page pools.
> Though udmabuf is not violating any dmabuf contract by emitting
> PAGE_SIZE entries and the above restriction is probably more a
> shortfalling of devmem, by emitting a single entry per folio this patch
> allows udmabuf to be used by devmem for large pages.
> 
>>
>> Please note that accessing the pages/folio of an sg-table returned by 
>> DMA-buf is illegal and strictly forbidden!
>>
>> Regards,
>> Christian.
> 
> It seems both devmem and io_uring zcrx at least introspect through to
> the sg-table to build NIC page pools (not accessing the memory itself,
> however). Is there a better way?

That's an absolute NO-GO! We need to stop that immediately.

Touching the underlying struct page of an DMA-buf exported sg-table is strictly 
forbidden.

We even have code to wrap the sg_table and hide the struct pages on debug 
builds to catch those issues, see function dma_buf_wrap_sg_table().

My last status is that the NIC page pools are build directly from the DMA 
addresses exposed by the sg_table.

Was there any change I'm not aware of?

Regards,
Christian.

> 
> Best,
> Bobby
> 
>>
>>> Signed-off-by: Bobby Eshleman <[email protected]>
>>> ---
>>>  drivers/dma-buf/udmabuf.c | 47 
>>> ++++++++++++++++++++++++++++++++++++++++++-----
>>>  1 file changed, 42 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
>>> index 94b8ecb892bb..f28dd3788ada 100644
>>> --- a/drivers/dma-buf/udmabuf.c
>>> +++ b/drivers/dma-buf/udmabuf.c
>>> @@ -141,26 +141,63 @@ static void vunmap_udmabuf(struct dma_buf *buf, 
>>> struct iosys_map *map)
>>>         vm_unmap_ram(map->vaddr, ubuf->pagecount);
>>>  }
>>>
>>> +/* Return the number of contiguous pages backed by the folio at @i.
>>> + * A udmabuf may map only part of a folio, or reference the same folio
>>> + * in multiple non-contiguous runs, so folio_nr_pages() can't be used.
>>> + */
>>> +static pgoff_t udmabuf_folio_nr_pages(struct udmabuf *ubuf, pgoff_t i)
>>> +{
>>> +       struct folio *f = ubuf->folios[i];
>>> +       pgoff_t j;
>>> +
>>> +       for (j = 1; i + j < ubuf->pagecount; j++) {
>>> +               if (ubuf->folios[i + j] != f)
>>> +                       break;
>>> +               /* Same folio, but not a sequential offset within it. */
>>> +               if (ubuf->offsets[i + j] != ubuf->offsets[i] + j * 
>>> PAGE_SIZE)
>>> +                       break;
>>> +       }
>>> +       return j;
>>> +}
>>> +
>>> +/* Count the contiguous folio runs in @ubuf, one sg entry per run. */
>>> +static unsigned int udmabuf_sg_nents(struct udmabuf *ubuf)
>>> +{
>>> +       unsigned int nents = 0;
>>> +       pgoff_t i;
>>> +
>>> +       for (i = 0; i < ubuf->pagecount; i += udmabuf_folio_nr_pages(ubuf, 
>>> i))
>>> +               nents++;
>>> +       return nents;
>>> +}
>>> +
>>>  static struct sg_table *get_sg_table(struct device *dev, struct dma_buf 
>>> *buf,
>>>                                      enum dma_data_direction direction)
>>>  {
>>>         struct udmabuf *ubuf = buf->priv;
>>> -       struct sg_table *sg;
>>>         struct scatterlist *sgl;
>>> -       unsigned int i = 0;
>>> +       struct sg_table *sg;
>>> +       pgoff_t i, run;
>>> +       unsigned int nents;
>>>         int ret;
>>>
>>> +       nents = udmabuf_sg_nents(ubuf);
>>> +
>>>         sg = kzalloc_obj(*sg);
>>>         if (!sg)
>>>                 return ERR_PTR(-ENOMEM);
>>>
>>> -       ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
>>> +       ret = sg_alloc_table(sg, nents, GFP_KERNEL);
>>>         if (ret < 0)
>>>                 goto err_alloc;
>>>
>>> -       for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
>>> -               sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
>>> +       sgl = sg->sgl;
>>> +       for (i = 0; i < ubuf->pagecount; i += run) {
>>> +               run = udmabuf_folio_nr_pages(ubuf, i);
>>> +               sg_set_folio(sgl, ubuf->folios[i], run << PAGE_SHIFT,
>>>                              ubuf->offsets[i]);
>>> +               sgl = sg_next(sgl);
>>> +       }
>>>
>>>         ret = dma_map_sgtable(dev, sg, direction, 0);
>>>         if (ret < 0)
>>>
>>> --
>>> 2.53.0-Meta
>>>
>>


Reply via email to