On 6/9/25 11:32, wangtao wrote:
>> -----Original Message-----
>> From: Christoph Hellwig <h...@infradead.org>
>> Sent: Monday, June 9, 2025 12:35 PM
>> To: Christian König <christian.koe...@amd.com>
>> Cc: wangtao <tao.wang...@honor.com>; Christoph Hellwig
>> <h...@infradead.org>; sumit.sem...@linaro.org; kra...@redhat.com;
>> vivek.kasire...@intel.com; v...@zeniv.linux.org.uk; brau...@kernel.org;
>> hu...@google.com; a...@linux-foundation.org; amir7...@gmail.com;
>> benjamin.gaign...@collabora.com; brian.star...@arm.com;
>> jstu...@google.com; tjmerc...@google.com; j...@suse.cz;
>> baolin.w...@linux.alibaba.com; linux-me...@vger.kernel.org; dri-
>> de...@lists.freedesktop.org; linaro-mm-...@lists.linaro.org; linux-
>> ker...@vger.kernel.org; linux-fsde...@vger.kernel.org; linux-
>> m...@kvack.org; wangbintian(BintianWang) <bintian.w...@honor.com>;
>> yipengxiang <yipengxi...@honor.com>; liulu 00013167
>> <liulu....@honor.com>; hanfeng 00012985 <feng....@honor.com>
>> Subject: Re: [PATCH v4 0/4] Implement dmabuf direct I/O via
>> copy_file_range
>>
>> On Fri, Jun 06, 2025 at 01:20:48PM +0200, Christian König wrote:
>>>> dmabuf acts as a driver and shouldn't be handled by VFS, so I made
>>>> dmabuf implement copy_file_range callbacks to support direct I/O
>>>> zero-copy. I'm open to both approaches. What's the preference of VFS
>>>> experts?
>>>
>>> That would probably be illegal. Using the sg_table in the DMA-buf
>>> implementation turned out to be a mistake.
>>
>> Two things here should not be directly conflated.  Using the sg_table was
>> a huge mistake, and we should try to move dmabuf to switch that to a pure
> I'm a bit confused: don't dmabuf importers need to traverse sg_table to
> access folios or dma_addr/len? Do you mean restricting sg_table access
> (e.g., only via iov_iter) or proposing alternative approaches?

No, accessing the pages/folios inside the sg_table of a DMA-buf is
strictly forbidden.

We have removed most such use cases over the years and push back on new
ones being added.
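
To make that concrete, here is a minimal sketch of what a well-behaved
importer looks like (the function name is made up for the example,
reservation locking and error handling are trimmed): only the DMA side
of the mapping is ever touched, and sg_page() never appears.

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int example_importer_map(struct dma_buf_attachment *attach)
{
        struct sg_table *sgt;
        struct scatterlist *sg;
        int i;

        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt))
                return PTR_ERR(sgt);

        /* Walk only the DMA address/length pairs ... */
        for_each_sgtable_dma_sg(sgt, sg, i) {
                dma_addr_t addr = sg_dma_address(sg);
                unsigned int len = sg_dma_len(sg);

                /* ... and hand addr/len to the device, e.g. as a HW SGL. */
        }

        /* Never sg_page(sg) here -- the CPU side is off limits. */

        dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
        return 0;
}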

> 
>> dma_addr_t/len array now that the new DMA API supporting that has been
>> merged.  Is there any chance the dma-buf maintainers could start to kick this
>> off?  I'm of course happy to assist.

Work on that has already been underway for some time.

Most GPU drivers already do the sg_table -> DMA array conversion
internally; I need to push the remaining ones to clean this up.

But there are also tons of other users of dma_buf_map_attachment() that
need to be converted.
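
Just to illustrate the direction (these names are made up, nothing like
this exists in the tree yet), the mapping result would then be a plain
array of device addresses instead of an sg_table:

/* Purely hypothetical, for illustration only. */
struct dma_buf_dma_entry {
        dma_addr_t addr;
        unsigned int len;
};

struct dma_buf_dma_map {
        struct dma_buf_dma_entry *entries;
        unsigned int nr_entries;
};

An importer handed such a map has no way to reach the backing
pages/folios at all, which is exactly the point.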

>> But that notwithstanding, dma-buf is THE buffer sharing mechanism in the
>> kernel, and we should promote it instead of reinventing it badly.
>> And there is a use case for having a fully DMA mapped buffer in the block
>> layer and I/O path, especially on systems with an IOMMU.
>> So having an iov_iter backed by a dma-buf would be extremely helpful.
>> That's mostly lib/iov_iter.c code, not VFS, though.
> Are you suggesting adding an ITER_DMABUF type to iov_iter, or
> implementing dmabuf-to-iov_bvec conversion within iov_iter?

That would be rather nice to have, yeah.
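
Purely as a sketch of the first option (nothing of this exists today,
the helper name is invented), the I/O path would then be handed
something like:

struct iov_iter iter;

/* Hypothetical initializer, analogous to iov_iter_bvec() today. */
iov_iter_dmabuf(&iter, ITER_DEST, dmabuf, offset, len);

/*
 * lib/iov_iter.c would then hand out pre-mapped dma_addr_t/len
 * segments instead of pages, so the block layer never has to pin
 * or even look at the backing folios.
 */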

> 
>>
>>> The question Christoph raised was rather why is your CPU so slow that
>>> walking the page tables has a significant overhead compared to the
>>> actual I/O?
>>
>> Yes, that's really puzzling and should be addressed first.
> With high CPU performance (e.g. 3 GHz), GUP (get_user_pages) overhead
> is relatively low, as the 3 GHz results below show.

Even on a low-end CPU, walking the page tables and grabbing references
shouldn't be that much of an overhead.

There must be some reason why you see so much CPU overhead, e.g.
compound pages being broken up or something similar, which should not
happen in the first place.
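
One quick way to sanity-check that from userspace is to look at the
smaps counters for the mmap()ed memfd region; a rough sketch, not part
of your benchmark:

#include <stdio.h>
#include <string.h>

/*
 * Print the huge page counters from /proc/self/smaps.  If
 * ShmemPmdMapped/AnonHugePages stay at 0 kB for the buffer, GUP is
 * walking 4K pages and the per-byte overhead goes up accordingly.
 */
static void dump_hugepage_counters(void)
{
        char line[256];
        FILE *f = fopen("/proc/self/smaps", "r");

        if (!f)
                return;
        while (fgets(line, sizeof(line), f))
                if (strstr(line, "ShmemPmdMapped:") ||
                    strstr(line, "AnonHugePages:"))
                        fputs(line, stdout);
        fclose(f);
}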

Regards,
Christian.


> |    32x32MB Read 1024MB    |Creat-ms|Close-ms|  I/O-ms|I/O-MB/s| I/O%
> |---------------------------|--------|--------|--------|--------|-----
> | 1)        memfd direct R/W|      1 |    118 |    312 |   3448 | 100%
> | 2)      u+memfd direct R/W|    196 |    123 |    295 |   3651 | 105%
> | 3) u+memfd direct sendfile|    175 |    102 |    976 |   1100 |  31%
> | 4)   u+memfd direct splice|    173 |    103 |    443 |   2428 |  70%
> | 5)      udmabuf buffer R/W|    183 |    100 |    453 |   2375 |  68%
> | 6)       dmabuf buffer R/W|     34 |      4 |    427 |   2519 |  73%
> | 7)    udmabuf direct c_f_r|    200 |    102 |    278 |   3874 | 112%
> | 8)     dmabuf direct c_f_r|     36 |      5 |    269 |   4002 | 116%
> 
> With lower CPU performance (e.g. 1 GHz), GUP overhead becomes more
> significant, as the 1 GHz results below show.
> |    32x32MB Read 1024MB    |Creat-ms|Close-ms|  I/O-ms|I/O-MB/s| I/O%
> |---------------------------|--------|--------|--------|--------|-----
> | 1)        memfd direct R/W|      2 |    393 |    969 |   1109 | 100%
> | 2)      u+memfd direct R/W|    592 |    424 |    570 |   1884 | 169%
> | 3) u+memfd direct sendfile|    587 |    356 |   2229 |    481 |  43%
> | 4)   u+memfd direct splice|    568 |    352 |    795 |   1350 | 121%
> | 5)      udmabuf buffer R/W|    597 |    343 |   1238 |    867 |  78%
> | 6)       dmabuf buffer R/W|     69 |     13 |   1128 |    952 |  85%
> | 7)    udmabuf direct c_f_r|    595 |    345 |    372 |   2889 | 260%
> | 8)     dmabuf direct c_f_r|     80 |     13 |    274 |   3929 | 354%
> 
> Regards,
> Wangtao.
