On Tue, Dec 08, 2020 at 10:49:16PM -0500, Sven Van Asbroeck wrote:
> On Tue, Dec 8, 2020 at 6:36 PM Florian Fainelli <f.faine...@gmail.com> wrote:
> >
> > dma_sync_single_for_{cpu,device} is what you would need in order to make
> > a partial cache line invalidation. You would still need to unmap the
> > same address+length pair that was used for the initial mapping otherwise
> > the DMA-API debugging will rightfully complain.
>
> I tried replacing
>     dma_unmap_single(9K, DMA_FROM_DEVICE);
> with
>     dma_sync_single_for_cpu(received_size=1500 bytes, DMA_FROM_DEVICE);
>     dma_unmap_single_attrs(9K, DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
>
> and that works! But the bandwidth is still pretty bad, because the cpu
> now spends most of its time doing
>     dma_map_single(9K, DMA_FROM_DEVICE);
> which spends a lot of time doing __dma_page_cpu_to_dev.
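
In DMA API terms, the change described above looks roughly like the
sketch below. The function name, buffer size and descriptor layout are
placeholders, not the actual driver code:

    #include <linux/dma-mapping.h>

    #define RX_BUF_SIZE	9216	/* placeholder for the driver's 9K buffer */

    /* RX completion: 'dma' came from dma_map_single(dev, buf,
     * RX_BUF_SIZE, DMA_FROM_DEVICE); the NIC wrote received_len bytes. */
    static void rx_complete(struct device *dev, dma_addr_t dma,
    			    size_t received_len)
    {
    	/* Invalidate only the cache lines covering the received frame. */
    	dma_sync_single_for_cpu(dev, dma, received_len, DMA_FROM_DEVICE);

    	/* Unmap with the original address+length pair so DMA-API
    	 * debugging stays happy, but skip the redundant full-9K
    	 * CPU sync. */
    	dma_unmap_single_attrs(dev, dma, RX_BUF_SIZE, DMA_FROM_DEVICE,
    			       DMA_ATTR_SKIP_CPU_SYNC);
    }
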
9K is not a nice number, since for each allocation it probably has to
find 4 contiguous pages. See what the performance difference is with
2K, 4K and 8K. If there is a big difference, you might want to special
case when the MTU is set for jumbo packets, or check if the hardware
can do scatter/gather.

You also need to be careful with caches and speculation. As you have
seen, bad things can happen. And it can be a lot more subtle. If some
code is accessing the page before the buffer and gets towards the end
of the page, the CPU might speculatively bring in the next page, i.e.
the start of the buffer. If that happens before the DMA operation, and
you don't invalidate the cache correctly, you get hard-to-find
corruption.
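
Both points can be sketched on the refill side; the names are again
placeholders, and RX_BUF_SIZE is the 9K constant from the earlier
sketch:

    #include <linux/slab.h>

    /* Refill: allocate and map a fresh 9K buffer for the NIC. */
    static void *rx_alloc_and_map(struct device *dev, dma_addr_t *dma)
    {
    	/* kmalloc(9216) rounds up to the 16K slab (get_order(9216)
    	 * == 2 with 4K pages), i.e. four contiguous pages per buffer.
    	 * 2K, 4K and 8K buffers fit lower orders, which is why they
    	 * are worth benchmarking. */
    	void *buf = kmalloc(RX_BUF_SIZE, GFP_ATOMIC);

    	if (!buf)
    		return NULL;

    	/* The clean/invalidate done here (__dma_page_cpu_to_dev on
    	 * ARM) must be the last CPU-side touch of the buffer: from
    	 * now until the completion handler's dma_sync_single_for_cpu(),
    	 * code walking the preceding page can speculatively pull the
    	 * start of this buffer back into the cache. */
    	*dma = dma_map_single(dev, buf, RX_BUF_SIZE, DMA_FROM_DEVICE);
    	if (dma_mapping_error(dev, *dma)) {
    		kfree(buf);
    		return NULL;
    	}
    	return buf;	/* caller writes *dma into the RX descriptor */
    }

	Andrew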