Hi Jesper,

> On Thu, 4 Jul 2019 10:13:37 +0000
> Jose Abreu <jose.ab...@synopsys.com> wrote:
> > > The page_pool DMA mapping cannot be "kept" when a page travels into the
> > > network stack attached to an SKB. (Ilias and I have a long term plan[1]
> > > to allow this, but you cannot do it ATM).
> >
> > The reason I recycle the page is this previous call to:
> >
> >     skb_copy_to_linear_data()
> >
> > So, technically, I'm syncing the page(s) to the CPU and then doing a memcpy
> > into a previously allocated SKB ... So it's safe to just recycle the
> > mapping, I think.
>
> I didn't notice the skb_copy_to_linear_data(), which will copy the entire
> frame, thus leaving the page unused and available for recycling.
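The copy-then-recycle pattern (copy the whole frame out with skb_copy_to_linear_data(), then hand the page straight back) can be illustrated with a toy userspace model. This is a sketch only: `toy_pool`, `toy_pool_get()`, `toy_pool_recycle()` and `toy_rx()` are hypothetical names, not the kernel page_pool API, and `map_calls` merely counts where a real driver would call a DMA-map function. The point it shows is that, because the payload is copied out, the same page keeps coming back and is "mapped" only once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model, NOT the kernel page_pool API. */
#define POOL_SIZE 4
#define PAGE_SZ 4096

struct toy_pool {
    unsigned char *pages[POOL_SIZE]; /* recycled pages, still "DMA mapped" */
    size_t count;                    /* number of pages ready for reuse */
    size_t map_calls;                /* times we had to "map" a fresh page */
};

/* Get a page for RX refill: reuse a recycled one if possible. */
static unsigned char *toy_pool_get(struct toy_pool *p)
{
    if (p->count > 0)
        return p->pages[--p->count];
    p->map_calls++;                  /* stands in for a DMA-map of a new page */
    return malloc(PAGE_SZ);
}

/* Return a page to the pool without "unmapping" it. */
static void toy_pool_recycle(struct toy_pool *p, unsigned char *page)
{
    if (p->count < POOL_SIZE)
        p->pages[p->count++] = page;
    else
        free(page);                  /* pool full: release (and "unmap") */
}

/* RX path: copy the whole frame out, then recycle the page.
 * Returns the copy (stand-in for the SKB linear data). */
static unsigned char *toy_rx(struct toy_pool *p, unsigned char *page, size_t len)
{
    assert(len <= PAGE_SZ);                  /* frame must fit in one page */
    unsigned char *skb_data = malloc(len);
    if (skb_data)
        memcpy(skb_data, page, len);         /* like skb_copy_to_linear_data() */
    /* Data payload copied into SKB, page ready for recycle */
    toy_pool_recycle(p, page);
    return skb_data;
}
```

Running many RX cycles through this model leaves `map_calls` at 1, which is the saving the thread is weighing against the per-packet memcpy.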
Yeah, this is essentially a 'copybreak' without the byte limitation that other
drivers usually impose (remember mvneta was doing this for all packets < 256b).

That's why I was concerned about what will happen on > 1000b frames and what
the memory pressure is going to be. The trade-off here is copying vs
mapping/unmapping.

>
> Then it looks like you are doing the correct thing. I would appreciate it
> if you could add a comment above the call like:
>
>    /* Data payload copied into SKB, page ready for recycle */
>    page_pool_recycle_direct(rx_q->page_pool, buf->page);
>
> > It's kind of using bounce buffers and I do see a performance gain from this
> > (I think the reason is that my setup uses swiotlb for DMA mapping).
> >
> > Anyway, I'm open to some suggestions on how to improve this ...
>
> I was surprised to see page_pool being used outside the surrounding XDP
> APIs (include/net/xdp.h). For your use-case, where you "just" use
> page_pool as a driver-local fast recycle-allocator for the RX-ring that
> keeps pages DMA mapped, it does make a lot of sense. It simplifies the
> driver a fair amount:
>
>  3 files changed, 63 insertions(+), 144 deletions(-)
>
> Thanks for demonstrating a use-case for page_pool besides XDP, and for
> simplifying a driver with this.

Same here, thanks Jose.

>
> > > Also remember that the page_pool requires your driver to do the
> > > DMA-sync operation. I see a dma_sync_single_for_cpu(), but I
> > > didn't see a dma_sync_single_for_device() (well, I noticed one
> > > getting removed). (For some HW Ilias tells me that the
> > > dma_sync_single_for_device can be elided, so maybe this can still
> > > be correct for you).
> >
> > My HW just needs the descriptors refilled, which are in a different
> > coherent region, so I don't see any reason for
> > dma_sync_single_for_device() ...
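The copybreak trade-off (copy and keep the page mapped, vs. pass the page up and pay for mapping/unmapping) boils down to a per-frame length check. The sketch below is illustrative only: `rx_copybreak_decide()` and `enum rx_action` are hypothetical names, a `copybreak` of 0 models "no byte limit" (copy every frame, as in the driver discussed here), and 256 models the small-packet cut-off mentioned for mvneta.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; not taken from any real driver. */
enum rx_action {
    RX_COPY_AND_RECYCLE, /* memcpy into the SKB, page keeps its DMA mapping */
    RX_PASS_PAGE_UP      /* no copy, but the page leaves the pool and must be
                          * unmapped and replaced by a freshly mapped one */
};

/* copybreak == 0 means "no limit": every frame is copied. */
static enum rx_action rx_copybreak_decide(size_t frame_len, size_t copybreak)
{
    assert(frame_len > 0);   /* sanity check in this sketch only */
    if (copybreak == 0 || frame_len < copybreak)
        return RX_COPY_AND_RECYCLE;
    return RX_PASS_PAGE_UP;
}
```

With no limit, large (e.g. 1500-byte) frames are also copied, which is why the cost of the memcpy and the resulting memory pressure on big frames is the open question in the thread.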
>
> For your use-case, given you are copying out the data, and not writing
> into it, I don't think you need to do a sync for device (before
> giving the device the page again for another RX-ring cycle).
>
> The way I understand the danger: if writing to the DMA memory region,
> and not doing the DMA-sync for-device, then the HW/coherency-system can
> write back the memory later, which creates a race with the DMA-device
> if it is receiving a packet and is doing a write into the same DMA memory
> region. Someone correct me if I misunderstood this...

Similar understanding here.

Cheers
/Ilias
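The write-back race described above can be made concrete with a toy single-cache-line model. This is an assumption-laden sketch, not how any real architecture or the kernel DMA API is implemented: `struct toy_mem`, `sync_for_device()`, `cache_evict()` and `rx_cycle()` are hypothetical, and the model only covers the dirty-line write-back hazard (not stale clean lines, which the sync-for-CPU side deals with). It shows that a CPU write without a sync lets a later eviction clobber the device's new packet, while a copy-only RX path leaves no dirty line behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LINE 8

/* Toy model: one non-coherent cache line covering a DMA buffer. */
struct toy_mem {
    unsigned char ram[LINE];   /* memory the device DMAs into */
    unsigned char cache[LINE]; /* CPU's cached copy of that memory */
    bool dirty;                /* CPU has written to the cached copy */
};

/* Write a dirty cache line back to RAM. */
static void writeback(struct toy_mem *m)
{
    if (m->dirty) {
        memcpy(m->ram, m->cache, LINE);
        m->dirty = false;
    }
}

static void cpu_write(struct toy_mem *m, unsigned char v)
{
    memset(m->cache, v, LINE);
    m->dirty = true;
}

/* Stand-in for a sync-for-device: flush CPU writes before the device owns the buffer. */
static void sync_for_device(struct toy_mem *m) { writeback(m); }

/* Device receives a packet straight into RAM. */
static void device_dma_write(struct toy_mem *m, unsigned char v) { memset(m->ram, v, LINE); }

/* Later, unrelated cache eviction: a still-dirty line lands on top of RAM. */
static void cache_evict(struct toy_mem *m) { writeback(m); }

/* One RX-ring cycle; returns the first byte left in RAM afterwards. */
static unsigned char rx_cycle(bool cpu_wrote, bool do_sync)
{
    struct toy_mem m = {0};
    if (cpu_wrote)
        cpu_write(&m, 0xCC);       /* CPU wrote into the buffer */
    if (do_sync) {
        sync_for_device(&m);
        assert(!m.dirty);          /* no dirty data left after the sync */
    }
    device_dma_write(&m, 0xDD);    /* device writes a new packet */
    cache_evict(&m);               /* stale dirty line may be written back */
    return m.ram[0];
}
```

In this model `rx_cycle(true, false)` ends with the CPU's stale 0xCC over the device's packet, while both `rx_cycle(true, true)` and the copy-only `rx_cycle(false, false)` keep the device's 0xDD.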