On Thu, Apr 18, 2024 at 12:15 PM Jason Wang <jasow...@redhat.com> wrote: > > On Wed, Apr 17, 2024 at 4:45 PM Xuan Zhuo <xuanz...@linux.alibaba.com> wrote: > > > > On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasow...@redhat.com> wrote: > > > On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanz...@linux.alibaba.com> > > > wrote: > > > > > > > > On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasow...@redhat.com> > > > > wrote: > > > > > On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo > > > > > <xuanz...@linux.alibaba.com> wrote: > > > > > > > > > > > > On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang > > > > > > <jasow...@redhat.com> wrote: > > > > > > > On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo > > > > > > > <xuanz...@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang > > > > > > > > <jasow...@redhat.com> wrote: > > > > > > > > > On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo > > > > > > > > > <xuanz...@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang > > > > > > > > > > <jasow...@redhat.com> wrote: > > > > > > > > > > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo > > > > > > > > > > > <xuanz...@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang > > > > > > > > > > > > <jasow...@redhat.com> wrote: > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo > > > > > > > > > > > > > <xuanz...@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Now, we chain the pages of big mode by the page's > > > > > > > > > > > > > > private variable. > > > > > > > > > > > > > > But a subsequent patch aims to make the big mode to > > > > > > > > > > > > > > support > > > > > > > > > > > > > > premapped mode. This requires additional space to > > > > > > > > > > > > > > store the dma addr. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Within the sub-struct that contains the 'private', > > > > > > > > > > > > > > there is no suitable > > > > > > > > > > > > > > variable for storing the DMA addr. > > > > > > > > > > > > > > > > > > > > > > > > > > > > struct { /* Page cache and > > > > > > > > > > > > > > anonymous pages */ > > > > > > > > > > > > > > /** > > > > > > > > > > > > > > * @lru: Pageout list, eg. > > > > > > > > > > > > > > active_list protected by > > > > > > > > > > > > > > * lruvec->lru_lock. > > > > > > > > > > > > > > Sometimes used as a generic list > > > > > > > > > > > > > > * by the page owner. > > > > > > > > > > > > > > */ > > > > > > > > > > > > > > union { > > > > > > > > > > > > > > struct list_head > > > > > > > > > > > > > > lru; > > > > > > > > > > > > > > > > > > > > > > > > > > > > /* Or, for the > > > > > > > > > > > > > > Unevictable "LRU list" slot */ > > > > > > > > > > > > > > struct { > > > > > > > > > > > > > > /* Always > > > > > > > > > > > > > > even, to negate PageTail */ > > > > > > > > > > > > > > void > > > > > > > > > > > > > > *__filler; > > > > > > > > > > > > > > /* Count > > > > > > > > > > > > > > page's or folio's mlocks */ > > > > > > > > > > > > > > unsigned > > > > > > > > > > > > > > int mlock_count; > > > > > > > > > > > > > > }; > > > > > > > > > > > > > > > > > > > > > > > > > > > > /* Or, free page */ > > > > > > > > > > > > > > struct list_head > > > > > > > > > > > > > > buddy_list; > > > > > > > > > > > > > > struct list_head > > > > > > > > > > > > > > pcp_list; > > > > > > > > > > > > > > }; > > > > > > > > > > > > > > /* See page-flags.h for > > > > > > > > > > > > > > PAGE_MAPPING_FLAGS */ > > > > > > > > > > > > > > struct address_space > > > > > > > > > > > > > > *mapping; > > > > > > > > > > > > > > union { > > > > > > > > > > > > > > pgoff_t index; > > > > > > > > > > > > > > /* Our offset within mapping. */ > > > > > > > > > > > > > > unsigned long > > > > > > > > > > > > > > share; /* share count for fsdax */ > > > > > > > > > > > > > > }; > > > > > > > > > > > > > > /** > > > > > > > > > > > > > > * @private: > > > > > > > > > > > > > > Mapping-private opaque data. > > > > > > > > > > > > > > * Usually used for > > > > > > > > > > > > > > buffer_heads if PagePrivate. > > > > > > > > > > > > > > * Used for swp_entry_t if > > > > > > > > > > > > > > PageSwapCache. > > > > > > > > > > > > > > * Indicates order in the > > > > > > > > > > > > > > buddy system if PageBuddy. > > > > > > > > > > > > > > */ > > > > > > > > > > > > > > unsigned long private; > > > > > > > > > > > > > > }; > > > > > > > > > > > > > > > > > > > > > > > > > > > > But within the page pool struct, we have a variable > > > > > > > > > > > > > > called > > > > > > > > > > > > > > dma_addr that is appropriate for storing dma addr. > > > > > > > > > > > > > > And that struct is used by netstack. That works to > > > > > > > > > > > > > > our advantage. > > > > > > > > > > > > > > > > > > > > > > > > > > > > struct { /* page_pool used > > > > > > > > > > > > > > by netstack */ > > > > > > > > > > > > > > /** > > > > > > > > > > > > > > * @pp_magic: magic value > > > > > > > > > > > > > > to avoid recycling non > > > > > > > > > > > > > > * page_pool allocated > > > > > > > > > > > > > > pages. > > > > > > > > > > > > > > */ > > > > > > > > > > > > > > unsigned long pp_magic; > > > > > > > > > > > > > > struct page_pool *pp; > > > > > > > > > > > > > > unsigned long > > > > > > > > > > > > > > _pp_mapping_pad; > > > > > > > > > > > > > > unsigned long dma_addr; > > > > > > > > > > > > > > atomic_long_t pp_ref_count; > > > > > > > > > > > > > > }; > > > > > > > > > > > > > > > > > > > > > > > > > > > > On the other side, we should use variables from the > > > > > > > > > > > > > > same sub-struct. > > > > > > > > > > > > > > So this patch replaces the "private" with "pp". > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Xuan Zhuo > > > > > > > > > > > > > > <xuanz...@linux.alibaba.com> > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > > > > > > > Instead of doing a customized version of page pool, > > > > > > > > > > > > > can we simply > > > > > > > > > > > > > switch to use page pool for big mode instead? Then we > > > > > > > > > > > > > don't need to > > > > > > > > > > > > > bother the dma stuffs. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The page pool needs to do the dma by the DMA APIs. > > > > > > > > > > > > So we can not use the page pool directly. > > > > > > > > > > > > > > > > > > > > > > I found this: > > > > > > > > > > > > > > > > > > > > > > define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool > > > > > > > > > > > do the DMA > > > > > > > > > > > * map/unmap > > > > > > > > > > > > > > > > > > > > > > It seems to work here? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have studied the page pool mechanism and believe that we > > > > > > > > > > cannot use it > > > > > > > > > > directly. We can make the page pool to bypass the DMA > > > > > > > > > > operations. > > > > > > > > > > This allows us to handle DMA within virtio-net for pages > > > > > > > > > > allocated from the page > > > > > > > > > > pool. Furthermore, we can utilize page pool helpers to > > > > > > > > > > associate the DMA address > > > > > > > > > > to the page. > > > > > > > > > > > > > > > > > > > > However, the critical issue pertains to unmapping. Ideally, > > > > > > > > > > we want to return > > > > > > > > > > the mapped pages to the page pool and reuse them. In doing > > > > > > > > > > so, we can omit the > > > > > > > > > > unmapping and remapping steps. > > > > > > > > > > > > > > > > > > > > Currently, there's a caveat: when the page pool cache is > > > > > > > > > > full, it disconnects > > > > > > > > > > and releases the pages. When the pool hits its capacity, > > > > > > > > > > pages are relinquished > > > > > > > > > > without a chance for unmapping. > > > > > > > > > > > > > > > > > > Technically, when ptr_ring is full there could be a fallback, > > > > > > > > > but then > > > > > > > > > it requires expensive synchronization between producer and > > > > > > > > > consumer. > > > > > > > > > For virtio-net, it might not be a problem because add/get has > > > > > > > > > been > > > > > > > > > synchronized. (It might be relaxed in the future, actually > > > > > > > > > we've > > > > > > > > > already seen a requirement in the past for virito-blk). > > > > > > > > > > > > > > > > The point is that the page will be released by page pool > > > > > > > > directly, > > > > > > > > we will have no change to unmap that, if we work with page pool. > > > > > > > > > > > > > > I mean if we have a fallback, there would be no need to release > > > > > > > these > > > > > > > pages but put them into a link list. > > > > > > > > > > > > > > > > > > What fallback? > > > > > > > > > > https://lore.kernel.org/netdev/1519607771-20613-1-git-send-email-...@redhat.com/ > > > > > > > > > > > > > > > > > If we put the pages to the link list, why we use the page pool? > > > > > > > > > > The size of the cache and ptr_ring needs to be fixed. > > > > > > > > > > Again, as explained above, it needs more benchmarks and looks like a > > > > > separate topic. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we were to unmap pages each time before > > > > > > > > > > returning them to the pool, we would negate the benefits of > > > > > > > > > > bypassing the > > > > > > > > > > mapping and unmapping process altogether. > > > > > > > > > > > > > > > > > > Yes, but the problem in this approach is that it creates a > > > > > > > > > corner > > > > > > > > > exception where dma_addr is used outside the page pool. > > > > > > > > > > > > > > > > YES. This is a corner exception. We need to introduce this case > > > > > > > > to the page > > > > > > > > pool. > > > > > > > > > > > > > > > > So for introducing the page-pool to virtio-net(not only for big > > > > > > > > mode), > > > > > > > > we may need to push the page-pool to support dma by drivers. > > > > > > > > > > > > > > Adding Jesper for some comments. > > > > > > > > > > > > > > > > > > > > > > > Back to this patch set, I think we should keep the virtio-net > > > > > > > > to manage > > > > > > > > the pages. > > > > > > > > > > > > > > > > What do you think? > > > > > > > > > > > > > > I might be wrong, but I think if we need to either > > > > > > > > > > > > > > 1) seek a way to manage the pages by yourself but not touching > > > > > > > page > > > > > > > pool metadata (or Jesper is fine with this) > > > > > > > > > > > > Do you mean working with page pool or not? > > > > > > > > > > > > > > > > I meant if Jesper is fine with reusing page pool metadata like this > > > > > patch. > > > > > > > > > > > If we manage the pages by self(no page pool), we do not care the > > > > > > metadata is for > > > > > > page pool or not. We just use the space of pages like the "private". > > > > > > > > > > That's also fine. > > > > > > > > > > > > > > > > > > > > > > > > 2) optimize the unmap for page pool > > > > > > > > > > > > > > or even > > > > > > > > > > > > > > 3) just do dma_unmap before returning the page back to the page > > > > > > > pool, > > > > > > > we don't get all the benefits of page pool but we end up with > > > > > > > simple > > > > > > > codes (no fallback for premapping). > > > > > > > > > > > > I am ok for this. > > > > > > > > > > Right, we just need to make sure there's no performance regression, > > > > > then it would be fine. > > > > > > > > > > I see for example mana did this as well. > > > > > > > > I think we should not use page pool directly now, > > > > because the mana does not need a space to store the dma address. > > > > We need to store the dma address for unmapping. > > > > > > > > If we use page pool without PP_FLAG_DMA_MAP, then store the dma address > > > > by > > > > page.dma_addr, I think that is not safe. > > > > > > Jesper, could you comment on this? > > > > > > > > > > > I think the way of this patch set is fine. > > > > > > So it reuses page pool structure in the page structure for another use > > > case. > > > > > > > We just use the > > > > space of the page whatever it is page pool or not to store > > > > the link and dma address. > > > > > > Probably because we've already "abused" page->private. I would leave > > > it for other maintainers to decide. > > > > If we do not want to use the elements of the page directly, > > the page pool is a good way. > > > > Rethink this, I think that the approach of this series should be fine. > It should be sufficient for virtio-net to guarantee that those pages > are not used by the page pool. > > I will continue the review.
Btw, it would be better to describe those design considerations in the changelog (e.g why we don't use page pool etc). Thanks > > Thanks