On Fri, May 19, 2023 at 11:33 AM Xuan Zhuo <xuanz...@linux.alibaba.com> wrote:
>
> On Thu, 18 May 2023 13:12:49 -0400, "Michael S. Tsirkin" <m...@redhat.com> 
> wrote:
> > On Thu, May 18, 2023 at 08:22:14PM +0800, Xuan Zhuo wrote:
> > > On Thu, 18 May 2023 05:49:46 -0400, "Michael S. Tsirkin" 
> > > <m...@redhat.com> wrote:
> > > > On Thu, May 18, 2023 at 05:14:03PM +0800, Xuan Zhuo wrote:
> > > > > On Thu, 18 May 2023 16:57:37 +0800, Jason Wang <jasow...@redhat.com> 
> > > > > wrote:
> > > > > > On Thu, May 18, 2023 at 4:29 PM Michael S. Tsirkin 
> > > > > > <m...@redhat.com> wrote:
> > > > > > >
> > > > > > > On Thu, May 18, 2023 at 03:33:52PM +0800, Xuan Zhuo wrote:
> > > > > > > > On Thu, 18 May 2023 03:11:25 -0400, "Michael S. Tsirkin" 
> > > > > > > > <m...@redhat.com> wrote:
> > > > > > > > > On Thu, May 18, 2023 at 02:51:57PM +0800, Jason Wang wrote:
> > > > > > > > > > On Wed, May 17, 2023 at 10:23 AM Xuan Zhuo 
> > > > > > > > > > <xuanz...@linux.alibaba.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > virtqueue_add_split() only supports virtual addresses;
> > > > > > > > > > > the DMA mapping is done inside virtqueue_add_split().
> > > > > > > > > > >
> > > > > > > > > > > In some scenarios (such as the AF_XDP scenario), the
> > > > > > > > > > > memory is allocated and the DMA mapping is done in
> > > > > > > > > > > advance, so it is necessary for us to support passing
> > > > > > > > > > > the DMA address to virtqueue_add_split().
> > > > > > > > > > >
> > > > > > > > > > > Record this information in desc_state so that we can
> > > > > > > > > > > skip the unmap based on it when executing the DMA unmap.
> > > > > > > > > >
> > > > > > > > > > I would also suggest documenting why per-descriptor
> > > > > > > > > > metadata is needed instead of a per-virtqueue one.
> > > > > > > > >
> > > > > > > > > I think we could make it per-virtqueue. That would mean all
> > > > > > > > > code in virtio-net would have to change to do the DMA
> > > > > > > > > mapping itself instead of relying on the virtio core,
> > > > > > > > > though. Which is maybe a good idea? Definitely a very
> > > > > > > > > intrusive change, though; it will need a lot of performance
> > > > > > > > > testing to make sure we don't break anything.
> > > > > > > >
> > > > > > > > In fact, we have tried this idea.
> > > > > > > >
> > > > > > > > The problem is the detach and unmap.
> > > > > > > >
> > > > > > > > We need to get all the DMA addresses back from the virtio
> > > > > > > > ring in order to unmap them. Currently, there is no way to
> > > > > > > > return the DMA address, and for an SKB we need to get
> > > > > > > > multiple DMA addresses at one time.
> > > > > > > >
> > > > > > > > This requires modifying the detach logic of the virtio
> > > > > > > > ring. Apart from that, I also agree with this idea.
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > >
> > > > > > > Well, you can have a version of get_buf that returns them...
> > > > > > > but it is not clear to me that all this is worth it unless you
> > > > > > > want to do unsafe tricks like leaving them mapped.
> > > > > >
> > > > > > Some high speed NIC drivers use this trick for better performance.
> > > > >
> > > > >
> > > > > Interesting, this is the first time I have heard of this. Is there any problem with it?
> > > >
> > > > It depends. If you are relying on the IOMMU, then yes: malicious
> > > > hardware can steal guest secrets or corrupt memory, since it's a
> > > > hack that is not properly integrated with Linux, and there is no
> > > > real control preventing Linux from reusing this memory for
> > > > something unrelated.
> > > > If instead you are using something like bounce buffers, then no;
> > > > but OTOH bounce buffers are already expensive, so you might not
> > > > see a lot of benefit.
> > > >
> > > > > So, is it the right way for virtio-net to manage the DMA
> > > > > operations by itself?
> > > > >
> > > > > Thanks
> > > >
> > > > I am fine with the approach taken for now; we can look at
> > > > reducing the cost of DMA map/unmap later.
> > >
> > > Well, so far we have discussed various situations; to get on the
> > > same page, let me summarize.
> > >
> > > 1. premapped per-virtqueue
> > >
> > >    We don't always need to check premapped, and we can try to merge the
> > >    premapped flag with use_dma_api so that we do not need to check more
> > >    flags.
> > >
> > >    We can switch premapped on during a vq reset, so I don't think we
> > >    have to enable it by default. And when supporting AF_XDP, a vq
> > >    reset must be done anyway.
> >
> > Sounds attractive, but I didn't realize AF_XDP blocks regular XDP.
>
> Sorry, I do not understand what you mean.
>
> AF_XDP depends on XDP when it runs, and both must be bound to the NIC
> at the same time to work normally. The packets are redirected by XDP
> to AF_XDP.
>
> Since AF_XDP needs to refill the vq, we need to reset the vq to
> release the buffers that have already been filled into it.
>
> My idea is that we can turn on the premapped mode while executing the
> vq reset.

That should be fine, and it helps to avoid per-descriptor metadata.

Thanks

>
> Thanks.
>
>
> > Is that true?
> >
> > > 2. premapped per-desc (state)
> > >     * save the flag inside extra->flags
> > >
> > > OK, let me know which one you want.
> > >
> > > If I miss something, please point out.
> > >
> > > Thanks
> > >
> > > Thanks
> > >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > > I'd leave that
> > > > > > > for another day maybe.
> > > > > > >
> > > > > > > For marking a desc as premapped, I think we can use a bit from
> > > > > > > desc_extra->flags, either reusing one of NEXT, AVAIL, or USED,
> > > > > > > or stealing another one.
> > > > > >
> > > > > > Probably.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanz...@linux.alibaba.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  drivers/virtio/virtio_ring.c | 38 +++++++++++++++++++++++++++---------
> > > > > > > > > > >  1 file changed, 29 insertions(+), 9 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > index e2fc50c05bec..bd5e84afab37 100644
> > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > @@ -70,6 +70,7 @@
> > > > > > > > > > >  struct vring_desc_state_split {
> > > > > > > > > > >         void *data;                     /* Data for callback. */
> > > > > > > > > > >         struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > > > > > > > > > > +       bool premapped;                 /* DMA mapping is done by driver. */
> > > > > > > > > >
> > > > > > > > > > Going back to the original discussion around where this
> > > > > > > > > > should be placed: I wonder if we can find a common place
> > > > > > > > > > to store this, since it is not related to the virtqueue
> > > > > > > > > > layout. Maybe desc_extra? And it would be even better if
> > > > > > > > > > we can avoid stressing the cache like the above does.
> > > > > > > > > >
> > > > > > > > > > >  };
> > > > > > > > > > >
> > > > > > > > > > >  struct vring_desc_state_packed {
> > > > > > > > > > > @@ -356,8 +357,14 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> > > > > > > > > > >
> > > > > > > > > > >  /* Map one sg entry. */
> > > > > > > > > > >  static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> > > > > > > > > > > -                           enum dma_data_direction direction, dma_addr_t *addr)
> > > > > > > > > > > +                           enum dma_data_direction direction,
> > > > > > > > > > > +                           bool premapped, dma_addr_t *addr)
> > > > > > > > > >
> > > > > > > > > > having things like:
> > > > > > > > > >
> > > > > > > > > > void func(bool premapped)
> > > > > > > > > > {
> > > > > > > > > >         if (!premapped)
> > > > > > > > > >                 return;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > is a hint that the check needs to be done by the caller?
> > > > > > > > > >
> > > > > > > > > > And this change should work for both packed and split. I 
> > > > > > > > > > think we need
> > > > > > > > > > to squash the packed changes here.
> > > > > > > > > >
> > > > > > > > > > Looking at how packed virtqueue uses this in this patch, I 
> > > > > > > > > > don't think
> > > > > > > > > > this patch can even be built. I will wait for a new version 
> > > > > > > > > > and
> > > > > > > > > > continue the review from there.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >  {
> > > > > > > > > > > +       if (premapped) {
> > > > > > > > > > > +               *addr = sg_dma_address(sg);
> > > > > > > > > > > +               return 0;
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > >         if (!vq->use_dma_api) {
> > > > > > > > > > >                 /*
> > > > > > > > > > >                  * If DMA is not used, KMSAN doesn't know that the scatterlist
> > > > > > > > > > > @@ -445,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > >  static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > > > > > -                                         unsigned int i)
> > > > > > > > > > > +                                         unsigned int i, bool premapped)
> > > > > > > > > > >  {
> > > > > > > > > > >         struct vring_desc_extra *extra = 
> > > > > > > > > > > vq->split.desc_extra;
> > > > > > > > > > >         u16 flags;
> > > > > > > > > > > @@ -462,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > > > > > > > > >                                  (flags & VRING_DESC_F_WRITE) ?
> > > > > > > > > > >                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > > > > > > > >         } else {
> > > > > > > > > > > +               if (premapped)
> > > > > > > > > > > +                       goto out;
> > > > > > > > > > > +
> > > > > > > > > > >                 dma_unmap_page(vring_dma_dev(vq),
> > > > > > > > > > >                                extra[i].addr,
> > > > > > > > > > >                                extra[i].len,
> > > > > > > > > > > @@ -532,6 +542,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > >                                       unsigned int in_sgs,
> > > > > > > > > > >                                       void *data,
> > > > > > > > > > >                                       void *ctx,
> > > > > > > > > > > +                                     bool premapped,
> > > > > > > > > > >                                       gfp_t gfp)
> > > > > > > > > > >  {
> > > > > > > > > > >         struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > > > > @@ -595,7 +606,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > >                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > > > > >                         dma_addr_t addr;
> > > > > > > > > > >
> > > > > > > > > > > -                       if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > > > > > > > +                       if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, premapped, &addr))
> > > > > > > > > > >                                 goto unmap_release;
> > > > > > > > > > >
> > > > > > > > > > >                         prev = i;
> > > > > > > > > > > @@ -611,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > >                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > > > > > >                         dma_addr_t addr;
> > > > > > > > > > >
> > > > > > > > > > > -                       if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > > > > > > > +                       if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, premapped, &addr))
> > > > > > > > > > >                                 goto unmap_release;
> > > > > > > > > > >
> > > > > > > > > > >                         prev = i;
> > > > > > > > > > > @@ -657,6 +668,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > >
> > > > > > > > > > >         /* Store token and indirect buffer state. */
> > > > > > > > > > >         vq->split.desc_state[head].data = data;
> > > > > > > > > > > +       vq->split.desc_state[head].premapped = premapped;
> > > > > > > > > > >         if (indirect)
> > > > > > > > > > >                 vq->split.desc_state[head].indir_desc = desc;
> > > > > > > > > > >         else
> > > > > > > > > > > @@ -686,6 +698,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > >         return 0;
> > > > > > > > > > >
> > > > > > > > > > >  unmap_release:
> > > > > > > > > > > +       if (premapped) {
> > > > > > > > > > > +               if (indirect)
> > > > > > > > > > > +                       kfree(desc);
> > > > > > > > > > > +
> > > > > > > > > > > +               END_USE(vq);
> > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > >         err_idx = i;
> > > > > > > > > > >
> > > > > > > > > > >         if (indirect)
> > > > > > > > > > > @@ -700,7 +720,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > > > > > > > >                         vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > > > > > > >                         i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > > > > > > >                 } else
> > > > > > > > > > > -                       i = vring_unmap_one_split(vq, i);
> > > > > > > > > > > +                       i = vring_unmap_one_split(vq, i, false);
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > >         if (indirect)
> > > > > > > > > > > @@ -757,12 +777,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > > > >         i = head;
> > > > > > > > > > >
> > > > > > > > > > >         while (vq->split.vring.desc[i].flags & nextflag) {
> > > > > > > > > > > -               vring_unmap_one_split(vq, i);
> > > > > > > > > > > +               vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > > > > >                 i = vq->split.desc_extra[i].next;
> > > > > > > > > > >                 vq->vq.num_free++;
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > > -       vring_unmap_one_split(vq, i);
> > > > > > > > > > > +       vring_unmap_one_split(vq, i, state->premapped);
> > > > > > > > > > >         vq->split.desc_extra[i].next = vq->free_head;
> > > > > > > > > > >         vq->free_head = head;
> > > > > > > > > > >
> > > > > > > > > > > @@ -783,7 +803,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > > > >                                 VRING_DESC_F_INDIRECT));
> > > > > > > > > > >                 BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > > > > > > > > >
> > > > > > > > > > > -               if (vq->use_dma_api) {
> > > > > > > > > > > +               if (vq->use_dma_api && !state->premapped) {
> > > > > > > > > > >                         for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > > > > > > > > >                                 vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > > > > > > > > >                 }
> > > > > > > > > > > @@ -2143,7 +2163,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > > > > > > > > > >         return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
> > > > > > > > > > >                                         out_sgs, in_sgs, data, ctx, gfp) :
> > > > > > > > > > >                                  virtqueue_add_split(_vq, sgs, total_sg,
> > > > > > > > > > > -                                       out_sgs, in_sgs, data, ctx, gfp);
> > > > > > > > > > > +                                       out_sgs, in_sgs, data, ctx, premapped, gfp);
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > >  /**
> > > > > > > > > > > --
> > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
