Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq

2022-02-22 Thread Jason Wang
On Tue, Feb 22, 2022 at 3:43 PM Eugenio Perez Martin
 wrote:
>
> On Tue, Feb 22, 2022 at 4:16 AM Jason Wang  wrote:
> >
> > On Tue, Feb 22, 2022 at 1:23 AM Eugenio Perez Martin
> >  wrote:
> > >
> > > On Mon, Feb 21, 2022 at 8:15 AM Jason Wang  wrote:
> > > >
> > > >
> > > > > On 2022/2/18 1:13 AM, Eugenio Perez Martin wrote:
> > > > > On Tue, Feb 8, 2022 at 4:58 AM Jason Wang  wrote:
> > > > >>
> > > > >> On 2022/2/1 2:58 AM, Eugenio Perez Martin wrote:
> > > > >>> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang  wrote:
> > > >  On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > > > > First half of the buffers forwarding part, preparing vhost-vdpa
> > > > > callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
> > > > > this is effectively dead code at the moment, but it helps to reduce
> > > > > patch size.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez 
> > > > > ---
> > > > > hw/virtio/vhost-shadow-virtqueue.h |   2 +-
> > > > > hw/virtio/vhost-shadow-virtqueue.c |  21 -
> > > > > hw/virtio/vhost-vdpa.c             | 133 ++---
> > > > > 3 files changed, 143 insertions(+), 13 deletions(-)
> > > > >
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > index 035207a469..39aef5ffdf 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> > > > >
> > > > > void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > > > >
> > > > > -VhostShadowVirtqueue *vhost_svq_new(void);
> > > > > +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > > > >
> > > > > void vhost_svq_free(VhostShadowVirtqueue *vq);
> > > > >
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > index f129ec8395..7c168075d7 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > > > > /**
> > > > >  * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> > > > >  * methods and file descriptors.
> > > > > + *
> > > > > + * @qsize Shadow VirtQueue size
> > > > > + *
> > > > > + * Returns the new virtqueue or NULL.
> > > > > + *
> > > > > + * In case of error, reason is reported through error_report.
> > > > >  */
> > > > > -VhostShadowVirtqueue *vhost_svq_new(void)
> > > > > +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> > > > > {
> > > > > +    size_t desc_size = sizeof(vring_desc_t) * qsize;
> > > > > +    size_t device_size, driver_size;
> > > > >     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> > > > >     int r;
> > > > >
> > > > > @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> > > > >     /* Placeholder descriptor, it should be deleted at set_kick_fd */
> > > > >     event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> > > > >
> > > > > +    svq->vring.num = qsize;
> > > >  I wonder if this is the best. E.g. some hardware can support up to 32K
> > > >  queue size. So this will probably end up with:
> > > > 
> > > >  1) SVQ use 32K queue size
> > > >  2) hardware queue uses 256
> > > > 
> > > > >>> In that case SVQ vring queue size will be 32K and guest's vring can
> > > > >>> negotiate any number with SVQ equal or less than 32K,
> > > > >>
> > > > >> Sorry for being unclear; what I meant is actually:
> > > > >>
> > > > >> 1) SVQ uses 32K queue size
> > > > >>
> > > > >> 2) guest vq uses 256
> > > > >>
> > > > >> This looks like a burden that needs extra logic and may damage the
> > > > >> performance.
> > > > >>
> > > > > Still not getting this point.
> > > > >
> > > > > An available guest buffer, although contiguous in GPA/GVA, can expand
> > > > > into multiple buffers if it's not contiguous in qemu's VA (by the while
> > > > > loop in virtqueue_map_desc [1]). In that scenario it is better to have
> > > > > "plenty" of SVQ buffers.
> > > >
> > > >
> > > > Yes, but this case should be rare. So in this case we should deal with
> > > > overrun on SVQ, that is
> > > >
> > > > 1) SVQ is full
> > > > 2) guest VQ isn't
> > > >
> > > > We need to
> > > >
> > > > 1) check the available buffer slots
> > > > 2) disable guest kick and wait for the used buffers
> > > >
> > > > But it looks to me the current code is not ready for dealing with this case?
> > > >
> > >
> > > Yes i
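
For illustration, the handling Jason outlines above could look roughly like the
following sketch (the helper names are hypothetical, not the actual QEMU SVQ code):

/* Hypothetical sketch of SVQ overrun handling: forward guest buffers only
 * while the shadow ring has free slots, otherwise pause the guest kick and
 * resume once the device returns used buffers. */
static void svq_handle_guest_kick(VhostShadowVirtqueue *svq)
{
    while (svq_guest_avail(svq)) {              /* hypothetical helper */
        if (svq->num_free == 0) {
            /* 1) SVQ is full, guest VQ isn't: stop reacting to guest
             * kicks until used buffers free some slots. */
            svq_disable_guest_notifier(svq);    /* hypothetical helper */
            return;
        }
        svq_forward_one(svq);                   /* translate + add + kick */
    }
}

static void svq_handle_device_used(VhostShadowVirtqueue *svq)
{
    svq_flush_used(svq);                        /* returns slots to num_free */
    svq_enable_guest_notifier(svq);             /* hypothetical helper */
    svq_handle_guest_kick(svq);                 /* retry pending buffers */
}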

Re: [PATCH v1 0/6] virtio: support advance DMA

2022-02-22 Thread Xuan Zhuo
On Tue, 22 Feb 2022 12:02:14 +0800, Jason Wang  wrote:
>
> On 2022/2/21 7:23 PM, Xuan Zhuo wrote:
> > On Mon, 21 Feb 2022 11:32:52 +0800, Jason Wang  wrote:
> >> On Fri, Feb 18, 2022 at 5:00 PM Xuan Zhuo  wrote:
> >>> On Thu, 17 Feb 2022 15:19:44 +0800, Jason Wang  wrote:
>  On Thu, Feb 10, 2022 at 4:51 PM Xuan Zhuo  wrote:
> > virtqueue_add() only supports virtual addresses; dma is completed in
> > virtqueue_add().
> >
> > In some scenarios (such as the AF_XDP scenario), DMA is completed in advance,
> > so it is necessary for us to support passing the DMA address to virtqueue_add().
>  I'd suggest rename this feature as "unmanaged DMA".
> >>> OK
> >>>
> > Record this predma information in extra->flags, which can be skipped when
> > executing dma unmap.
>  Question still, can we use per-virtqueue flag instead of per
>  descriptor flag? If my memory is correct, the answer is yes in the
>  discussion for the previous version.
> 
> >>> Yes.
> >>>
> >>> per-virtqueue? I guess it should be per-submit.
> >>>
> >>> This patch set only adds a flag to desc_extra[head].flags, so that we can know
> >>> if we need to unmap dma when we detach.
> >> I meant if we can manage to make it per virtqueue, there's no need to
> >> maintain per buffer flag.
> >>
> > Rethinking this question, I feel there is no essential difference between per
> > virtqueue and per sgs.
> >
> > per virtqueue:
> > 1. add buf:
> > a. check vq->premapped for map every sg
> > 2. detach:
> > a. check vq->premapped for unmap
> >
> > per sgs:
> > 1. add buf:
> > a. check function parameter "premapped" for map every sg
> > b. add flag to extra[head].flag
> >
> > 2. detach:
> > a: check extra[head].flag for unmap
> >
> >
> > Thanks.
>
>
> Per-virtqueue is still a little bit easier at first glance.
>
> Actually, per-sg has one advantage: it can be used without virtqueue
> reset (to allow switching between the two modes). But I'm not sure
> whether we have such requirements.
>
> I think to answer this question, we probably need a real use case (if we
> can come up with a case that is more lightweight than AF_XDP, that would
> be even better).

Sadly, I couldn't think of other scenarios. Hopefully someone can suggest one.

For per virtqueue, virtio-net will also have to switch to premapped, because
the tx queue is shared.

But in the process of implementing this, I encountered a troublesome problem: we
need to record the dma addresses in virtnet. For tx, since an skb contains multiple
frags, there will be many dma addresses, and unmapping in virtnet will be more
troublesome because we have to regain these dma addresses.

I can think of two ways:

1. Let virtio return the addr of each desc when it is detached.
2. Allocate a block of memory for each sq/rq to hold the dma addresses
   (a rough sketch of this follows below).

Thanks.
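
As a sketch of option 2 (illustrative names only, not actual virtio-net code),
each queue could carry a table of DMA addresses parallel to the vring, so that
unmap can find them at detach time:

/* Illustrative sketch: per-queue bookkeeping of premapped DMA addresses,
 * indexed like the descriptor table. */
struct virtnet_sq_dma {
	dma_addr_t addr;
	u32 len;
};

struct send_queue {
	struct virtqueue *vq;
	struct virtnet_sq_dma *dma;	/* one slot per vring entry */
};

static int virtnet_sq_alloc_dma_table(struct send_queue *sq)
{
	u32 num = virtqueue_get_vring_size(sq->vq);

	/* One bookkeeping slot per descriptor; freed with the queue. */
	sq->dma = kcalloc(num, sizeof(*sq->dma), GFP_KERNEL);
	return sq->dma ? 0 : -ENOMEM;
}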

>
> Thanks
>
>
> >
> >
> >> So we know something that needs to be mapped by virtio core itself,
> >> e.g the indirect page. Other than this, all the rest could be
> >> pre-mapped.
> >>
> >> For vnet header, it could be mapped by virtio-net which could be still
> >> treated as pre mapped DMA since it's not the virtio ring code.
> >>
> >> Anything I miss here?
> >>
> >> Thanks
> >>
> >>
> >>> Thanks.
> >>>
>  Thanks
> 
> > v1:
> > 1. All sgs requested at one time are required to be uniformly PREDMA;
> >    mixing PREDMA and non-PREDMA sgs is not supported
> > 2. virtio_dma_map() is removed from this patch set and will be submitted
> >    together the next time AF_XDP supports virtio dma
> > 3. Added patch #2 #3 to remove the check for flags when performing unmap
> >    of indirect desc
> >
> > Xuan Zhuo (6):
> >    virtio: rename vring_unmap_state_packed() to vring_unmap_extra_packed()
> >    virtio: remove flags check for unmap split indirect desc
> >    virtio: remove flags check for unmap packed indirect desc
> >    virtio: virtqueue_add() support predma
> >    virtio: split: virtqueue_add_split() support dma address
> >    virtio: packed: virtqueue_add_packed() support dma address
> >
> >   drivers/virtio/virtio_ring.c | 199 ++-
> >   1 file changed, 126 insertions(+), 73 deletions(-)
> >
> > --
> > 2.31.0
> >
>

Re: [PATCH] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing

2022-02-22 Thread Stefano Garzarella

On Tue, Feb 22, 2022 at 08:30:17AM +0300, Dan Carpenter wrote:

Hi Stefano,

url:
https://github.com/0day-ci/linux/commits/Stefano-Garzarella/vhost-vsock-don-t-check-owner-in-vhost_vsock_stop-while-releasing/20220221-195038
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: x86_64-randconfig-m031-20220221 (https://download.01.org/0day-ci/archive/20220222/202202220707.am3rkucp-...@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 

smatch warnings:
drivers/vhost/vsock.c:655 vhost_vsock_stop() error: uninitialized symbol 'ret'.

vim +/ret +655 drivers/vhost/vsock.c

3ace84c91bfcde Stefano Garzarella 2022-02-21  632  static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
433fc58e6bf2c8 Asias He           2016-07-28  633  {
433fc58e6bf2c8 Asias He           2016-07-28  634      size_t i;
433fc58e6bf2c8 Asias He           2016-07-28  635      int ret;
433fc58e6bf2c8 Asias He           2016-07-28  636
433fc58e6bf2c8 Asias He           2016-07-28  637      mutex_lock(&vsock->dev.mutex);
433fc58e6bf2c8 Asias He           2016-07-28  638
3ace84c91bfcde Stefano Garzarella 2022-02-21  639      if (check_owner) {
433fc58e6bf2c8 Asias He           2016-07-28  640              ret = vhost_dev_check_owner(&vsock->dev);
433fc58e6bf2c8 Asias He           2016-07-28  641              if (ret)
433fc58e6bf2c8 Asias He           2016-07-28  642                      goto err;
3ace84c91bfcde Stefano Garzarella 2022-02-21  643      }

"ret" not initialized on else path.


Oooops, I was testing with vhost_vsock_dev_release() where we don't 
check the ret value, but of course we need to initialize it to 0 for the 
vhost_vsock_dev_ioctl() use case.


I'll fix in the v2.

Thanks for the report,
Stefano



Re: [PATCH] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing

2022-02-22 Thread Stefano Garzarella

On Tue, Feb 22, 2022 at 01:06:12AM +0530, Anirudh Rayabharam wrote:

On Mon, Feb 21, 2022 at 07:26:28PM +0100, Stefano Garzarella wrote:

On Mon, Feb 21, 2022 at 11:33:11PM +0530, Anirudh Rayabharam wrote:
> On Mon, Feb 21, 2022 at 05:44:20PM +0100, Stefano Garzarella wrote:
> > On Mon, Feb 21, 2022 at 09:44:39PM +0530, Anirudh Rayabharam wrote:
> > > On Mon, Feb 21, 2022 at 02:59:30PM +0100, Stefano Garzarella wrote:
> > > > > On Mon, Feb 21, 2022 at 12:49 PM Stefano Garzarella  wrote:
> > > > >
> > > > > vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
> > > > > ownership. It expects current->mm to be valid.
> > > > >
> > > > > vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
> > > > > the user has not done close(), so when we are in do_exit(). In this
> > > > > case current->mm is invalid and we're releasing the device, so we
> > > > > should clean it anyway.
> > > > >
> > > > > Let's check the owner only when vhost_vsock_stop() is called
> > > > > by an ioctl.
> > > > >
> > > > > Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> > > > > Cc: sta...@vger.kernel.org
> > > > > Reported-by: syzbot+1e3ea63db39f2b444...@syzkaller.appspotmail.com
> > > > > Signed-off-by: Stefano Garzarella 
> > > > > ---
> > > > >  drivers/vhost/vsock.c | 14 --
> > > > >  1 file changed, 8 insertions(+), 6 deletions(-)
> > > >
> > > > Reported-and-tested-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > >
> > > I don't think this patch fixes "INFO: task hung in vhost_work_dev_flush"
> > > even though syzbot says so. I am able to reproduce the issue locally
> > > even with this patch applied.
> >
> > Are you using the sysbot reproducer or another test?
> > In that case, can you share it?
>
> I am using the syzbot reproducer.
>
> >
> > From the stack trace it seemed to me that the worker accesses a zone that
> > has been cleaned (iotlb), so it is invalid and fails.
>
> Would the thread hang in that case? How?

Looking at this log [1] it seems that the process is blocked on the
wait_for_completion() in vhost_work_dev_flush().

Since we're not setting the backend to NULL to stop the worker, it's likely
that the worker will keep running, preventing the flush work from
completing.


The log shows that the worker thread is stuck in iotlb_access_ok(). How
will setting the backend to NULL stop it? During my debugging I found
that the worker is stuck in this while loop:


Okay, looking at your new patch, now I see. If we enter this loop
before setting the backend to NULL and we have start = 0 and end = (u64)-1,
we stay there forever.


I'll remove that tag in v2, but the test might fail without this patch 
applied, because for now we don't stop workers correctly.


Thanks,
Stefano
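
For context, the loop in question, paraphrased from iotlb_access_ok() in
drivers/vhost/vhost.c (not a verbatim copy), makes no forward progress once it
hits a mapping with start == 0 and size == 0:

/* Paraphrased sketch: with map->start == 0 and map->size == 0, size
 * evaluates to 0 at addr == 0, so neither s nor addr ever advances. */
while (len > s) {
	map = vhost_iotlb_itree_first(umem, addr, last);
	if (!map || map->start > addr)
		return false;
	size = map->size - addr + map->start;	/* 0 - 0 + 0 == 0 */
	s += size;
	addr += size;				/* no progress: infinite loop */
}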



[PATCH v2] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing

2022-02-22 Thread Stefano Garzarella
vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
ownership. It expects current->mm to be valid.

vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
the user has not done close(), so when we are in do_exit(). In this
case current->mm is invalid and we're releasing the device, so we
should clean it anyway.

Let's check the owner only when vhost_vsock_stop() is called
by an ioctl.

When invoked from release we cannot fail, so we don't check the return
code of vhost_vsock_stop(). We need to stop vsock even if it's not
the owner.

Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
Cc: sta...@vger.kernel.org
Reported-by: syzbot+1e3ea63db39f2b444...@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+3140b17cb44a7b174...@syzkaller.appspotmail.com
Signed-off-by: Stefano Garzarella 
---
v2:
- initialized `ret` in vhost_vsock_stop [Dan]
- added comment about vhost_vsock_stop() calling in the code and an explanation
  in the commit message [MST]

v1: 
https://lore.kernel.org/virtualization/20220221114916.107045-1-sgarz...@redhat.com
---
 drivers/vhost/vsock.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index d6ca1c7ad513..37f0b4274113 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -629,16 +629,18 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
return ret;
 }
 
-static int vhost_vsock_stop(struct vhost_vsock *vsock)
+static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
 {
size_t i;
-   int ret;
+   int ret = 0;
 
mutex_lock(&vsock->dev.mutex);
 
-   ret = vhost_dev_check_owner(&vsock->dev);
-   if (ret)
-   goto err;
+   if (check_owner) {
+   ret = vhost_dev_check_owner(&vsock->dev);
+   if (ret)
+   goto err;
+   }
 
for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
struct vhost_virtqueue *vq = &vsock->vqs[i];
@@ -753,7 +755,12 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
 * inefficient.  Room for improvement here. */
vsock_for_each_connected_socket(vhost_vsock_reset_orphans);
 
-   vhost_vsock_stop(vsock);
+   /* Don't check the owner, because we are in the release path, so we
+* need to stop the vsock device in any case.
+* vhost_vsock_stop() can not fail in this case, so we don't need to
+* check the return code.
+*/
+   vhost_vsock_stop(vsock, false);
vhost_vsock_flush(vsock);
vhost_dev_stop(&vsock->dev);
 
@@ -868,7 +875,7 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
if (start)
return vhost_vsock_start(vsock);
else
-   return vhost_vsock_stop(vsock);
+   return vhost_vsock_stop(vsock, true);
case VHOST_GET_FEATURES:
features = VHOST_VSOCK_FEATURES;
if (copy_to_user(argp, &features, sizeof(features)))
-- 
2.35.1



Re: [syzbot] INFO: task hung in vhost_work_dev_flush

2022-02-22 Thread Dan Carpenter
On Mon, Feb 21, 2022 at 09:23:04PM +0530, Anirudh Rayabharam wrote:
> On Mon, Feb 21, 2022 at 03:12:33PM +0100, Stefano Garzarella wrote:
> > #syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ f71077a4d84b
> > 
> > Patch sent upstream:
> > https://lore.kernel.org/virtualization/20220221114916.107045-1-sgarz...@redhat.com/T/#u
> 
> I don't see how your patch fixes this issue. It looks unrelated. It is
> surprising that syzbot is happy with it.
> 
> I have sent a patch for this issue here:
> https://lore.kernel.org/lkml/20220221072852.31820-1-m...@anirudhrb.com/

I wasted so much time trying to figure out what this patch fixes.  :P

(It doesn't fix anything).

regards,
dan carpenter



[PATCH v4 0/4] RSS support for VirtioNet.

2022-02-22 Thread Andrew Melnychenko
Virtio-net supports "hardware" RSS with a Toeplitz key.
Also, it allows receiving the calculated hash in the vheader,
which may be used with RPS.
Added ethtool callbacks to manipulate RSS.

Technically, hash calculation may be set only for
SRC+DST and SRC+DST+PORTSRC+PORTDST hash flows.
Completely disabling hash calculation for TCP or UDP
would also disable hash calculation for IP.

RSS/RXHASH is disabled by default.
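
For example, once merged, the feature would be enabled explicitly from userspace
(assuming eth0 is the virtio-net interface; standard ethtool commands):

ethtool -K eth0 rxhash on        # enable RXHASH (off by default)
ethtool -x eth0                  # show the RSS key and indirection table
ethtool -X eth0 equal 4          # spread flows evenly over 4 receive queues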

Changes since v3:
* Moved hunks a bit.
* Added indirection table zero size check
  for hash report only feature.
* Added virtio_skb_set_hash() helper instead of in-place routine.

Changes since v2:
* Fixed issue with calculating padded header length.
  During review/tests, there was found an issue that
  will crash the kernel if VIRTIO_NET_F_MRG_RXBUF
  was not set. (thx to Jason Wang )
* Refactored the code according to review.

Changes since v1:
* Refactored virtnet_set_hashflow.
* Refactored virtio_net_ctrl_rss.
* Moved hunks between patches a bit.

Changes since rfc:
* Code refactored.
* Patches reformatted.
* Added feature validation.

Andrew Melnychenko (4):
  drivers/net/virtio_net: Fixed padded vheader to use v1 with hash.
  drivers/net/virtio_net: Added basic RSS support.
  drivers/net/virtio_net: Added RSS hash report.
  drivers/net/virtio_net: Added RSS hash report control.

 drivers/net/virtio_net.c | 389 +--
 1 file changed, 376 insertions(+), 13 deletions(-)

-- 
2.34.1



[PATCH v4 1/4] drivers/net/virtio_net: Fixed padded vheader to use v1 with hash.

2022-02-22 Thread Andrew Melnychenko
The header v1 provides additional info about RSS.
Added changes to compute the proper header length.
In the next patches, the header may contain RSS hash info
for hash population.

Signed-off-by: Andrew Melnychenko 
---
 drivers/net/virtio_net.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a801ea40908f..b9ed7c55d9a0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -242,13 +242,13 @@ struct virtnet_info {
 };
 
 struct padded_vnet_hdr {
-   struct virtio_net_hdr_mrg_rxbuf hdr;
+   struct virtio_net_hdr_v1_hash hdr;
/*
 * hdr is in a separate sg buffer, and data sg buffer shares same page
 * with this header sg. This padding makes next sg 16 byte aligned
 * after the header.
 */
-   char padding[4];
+   char padding[12];
 };
 
 static bool is_xdp_frame(void *ptr)
@@ -396,7 +396,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 
hdr_len = vi->hdr_len;
if (vi->mergeable_rx_bufs)
-   hdr_padded_len = sizeof(*hdr);
+   hdr_padded_len = hdr_len;
else
hdr_padded_len = sizeof(struct padded_vnet_hdr);
 
@@ -1266,7 +1266,8 @@ static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
  struct ewma_pkt_len *avg_pkt_len,
  unsigned int room)
 {
-   const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   struct virtnet_info *vi = rq->vq->vdev->priv;
+   const size_t hdr_len = vi->hdr_len;
unsigned int len;
 
if (room)
@@ -2851,7 +2852,7 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
  */
 static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct virtqueue *vq)
 {
-   const unsigned int hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   const unsigned int hdr_len = vi->hdr_len;
unsigned int rq_size = virtqueue_get_vring_size(vq);
	unsigned int packet_len = vi->big_packets ? IP_MAX_MTU : vi->dev->max_mtu;
unsigned int buf_len = hdr_len + ETH_HLEN + VLAN_HLEN + packet_len;
-- 
2.34.1
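
A quick size check of the new padding in padded_vnet_hdr, assuming the usual
UAPI layouts (virtio_net_hdr_mrg_rxbuf is 12 bytes, virtio_net_hdr_v1_hash is
20 bytes):

/* Sketch of the alignment math: both paddings keep the data sg that
 * follows the header sg 16-byte aligned.
 *   old: 12 + 4  = 16
 *   new: 20 + 12 = 32
 */
static_assert(sizeof(struct virtio_net_hdr_mrg_rxbuf) + 4 == 16);
static_assert(sizeof(struct virtio_net_hdr_v1_hash) + 12 == 32);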



[PATCH v4 4/4] drivers/net/virtio_net: Added RSS hash report control.

2022-02-22 Thread Andrew Melnychenko
Now it's possible to control the supported hash flows.
Added hashflow set/get callbacks.
Also, disabling RXH_IP_SRC/DST for TCP would disable them for UDP.
TCP and UDP support only:
ethtool -U eth0 rx-flow-hash tcp4 sd
RXH_IP_SRC + RXH_IP_DST
ethtool -U eth0 rx-flow-hash tcp4 sdfn
RXH_IP_SRC + RXH_IP_DST + RXH_L4_B_0_1 + RXH_L4_B_2_3
Disabling happens because the VirtioNET hash type for IP doesn't check the L4 proto;
it works for all IP packets (TCP, UDP, ICMP, etc.).
For TCP and UDP, it's possible to set IP+PORT hashes.
But disabling IP hashes will disable them for TCP and UDP simultaneously.
It's possible to set IP+PORT for TCP/UDP and disable/enable IP
for everything else (UDP, ICMP, etc.); more examples follow below.
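
For instance, combining the accepted 'sd', 'sdfn' and 'r' (discard) values
(eth0 assumed to be the virtio-net interface; standard ethtool get/set syntax):

ethtool -U eth0 rx-flow-hash udp4 sdfn   # hash UDPv4 on addresses + ports
ethtool -U eth0 rx-flow-hash udp4 sd     # addresses only (per this patch, the UDPv4 port bits are cleared)
ethtool -n eth0 rx-flow-hash tcp4        # query the currently active fields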

Signed-off-by: Andrew Melnychenko 
---
 drivers/net/virtio_net.c | 141 ++-
 1 file changed, 140 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8b317b3ef5aa..83f9a36a5efb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -231,6 +231,7 @@ struct virtnet_info {
u8 rss_key_size;
u16 rss_indir_table_size;
u32 rss_hash_types_supported;
+   u32 rss_hash_types_saved;
 
/* Has control virtqueue */
bool has_cvq;
@@ -2278,6 +2279,7 @@ static void virtnet_init_default_rss(struct virtnet_info *vi)
int i = 0;
 
vi->ctrl->rss.hash_types = vi->rss_hash_types_supported;
+   vi->rss_hash_types_saved = vi->rss_hash_types_supported;
	vi->ctrl->rss.indirection_table_mask = vi->rss_indir_table_size ? vi->rss_indir_table_size - 1 : 0;
vi->ctrl->rss.unclassified_queue = 0;
@@ -2293,6 +2295,121 @@ static void virtnet_init_default_rss(struct virtnet_info *vi)
netdev_rss_key_fill(vi->ctrl->rss.key, vi->rss_key_size);
 }
 
+static void virtnet_get_hashflow(const struct virtnet_info *vi, struct ethtool_rxnfc *info)
+{
+   info->data = 0;
+   switch (info->flow_type) {
+   case TCP_V4_FLOW:
+   if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_TCPv4) {
+   info->data = RXH_IP_SRC | RXH_IP_DST |
+RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   } else if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_IPv4) {
+   info->data = RXH_IP_SRC | RXH_IP_DST;
+   }
+   break;
+   case TCP_V6_FLOW:
+   if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_TCPv6) {
+   info->data = RXH_IP_SRC | RXH_IP_DST |
+RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   } else if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_IPv6) {
+   info->data = RXH_IP_SRC | RXH_IP_DST;
+   }
+   break;
+   case UDP_V4_FLOW:
+   if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_UDPv4) {
+   info->data = RXH_IP_SRC | RXH_IP_DST |
+RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   } else if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_IPv4) {
+   info->data = RXH_IP_SRC | RXH_IP_DST;
+   }
+   break;
+   case UDP_V6_FLOW:
+   if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_UDPv6) {
+   info->data = RXH_IP_SRC | RXH_IP_DST |
+RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   } else if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_IPv6) {
+   info->data = RXH_IP_SRC | RXH_IP_DST;
+   }
+   break;
+   case IPV4_FLOW:
+   if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_IPv4)
+   info->data = RXH_IP_SRC | RXH_IP_DST;
+
+   break;
+   case IPV6_FLOW:
+   if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_IPv6)
+   info->data = RXH_IP_SRC | RXH_IP_DST;
+
+   break;
+   default:
+   info->data = 0;
+   break;
+   }
+}
+
+static bool virtnet_set_hashflow(struct virtnet_info *vi, struct ethtool_rxnfc *info)
+{
+   u32 new_hashtypes = vi->rss_hash_types_saved;
+   bool is_disable = info->data & RXH_DISCARD;
+   bool is_l4 = info->data == (RXH_IP_SRC | RXH_IP_DST | RXH_L4_B_0_1 | RXH_L4_B_2_3);
+
+   /* supports only 'sd', 'sdfn' and 'r' */
+   if (!((info->data == (RXH_IP_SRC | RXH_IP_DST)) | is_l4 | is_disable))
+   return false;
+
+   switch (info->flow_type) {
+   case TCP_V4_FLOW:
+   new_hashtypes &= ~(VIRTIO_NET_RSS_HASH_TYPE_IPv4 | VIRTIO_NET_RSS_HASH_TYPE_TCPv4);
+   if (!is_disable)
+   new_hashtypes |= VIRTIO_NET_RSS_HASH_TYPE_IPv4
+   | (is_l4 ? VIRTI

[PATCH v4 2/4] drivers/net/virtio_net: Added basic RSS support.

2022-02-22 Thread Andrew Melnychenko
Added features for RSS.
Added initialization, the RXHASH feature, and ethtool ops.
By default RSS/RXHASH is disabled.
Virtio RSS "IPv6 extensions" hashes are disabled.
Added ethtool ops to set the key and indirection table.

Signed-off-by: Andrew Melnychenko 
---
 drivers/net/virtio_net.c | 192 +--
 1 file changed, 186 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b9ed7c55d9a0..b5f2bb426a7b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -169,6 +169,24 @@ struct receive_queue {
struct xdp_rxq_info xdp_rxq;
 };
 
+/* This structure can contain an rss message with maximum settings for indirection table and key size.
+ * Note that the default structure that describes the RSS configuration, virtio_net_rss_config,
+ * contains the same info but can't handle the table values.
+ * In any case, the structure is passed to the virtio hw through sg_buf split into parts,
+ * because the table sizes may differ according to the device configuration.
+ */
+#define VIRTIO_NET_RSS_MAX_KEY_SIZE     40
+#define VIRTIO_NET_RSS_MAX_TABLE_LEN    128
+struct virtio_net_ctrl_rss {
+   u32 hash_types;
+   u16 indirection_table_mask;
+   u16 unclassified_queue;
+   u16 indirection_table[VIRTIO_NET_RSS_MAX_TABLE_LEN];
+   u16 max_tx_vq;
+   u8 hash_key_length;
+   u8 key[VIRTIO_NET_RSS_MAX_KEY_SIZE];
+};
+
 /* Control VQ buffers: protected by the rtnl lock */
 struct control_buf {
struct virtio_net_ctrl_hdr hdr;
@@ -178,6 +196,7 @@ struct control_buf {
u8 allmulti;
__virtio16 vid;
__virtio64 offloads;
+   struct virtio_net_ctrl_rss rss;
 };
 
 struct virtnet_info {
@@ -206,6 +225,12 @@ struct virtnet_info {
/* Host will merge rx buffers for big packets (shake it! shake it!) */
bool mergeable_rx_bufs;
 
+   /* Host supports rss and/or hash report */
+   bool has_rss;
+   u8 rss_key_size;
+   u16 rss_indir_table_size;
+   u32 rss_hash_types_supported;
+
/* Has control virtqueue */
bool has_cvq;
 
@@ -2184,6 +2209,57 @@ static void virtnet_get_ringparam(struct net_device *dev,
ring->tx_pending = ring->tx_max_pending;
 }
 
+static bool virtnet_commit_rss_command(struct virtnet_info *vi)
+{
+   struct net_device *dev = vi->dev;
+   struct scatterlist sgs[4];
+   unsigned int sg_buf_size;
+
+   /* prepare sgs */
+   sg_init_table(sgs, 4);
+
+   sg_buf_size = offsetof(struct virtio_net_ctrl_rss, indirection_table);
+   sg_set_buf(&sgs[0], &vi->ctrl->rss, sg_buf_size);
+
+   sg_buf_size = sizeof(uint16_t) * (vi->ctrl->rss.indirection_table_mask + 1);
+   sg_set_buf(&sgs[1], vi->ctrl->rss.indirection_table, sg_buf_size);
+
+   sg_buf_size = offsetof(struct virtio_net_ctrl_rss, key)
+   - offsetof(struct virtio_net_ctrl_rss, max_tx_vq);
+   sg_set_buf(&sgs[2], &vi->ctrl->rss.max_tx_vq, sg_buf_size);
+
+   sg_buf_size = vi->rss_key_size;
+   sg_set_buf(&sgs[3], vi->ctrl->rss.key, sg_buf_size);
+
+   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
+ VIRTIO_NET_CTRL_MQ_RSS_CONFIG, sgs)) {
+   dev_warn(&dev->dev, "VIRTIONET issue with committing RSS sgs\n");
+   return false;
+   }
+   return true;
+}
+
+static void virtnet_init_default_rss(struct virtnet_info *vi)
+{
+   u32 indir_val = 0;
+   int i = 0;
+
+   vi->ctrl->rss.hash_types = vi->rss_hash_types_supported;
+   vi->ctrl->rss.indirection_table_mask = vi->rss_indir_table_size ? vi->rss_indir_table_size - 1 : 0;
+   vi->ctrl->rss.unclassified_queue = 0;
+
+   for (; i < vi->rss_indir_table_size; ++i) {
+   indir_val = ethtool_rxfh_indir_default(i, vi->curr_queue_pairs);
+   vi->ctrl->rss.indirection_table[i] = indir_val;
+   }
+
+   vi->ctrl->rss.max_tx_vq = vi->curr_queue_pairs;
+   vi->ctrl->rss.hash_key_length = vi->rss_key_size;
+
+   netdev_rss_key_fill(vi->ctrl->rss.key, vi->rss_key_size);
+}
+
 
 static void virtnet_get_drvinfo(struct net_device *dev,
struct ethtool_drvinfo *info)
@@ -2412,6 +2488,71 @@ static void virtnet_update_settings(struct virtnet_info *vi)
vi->duplex = duplex;
 }
 
+static u32 virtnet_get_rxfh_key_size(struct net_device *dev)
+{
+   return ((struct virtnet_info *)netdev_priv(dev))->rss_key_size;
+}
+
+static u32 virtnet_get_rxfh_indir_size(struct net_device *dev)
+{
+   return ((struct virtnet_info *)netdev_priv(dev))->rss_indir_table_size;
+}
+
+static int virtnet_get_rxfh(struct net_device *dev, u32 *indir, u8 *key, u8 *hfunc)
+{
+   struct virtnet_info *vi = netdev_priv(dev);
+   int i;
+
+   if (indir) {
+   for (i = 0; i < vi->rss_indir_table_size; ++i)
+   indir[i] = vi->ctrl->rss.indirection_t

[PATCH v4 3/4] drivers/net/virtio_net: Added RSS hash report.

2022-02-22 Thread Andrew Melnychenko
Added features for RSS hash report.
If a hash is provided, it is set on the skb.
Added checks whether rss and/or hash report are enabled together.

Signed-off-by: Andrew Melnychenko 
---
 drivers/net/virtio_net.c | 55 +++-
 1 file changed, 49 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b5f2bb426a7b..8b317b3ef5aa 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -227,6 +227,7 @@ struct virtnet_info {
 
/* Host supports rss and/or hash report */
bool has_rss;
+   bool has_rss_hash_report;
u8 rss_key_size;
u16 rss_indir_table_size;
u32 rss_hash_types_supported;
@@ -1148,6 +1149,35 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
return NULL;
 }
 
+static void virtio_skb_set_hash(const struct virtio_net_hdr_v1_hash *hdr_hash,
+   struct sk_buff *skb)
+{
+   enum pkt_hash_types rss_hash_type;
+
+   if (!hdr_hash || !skb)
+   return;
+
+   switch (hdr_hash->hash_report) {
+   case VIRTIO_NET_HASH_REPORT_TCPv4:
+   case VIRTIO_NET_HASH_REPORT_UDPv4:
+   case VIRTIO_NET_HASH_REPORT_TCPv6:
+   case VIRTIO_NET_HASH_REPORT_UDPv6:
+   case VIRTIO_NET_HASH_REPORT_TCPv6_EX:
+   case VIRTIO_NET_HASH_REPORT_UDPv6_EX:
+   rss_hash_type = PKT_HASH_TYPE_L4;
+   break;
+   case VIRTIO_NET_HASH_REPORT_IPv4:
+   case VIRTIO_NET_HASH_REPORT_IPv6:
+   case VIRTIO_NET_HASH_REPORT_IPv6_EX:
+   rss_hash_type = PKT_HASH_TYPE_L3;
+   break;
+   case VIRTIO_NET_HASH_REPORT_NONE:
+   default:
+   rss_hash_type = PKT_HASH_TYPE_NONE;
+   }
+   skb_set_hash(skb, hdr_hash->hash_value, rss_hash_type);
+}
+
 static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
void *buf, unsigned int len, void **ctx,
unsigned int *xdp_xmit,
@@ -1182,6 +1212,8 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
return;
 
hdr = skb_vnet_hdr(skb);
+   if (dev->features & NETIF_F_RXHASH && vi->has_rss_hash_report)
+   virtio_skb_set_hash((const struct virtio_net_hdr_v1_hash *)hdr, skb);
 
if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID)
skb->ip_summed = CHECKSUM_UNNECESSARY;
@@ -2232,7 +2264,8 @@ static bool virtnet_commit_rss_command(struct virtnet_info *vi)
sg_set_buf(&sgs[3], vi->ctrl->rss.key, sg_buf_size);
 
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
- VIRTIO_NET_CTRL_MQ_RSS_CONFIG, sgs)) {
+ vi->has_rss ? VIRTIO_NET_CTRL_MQ_RSS_CONFIG
+ : VIRTIO_NET_CTRL_MQ_HASH_CONFIG, sgs)) {
		dev_warn(&dev->dev, "VIRTIONET issue with committing RSS sgs\n");
return false;
}
@@ -3231,6 +3264,8 @@ static bool virtnet_validate_features(struct virtio_device *vdev)
 VIRTNET_FAIL_ON(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR,
 "VIRTIO_NET_F_CTRL_VQ") ||
 VIRTNET_FAIL_ON(vdev, VIRTIO_NET_F_RSS,
+"VIRTIO_NET_F_CTRL_VQ") ||
+VIRTNET_FAIL_ON(vdev, VIRTIO_NET_F_HASH_REPORT,
 "VIRTIO_NET_F_CTRL_VQ"))) {
return false;
}
@@ -3366,8 +3401,13 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
vi->mergeable_rx_bufs = true;
 
-   if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) {
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_HASH_REPORT))
+   vi->has_rss_hash_report = true;
+
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS))
vi->has_rss = true;
+
+   if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_indir_table_size =
virtio_cread16(vdev, offsetof(struct virtio_net_config,
rss_max_indirection_table_length));
@@ -3383,8 +3423,11 @@ static int virtnet_probe(struct virtio_device *vdev)
 
dev->hw_features |= NETIF_F_RXHASH;
}
-   if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||
-   virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
+
+   if (vi->has_rss_hash_report)
+   vi->hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
+   else if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||
+virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
vi->hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
else
vi->hdr_len = sizeof(struct virtio_net_hdr);
@@ -3451,7 +3494,7 @@ static int virtnet_probe(struct virtio_device *vdev)
}
}
 
-   if (vi->has_rss)
+   if (vi->has_rss || vi->has_rss_hash_

Re: [RFC PATCH 0/5] TUN/VirtioNet USO features support.

2022-02-22 Thread Andrew Melnichenko
Hi all,

On Wed, Feb 9, 2022 at 7:41 AM Jason Wang  wrote:
>
>
> On 2022/2/8 9:09 PM, Andrew Melnichenko wrote:
> > Hi people,
> > Can you please review this series?
>
>
> Are there any performance number to demonstrate the difference?
>
> Thanks
>

Yeah, I've used udpgso_bench from Linux to test.
Here are some numbers:

Sending packets with size 1

Without USO:
```
$ ./udpgso_bench_tx -4 -D 192.168.15.1 -s 1 -S 1000
random: crng init done
random: 7 urandom warning(s) missed due to ratelimiting
udp tx: 36 MB/s 3863 calls/s   3863 msg/s
udp tx: 32 MB/s 3360 calls/s   3360 msg/s
udp tx: 31 MB/s 3340 calls/s   3340 msg/s
udp tx: 31 MB/s 3353 calls/s   3353 msg/s
udp tx: 32 MB/s 3359 calls/s   3359 msg/s
udp tx: 32 MB/s 3370 calls/s   3370 msg/s
```

With USO:
```
$ ./udpgso_bench_tx -4 -D 192.168.15.1 -s 1 -S 1000
random: crng init done
random: 7 urandom warning(s) missed due to ratelimiting
udp tx:    120 MB/s  12596 calls/s  12596 msg/s
udp tx:    122 MB/s  12885 calls/s  12885 msg/s
udp tx:    120 MB/s  12667 calls/s  12667 msg/s
udp tx:    123 MB/s  12969 calls/s  12969 msg/s
udp tx:    116 MB/s  12232 calls/s  12232 msg/s
udp tx:    108 MB/s  11389 calls/s  11389 msg/s
```


>
> >
> > On Wed, Jan 26, 2022 at 10:32 AM Yuri Benditovich
> >  wrote:
> >> On Wed, Jan 26, 2022 at 9:54 AM Xuan Zhuo  wrote:
> >>> On Tue, 25 Jan 2022 10:46:57 +0200, Andrew Melnychenko  wrote:
>  Added new offloads for TUN devices TUN_F_USO4 and TUN_F_USO6.
>  Technically they enable NETIF_F_GSO_UDP_L4
>  (and only if USO4 & USO6 are set simultaneously).
>  It allows transmission of large UDP packets.
> 
>  Different features USO4 and USO6 are required for qemu, where Windows guests can
>  enable/disable USO receive for IPv4 and IPv6 separately.
>  On the other side, Linux can't really differentiate USO4 and USO6, for now.
>  For now, enabling USO for TUN requires enabling USO4 and USO6 together.
>  In the future, there would be a mechanism to control UDP_L4 GSO separately.
> 
>  Test it with the WIP Qemu: https://github.com/daynix/qemu/tree/Dev_USOv2
> 
>  New types for VirtioNet are already on the mailing list:
>  https://lists.oasis-open.org/archives/virtio-comment/202110/msg00010.html
> >>> Seems like this hasn't been upvoted yet.
> >>>
> >>>  https://github.com/oasis-tcs/virtio-spec#use-of-github-issues
> >> Yes, correct. This is a reason why this series of patches is RFC.
> >>
> >>> Thanks.
> >>>
>  Also, there is a known issue with transmitting packets between two guests.
>  Without hacks with the skb's GSO, packets are still segmented on the
>  host's postrouting.
> 
>  Andrew Melnychenko (5):
> uapi/linux/if_tun.h: Added new ioctl for tun/tap.
> driver/net/tun: Added features for USO.
> uapi/linux/virtio_net.h: Added USO types.
> linux/virtio_net.h: Added Support for GSO_UDP_L4 offload.
> drivers/net/virtio_net.c: Added USO support.
> 
>    drivers/net/tap.c   | 18 --
>    drivers/net/tun.c   | 15 ++-
>    drivers/net/virtio_net.c| 22 ++
>    include/linux/virtio_net.h  | 11 +++
>    include/uapi/linux/if_tun.h |  3 +++
>    include/uapi/linux/virtio_net.h |  4 
>    6 files changed, 66 insertions(+), 7 deletions(-)
> 
>  --
>  2.34.1
> 

Re: [RFC PATCH 3/5] uapi/linux/virtio_net.h: Added USO types.

2022-02-22 Thread Andrew Melnichenko
Hi all,



On Wed, Feb 9, 2022 at 6:41 AM Jason Wang  wrote:
>
>
> On 2022/1/25 4:47 PM, Andrew Melnychenko wrote:
> > Added new GSO type for USO: VIRTIO_NET_HDR_GSO_UDP_L4.
> > Feature VIRTIO_NET_F_HOST_USO allows to enable NETIF_F_GSO_UDP_L4.
> > Separated VIRTIO_NET_F_GUEST_USO4 & VIRTIO_NET_F_GUEST_USO6 features
> > required for Windows guests.
> >
> > Signed-off-by: Andrew Melnychenko 
> > ---
> >   include/uapi/linux/virtio_net.h | 4 
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/include/uapi/linux/virtio_net.h 
> > b/include/uapi/linux/virtio_net.h
> > index 3f55a4215f11..620addc5767b 100644
> > --- a/include/uapi/linux/virtio_net.h
> > +++ b/include/uapi/linux/virtio_net.h
> > @@ -56,6 +56,9 @@
> >   #define VIRTIO_NET_F_MQ 22  /* Device supports Receive Flow
> >* Steering */
> >   #define VIRTIO_NET_F_CTRL_MAC_ADDR 23   /* Set MAC address */
> > +#define VIRTIO_NET_F_GUEST_USO4  54  /* Guest can handle USOv4 in. */
> > +#define VIRTIO_NET_F_GUEST_USO6  55  /* Guest can handle USOv6 in. */
> > +#define VIRTIO_NET_F_HOST_USO    56  /* Host can handle USO in. */
>
>
> I think it's better to be consistent here. Either we split in both guest
> and host or not.
>
> Thanks
>

The main reason is that receiving USO packets depends on the kernel, while
transmitting is the feature that VirtIO implements.
Windows systems have the option to manipulate receive offload per protocol. That's
why there are two GUEST_USO features.
For HOST_USO, there is technically no point in "splitting" it, and there
should not be any difference between IPv4/IPv6.
Technically, we either support transmitting big UDP packets or not.

>
> >
> >   #define VIRTIO_NET_F_HASH_REPORT  57/* Supports hash report */
> >   #define VIRTIO_NET_F_RSS  60/* Supports RSS RX steering */
> > @@ -130,6 +133,7 @@ struct virtio_net_hdr_v1 {
> > >   #define VIRTIO_NET_HDR_GSO_TCPV4   1    /* GSO frame, IPv4 TCP (TSO) */
> > >   #define VIRTIO_NET_HDR_GSO_UDP     3    /* GSO frame, IPv4 UDP (UFO) */
> > >   #define VIRTIO_NET_HDR_GSO_TCPV6   4    /* GSO frame, IPv6 TCP */
> > > +#define VIRTIO_NET_HDR_GSO_UDP_L4  5    /* GSO frame, IPv4 & IPv6 UDP (USO) */
> > >   #define VIRTIO_NET_HDR_GSO_ECN     0x80 /* TCP has ECN set */
> > >   __u8 gso_type;
> > >   __virtio16 hdr_len; /* Ethernet + IP + tcp/udp hdrs */
>

Re: [RFC PATCH 2/5] driver/net/tun: Added features for USO.

2022-02-22 Thread Andrew Melnichenko
On Wed, Feb 9, 2022 at 6:39 AM Jason Wang  wrote:
>
>
> On 2022/1/25 4:46 PM, Andrew Melnychenko wrote:
> > Added support for USO4 and USO6; also added code for the new ioctl
> > TUNGETSUPPORTEDOFFLOADS.
> > For now, to "enable" USO, it's required to set both USO4 and USO6
> > simultaneously.
> > USO enables NETIF_F_GSO_UDP_L4.
> >
> > Signed-off-by: Andrew Melnychenko 
> > ---
> >   drivers/net/tap.c | 18 --
> >   drivers/net/tun.c | 15 ++-
> >   2 files changed, 30 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/tap.c b/drivers/net/tap.c
> > index 8e3a28ba6b28..82d742ba78b1 100644
> > --- a/drivers/net/tap.c
> > +++ b/drivers/net/tap.c
> > @@ -940,6 +940,10 @@ static int set_offload(struct tap_queue *q, unsigned 
> > long arg)
> >   if (arg & TUN_F_TSO6)
> >   feature_mask |= NETIF_F_TSO6;
> >   }
> > +
> > + /* TODO: for now USO4 and USO6 should work simultaneously */
> > + if (arg & (TUN_F_USO4 | TUN_F_USO6) == (TUN_F_USO4 | TUN_F_USO6))
> > + features |= NETIF_F_GSO_UDP_L4;
>
>
> If kernel doesn't want to split the GSO_UDP features, I wonder how much
> value to keep separated features for TUN and virtio.
>
> Thanks
>

It's important for Windows guests that may request USO receive only
for IPv4 or IPv6.
Alternatively, it is possible to implement one feature and change its
"meaning" when the "split" happens.
I think it's a good idea to implement an interface for USO4/USO6 and
do it right away.

>
> >   }
> >
> >   /* tun/tap driver inverts the usage for TSO offloads, where
> > @@ -950,7 +954,8 @@ static int set_offload(struct tap_queue *q, unsigned 
> > long arg)
> >* When user space turns off TSO, we turn off GSO/LRO so that
> >* user-space will not receive TSO frames.
> >*/
> > - if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6))
> > + if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6) ||
> > + feature_mask & (TUN_F_USO4 | TUN_F_USO6) == (TUN_F_USO4 | 
> > TUN_F_USO6))
> >   features |= RX_OFFLOADS;
> >   else
> >   features &= ~RX_OFFLOADS;
> > @@ -979,6 +984,7 @@ static long tap_ioctl(struct file *file, unsigned int 
> > cmd,
> >   unsigned short u;
> >   int __user *sp = argp;
> >   struct sockaddr sa;
> > + unsigned int supported_offloads;
> >   int s;
> >   int ret;
> >
> > @@ -1074,7 +1080,8 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
> >   case TUNSETOFFLOAD:
> >   /* let the user check for future flags */
> >   if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
> > - TUN_F_TSO_ECN | TUN_F_UFO))
> > + TUN_F_TSO_ECN | TUN_F_UFO |
> > + TUN_F_USO4 | TUN_F_USO6))
> >   return -EINVAL;
> >
> >   rtnl_lock();
> > @@ -1082,6 +1089,13 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
> >   rtnl_unlock();
> >   return ret;
> >
> > + case TUNGETSUPPORTEDOFFLOADS:
> > + supported_offloads = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
> > + TUN_F_TSO_ECN | TUN_F_UFO | TUN_F_USO4 | TUN_F_USO6;
> > + if (copy_to_user(&arg, &supported_offloads, sizeof(supported_offloads)))
> > + return -EFAULT;
> > + return 0;
> > +
> >   case SIOCGIFHWADDR:
> >   rtnl_lock();
> >   tap = tap_get_tap_dev(q);
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index fed85447701a..4f2105d1e6f1 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -185,7 +185,7 @@ struct tun_struct {
> >   struct net_device   *dev;
> >   netdev_features_t   set_features;
> >   #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
> > -   NETIF_F_TSO6)
> > +   NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4)
> >
> >   int align;
> >   int vnet_hdr_sz;
> > @@ -2821,6 +2821,12 @@ static int set_offload(struct tun_struct *tun, 
> > unsigned long arg)
> >   }
> >
> >   arg &= ~TUN_F_UFO;
> > +
> > + /* TODO: for now USO4 and USO6 should work simultaneously */
> > + if (arg & TUN_F_USO4 && arg & TUN_F_USO6) {
> > + features |= NETIF_F_GSO_UDP_L4;
> > + arg &= ~(TUN_F_USO4 | TUN_F_USO6);
> > + }
> >   }
> >
> >   /* This gives the user a way to test for new features in future by
> > @@ -2991,6 +2997,7 @@ static long __tun_chr_ioctl(struct file *file, 
> > unsigned int cmd,
> >   int sndbuf;
> >   int vnet_hdr_sz;
> >   int le;
> > + unsigned int supported_offloads;
> >   int ret;
> >   bool do_notify = false;
> >
> > @@ -3154,6 +3161,12 @@ 

Re: [RFC PATCH 1/5] uapi/linux/if_tun.h: Added new ioctl for tun/tap.

2022-02-22 Thread Andrew Melnichenko
Hi all,

On Wed, Feb 9, 2022 at 6:26 AM Jason Wang  wrote:
>
>
> On 2022/1/25 4:46 PM, Andrew Melnychenko wrote:
> > Added TUNGETSUPPORTEDOFFLOADS that should allow
> > to get bits of supported offloads.
>
>
> So we don't use dedicated ioctls in the past, instead, we just probing
> by checking the return value of TUNSETOFFLOADS.
>
> E.g qemu has the following codes:
>
> int tap_probe_has_ufo(int fd)
> {
>  unsigned offload;
>
>  offload = TUN_F_CSUM | TUN_F_UFO;
>
>  if (ioctl(fd, TUNSETOFFLOAD, offload) < 0)
>  return 0;
>
>  return 1;
> }
>
> Any reason we can't keep using that?
>
> Thanks
>

Well, even in this example, to check the UFO feature we try to set it.
What if we don't need to "enable" UFO and/or do not want to change its state?
I think it's a good idea to have the ability to get the supported offloads
without changing device behavior.
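
With the proposed ioctl, the probe could become a read-only query; a sketch of
the userspace side (based on the definition in patch 1; error handling trimmed,
so this is not a drop-in qemu helper):

#include <sys/ioctl.h>
#include <linux/if_tun.h>

/* Query the supported TUN_F_* bits without modifying the device state. */
static int tap_get_supported_offloads(int fd, unsigned int *offloads)
{
    return ioctl(fd, TUNGETSUPPORTEDOFFLOADS, offloads);
}

/* Usage sketch: check for USO without enabling anything:
 *   unsigned int off;
 *   if (tap_get_supported_offloads(fd, &off) == 0 &&
 *       (off & (TUN_F_USO4 | TUN_F_USO6)))
 *       ... USO is available ...
 */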

>
> > Added 2 additional offlloads for USO(IPv4 & IPv6).
> > Separate offloads are required for Windows VM guests,
> > g.e. Windows may set USO rx only for IPv4.
> >
> > Signed-off-by: Andrew Melnychenko 
> > ---
> >   include/uapi/linux/if_tun.h | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> > index 454ae31b93c7..07680fae6e18 100644
> > --- a/include/uapi/linux/if_tun.h
> > +++ b/include/uapi/linux/if_tun.h
> > @@ -61,6 +61,7 @@
> >   #define TUNSETFILTEREBPF _IOR('T', 225, int)
> >   #define TUNSETCARRIER _IOW('T', 226, int)
> >   #define TUNGETDEVNETNS _IO('T', 227)
> > +#define TUNGETSUPPORTEDOFFLOADS _IOR('T', 228, unsigned int)
> >
> >   /* TUNSETIFF ifr flags */
> >   #define IFF_TUN 0x0001
> > @@ -88,6 +89,8 @@
> >   #define TUN_F_TSO6  0x04/* I can handle TSO for IPv6 packets */
> >   #define TUN_F_TSO_ECN   0x08/* I can handle TSO with ECN bits. */
> >   #define TUN_F_UFO   0x10/* I can handle UFO packets */
> > +#define TUN_F_USO4   0x20/* I can handle USO for IPv4 packets */
> > +#define TUN_F_USO6   0x40/* I can handle USO for IPv6 packets */
> >
> >   /* Protocol info prepended to the packets (when IFF_NO_PI is not set) */
> >   #define TUN_PKT_STRIP   0x0001
>

Re: [PATCH] vhost: validate range size before adding to iotlb

2022-02-22 Thread Michael S. Tsirkin
On Tue, Feb 22, 2022 at 01:23:03AM +0530, Anirudh Rayabharam wrote:
> In vhost_iotlb_add_range_ctx(), validate the range size is non-zero
> before proceeding with adding it to the iotlb.
> 
> Range size can overflow to 0 when start is 0 and last is (2^64 - 1).
> One instance where it can happen is when userspace sends an IOTLB
> message with iova=size=uaddr=0 (vhost_process_iotlb_msg). So, an
> entry with size = 0, start = 0, last = (2^64 - 1) ends up in the
> iotlb. Next time a packet is sent, iotlb_access_ok() loops
> indefinitely due to that erroneous entry:
> 
>   Call Trace:
>
>iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
>vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
>vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104
>vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
>kthread+0x2e9/0x3a0 kernel/kthread.c:377
>ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>
> 
> Reported by syzbot at:
>   https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
> 
> Reported-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> Tested-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> Signed-off-by: Anirudh Rayabharam 
> ---
>  drivers/vhost/iotlb.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
> index 670d56c879e5..b9de74bd2f9c 100644
> --- a/drivers/vhost/iotlb.c
> +++ b/drivers/vhost/iotlb.c
> @@ -53,8 +53,10 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb,
> void *opaque)
>  {
>   struct vhost_iotlb_map *map;
> + u64 size = last - start + 1;
>  
> - if (last < start)
> + // size can overflow to 0 when start is 0 and last is (2^64 - 1).

Pls use the old-style /* */  comments.

> + if (last < start || size == 0)
>   return -EFAULT;
>  
>   if (iotlb->limit &&
> @@ -69,7 +71,7 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb,
>   return -ENOMEM;
>  
>   map->start = start;
> - map->size = last - start + 1;
> + map->size = size;
>   map->last = last;
>   map->addr = addr;
>   map->perm = perm;
> -- 
> 2.35.1
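
The wraparound itself is easy to reproduce in isolation (standalone userspace
demonstration, not part of the patch):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t start = 0, last = UINT64_MAX;	/* the [0, 2^64 - 1] range */
	uint64_t size = last - start + 1;	/* wraps modulo 2^64 */

	printf("size = %llu\n", (unsigned long long)size);	/* prints 0 */
	return 0;
}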



Re: [PATCH] vhost: validate range size before adding to iotlb

2022-02-22 Thread Michael S. Tsirkin
On Tue, Feb 22, 2022 at 03:11:07PM +0800, Jason Wang wrote:
> On Tue, Feb 22, 2022 at 12:57 PM Anirudh Rayabharam  
> wrote:
> >
> > On Tue, Feb 22, 2022 at 10:50:20AM +0800, Jason Wang wrote:
> > > On Tue, Feb 22, 2022 at 3:53 AM Anirudh Rayabharam  
> > > wrote:
> > > >
> > > > In vhost_iotlb_add_range_ctx(), validate the range size is non-zero
> > > > before proceeding with adding it to the iotlb.
> > > >
> > > > Range size can overflow to 0 when start is 0 and last is (2^64 - 1).
> > > > One instance where it can happen is when userspace sends an IOTLB
> > > > message with iova=size=uaddr=0 (vhost_process_iotlb_msg). So, an
> > > > entry with size = 0, start = 0, last = (2^64 - 1) ends up in the
> > > > iotlb. Next time a packet is sent, iotlb_access_ok() loops
> > > > indefinitely due to that erroneous entry:
> > > >
> > > > Call Trace:
> > > >  
> > > >  iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
> > > >  vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
> > > >  vhost_transport_do_send_pkt+0xe0/0xfd0 
> > > > drivers/vhost/vsock.c:104
> > > >  vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
> > > >  kthread+0x2e9/0x3a0 kernel/kthread.c:377
> > > >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > > >  
> > > >
> > > > Reported by syzbot at:
> > > > https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
> > > >
> > > > Reported-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > > > Tested-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > > > Signed-off-by: Anirudh Rayabharam 
> > > > ---
> > > >  drivers/vhost/iotlb.c | 6 --
> > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
> > > > index 670d56c879e5..b9de74bd2f9c 100644
> > > > --- a/drivers/vhost/iotlb.c
> > > > +++ b/drivers/vhost/iotlb.c
> > > > @@ -53,8 +53,10 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb,
> > > >   void *opaque)
> > > >  {
> > > > struct vhost_iotlb_map *map;
> > > > +   u64 size = last - start + 1;
> > > >
> > > > -   if (last < start)
> > > > +   // size can overflow to 0 when start is 0 and last is (2^64 - 1).
> > > > +   if (last < start || size == 0)
> > > > return -EFAULT;
> > >
> > > I'd move this check to vhost_chr_iter_write(), then for the device who
> > > has its own msg handler (e.g vDPA) can benefit from it as well.
> >
> > Thanks for reviewing!
> >
> > I kept the check here thinking that all devices would benefit from it
> > because they would need to call vhost_iotlb_add_range() to add an entry
> > to the iotlb. Isn't that correct?
> 
> Correct for now but not for the future, it's not guaranteed that the
> per device iotlb message handler will use vhost iotlb.
> 
> But I agree that we probably don't need to care about it too much now.
> 
> > Do you see any other benefit in moving
> > it to vhost_chr_iter_write()?
> >
> > One concern I have is that if we move it out some future caller to
> > vhost_iotlb_add_range() might forget to handle this case.
> 
> Yes.
> 
> Rethinking the whole fix, we're basically rejecting the [0, ULONG_MAX] range,
> which seems a little bit odd.

Well, I guess ideally we'd split this up as two entries - this kind of
thing is after all one of the reasons we initially used first,last as
the API - as opposed to first,size.

Anirudh, could you do it like this instead of rejecting?
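
A possible shape of that suggestion, splitting the one unrepresentable range
into two entries before the size computation (illustrative only, not a tested
patch):

/* Illustrative: [0, ULLONG_MAX] has no representable size, so store it as
 * two halves whose sizes are both non-zero. */
if (start == 0 && last == ULLONG_MAX) {
	u64 mid = ULLONG_MAX / 2;	/* 2^63 - 1 */
	int err;

	err = vhost_iotlb_add_range_ctx(iotlb, 0, mid, addr, perm, opaque);
	if (err)
		return err;
	return vhost_iotlb_add_range_ctx(iotlb, mid + 1, ULLONG_MAX,
					 addr + mid + 1, perm, opaque);
}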


> I wonder if it's better to just remove
> the map->size. Having a quick glance at the the user, I don't see any
> blocker for this.
> 
> Thanks

I think it's possible, but it won't solve the bug by itself, and we'd need
to review and fix all users - a high chance of introducing
another regression. And I think there's value in fitting under the
stable rule of 100 lines with context.
So sure, but let's fix the bug first.



> >
> > Thanks!
> >
> > - Anirudh.
> >
> > >
> > > Thanks
> > >
> > > >
> > > > if (iotlb->limit &&
> > > > @@ -69,7 +71,7 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb 
> > > > *iotlb,
> > > > return -ENOMEM;
> > > >
> > > > map->start = start;
> > > > -   map->size = last - start + 1;
> > > > +   map->size = size;
> > > > map->last = last;
> > > > map->addr = addr;
> > > > map->perm = perm;
> > > > --
> > > > 2.35.1
> > > >
> > >
> >



[PATCH] vhost: use bvec_kmap_local in {get,put}u16_iotlb

2022-02-22 Thread Christoph Hellwig
Using local kmaps slightly reduces the chances of stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig 
---
 drivers/vhost/vringh.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 14e2043d76852..0f22a83fd09af 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1173,7 +1173,7 @@ static inline int getu16_iotlb(const struct vringh *vrh,
   u16 *val, const __virtio16 *p)
 {
struct bio_vec iov;
-   void *kaddr, *from;
+   void *kaddr;
int ret;
 
/* Atomic read is needed for getu16 */
@@ -1182,10 +1182,9 @@ static inline int getu16_iotlb(const struct vringh *vrh,
if (ret < 0)
return ret;
 
-   kaddr = kmap_atomic(iov.bv_page);
-   from = kaddr + iov.bv_offset;
-   *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
-   kunmap_atomic(kaddr);
+   kaddr = bvec_kmap_local(&iov);
+   *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)kaddr));
+   kunmap_local(kaddr);
 
return 0;
 }
@@ -1194,7 +1193,7 @@ static inline int putu16_iotlb(const struct vringh *vrh,
   __virtio16 *p, u16 val)
 {
struct bio_vec iov;
-   void *kaddr, *to;
+   void *kaddr;
int ret;
 
/* Atomic write is needed for putu16 */
@@ -1203,10 +1202,9 @@ static inline int putu16_iotlb(const struct vringh *vrh,
if (ret < 0)
return ret;
 
-   kaddr = kmap_atomic(iov.bv_page);
-   to = kaddr + iov.bv_offset;
-   WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
-   kunmap_atomic(kaddr);
+   kaddr = bvec_kmap_local(&iov);
+   WRITE_ONCE(*(__virtio16 *)kaddr, cpu_to_vringh16(vrh, val));
+   kunmap_local(kaddr);
 
return 0;
 }
-- 
2.30.2
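
For reference, bvec_kmap_local() folds the offset in, which is what lets the
open-coded kaddr + iov.bv_offset disappear; in include/linux/bvec.h it is
defined roughly as:

static inline void *bvec_kmap_local(struct bio_vec *bvec)
{
	/* Map the page locally and return a pointer already adjusted
	 * by the bvec's offset within that page. */
	return kmap_local_page(bvec->bv_page) + bvec->bv_offset;
}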



Re: [PATCH v4 3/4] drivers/net/virtio_net: Added RSS hash report.

2022-02-22 Thread kernel test robot
Hi Andrew,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on net/master horms-ipvs/master net-next/master 
linus/master v5.17-rc5 next-20220217]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Andrew-Melnychenko/RSS-support-for-VirtioNet/20220222-200334
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: i386-randconfig-s002-20220221 (https://download.01.org/0day-ci/archive/20220223/202202230342.hpye6dha-...@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/0day-ci/linux/commit/4fda71c17afd24d8afb675baa0bb14dbbc6cd23c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Andrew-Melnychenko/RSS-support-for-VirtioNet/20220222-200334
git checkout 4fda71c17afd24d8afb675baa0bb14dbbc6cd23c
# save the config file to linux build tree
mkdir build_dir
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)
   drivers/net/virtio_net.c:1160:25: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/virtio_net.c:1160:25: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/virtio_net.c:1160:25: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/virtio_net.c:1160:25: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/virtio_net.c:1160:25: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/virtio_net.c:1160:25: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/virtio_net.c:1160:25: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/virtio_net.c:1160:25: sparse: sparse: restricted __le16 degrades to integer
   drivers/net/virtio_net.c:1160:25: sparse: sparse: restricted __le16 degrades to integer
>> drivers/net/virtio_net.c:1178:35: sparse: sparse: incorrect type in argument 2 (different base types) @@ expected unsigned int [usertype] hash @@ got restricted __le32 const [usertype] hash_value @@
   drivers/net/virtio_net.c:1178:35: sparse: expected unsigned int [usertype] hash
   drivers/net/virtio_net.c:1178:35: sparse: got restricted __le32 const [usertype] hash_value

vim +1178 drivers/net/virtio_net.c

  1151  
  1152  static void virtio_skb_set_hash(const struct virtio_net_hdr_v1_hash *hdr_hash,
  1153  struct sk_buff *skb)
  1154  {
  1155  enum pkt_hash_types rss_hash_type;
  1156  
  1157  if (!hdr_hash || !skb)
  1158  return;
  1159  
  1160  switch (hdr_hash->hash_report) {
  1161  case VIRTIO_NET_HASH_REPORT_TCPv4:
  1162  case VIRTIO_NET_HASH_REPORT_UDPv4:
  1163  case VIRTIO_NET_HASH_REPORT_TCPv6:
  1164  case VIRTIO_NET_HASH_REPORT_UDPv6:
  1165  case VIRTIO_NET_HASH_REPORT_TCPv6_EX:
  1166  case VIRTIO_NET_HASH_REPORT_UDPv6_EX:
  1167  rss_hash_type = PKT_HASH_TYPE_L4;
  1168  break;
  1169  case VIRTIO_NET_HASH_REPORT_IPv4:
  1170  case VIRTIO_NET_HASH_REPORT_IPv6:
  1171  case VIRTIO_NET_HASH_REPORT_IPv6_EX:
  1172  rss_hash_type = PKT_HASH_TYPE_L3;
  1173  break;
  1174  case VIRTIO_NET_HASH_REPORT_NONE:
  1175  default:
  1176  rss_hash_type = PKT_HASH_TYPE_NONE;
  1177  }
> 1178  skb_set_hash(skb, hdr_hash->hash_value, rss_hash_type);
  1179  }
  1180  
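
For what it's worth, the warning flagged at line 1178 is about passing a
__le32 where skb_set_hash() expects a CPU-order u32. A likely fix (an
editorial sketch, not necessarily the author's eventual follow-up) is to
convert the value before the call:

	/* Sketch: convert the little-endian hash to CPU byte order before
	 * handing it to skb_set_hash(), which takes a plain u32. */
	skb_set_hash(skb, le32_to_cpu(hdr_hash->hash_value), rss_hash_type);

The __le16 warnings at line 1160 would similarly want le16_to_cpu() around
hdr_hash->hash_report in the switch statement.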

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH] drm/virtio: Remove restriction of non-zero blob_flags

2022-02-22 Thread Chia-I Wu
On Sat, Feb 19, 2022 at 9:02 AM Rob Clark  wrote:
>
> From: Rob Clark 
>
> With native userspace drivers in guest, a lot of GEM objects need to be
> neither shared nor mappable.  And in fact making everything mappable
> and/or sharable results in unreasonably high fd usage in host VMM.
>
> Signed-off-by: Rob Clark 
Reviewed-by: Chia-I Wu 


Re: [PATCH] drm/virtio: Add USE_INTERNAL blob flag

2022-02-22 Thread Chia-I Wu
On Fri, Feb 18, 2022 at 9:51 AM Rob Clark  wrote:
>
> On Fri, Feb 18, 2022 at 8:42 AM Chia-I Wu  wrote:
> >
> > On Fri, Feb 18, 2022 at 7:57 AM Rob Clark  wrote:
> > >
> > > From: Rob Clark 
> > >
> > > With native userspace drivers in guest, a lot of GEM objects need to be
> > > neither shared nor mappable.  And in fact making everything mappable
> > > and/or sharable results in unreasonably high fd usage in host VMM.
> > >
> > > Signed-off-by: Rob Clark 
> > > ---
> > > This is for a thing I'm working on, a new virtgpu context type that
> > > allows for running native userspace driver in the guest, with a
> > > thin shim in the host VMM.  In this case, the guest has a lot of
> > > GEM buffer objects which need to be neither shared nor mappable.
> > >
> > > Alternative idea is to just drop the restriction that blob_flags
> > > be non-zero.  I'm ok with either approach.
> > Dropping the restriction sounds better to me.
> >
> > What is the use case for such a resource?  Does the host need to know
> > such a resource exists?
>
> There are a bunch of use cases, some internal (like visibility stream
> buffers filled during binning pass and consumed during draw pass),
> some external (tiled and/or UBWC buffers are never accessed on the
> CPU).
For these use cases, it's true that userspace might want internal bos,
and serialize them as res_ids which the host maps to host gem_handles.
But userspace can also skip the internal bos and encode host
gem_handles directly.

But the kernel probably should not dictate what the userspace should
do by requiring non-zero blob flags.

>
> In theory, at least currently, drm/virtgpu does not need to know about
> them, but there are a lot of places in userspace that expect to have a
> gem handle.  Longer term, I think I want to extend virtgpu with
> MADVISE ioctl so we can track DONTNEED state in guest and only release
> buffers when host and/or guest is under memory pressure.  For that we
> will definitely need guest side gem handles
MADVISE is a hint that userspace sets and is not based on memory
pressure.  It is the receivers of the hint who take actions when under
memory pressure.  I think it can be between the guest userspace and
the host?
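
For concreteness, the MADVISE idea being floated might look something like
the purely hypothetical sketch below; no such virtgpu ioctl exists in this
thread, and the layout is loosely modeled on the GEM madvise uAPIs of
other DRM drivers:

	/* Hypothetical only: a virtgpu MADVISE-style uAPI as discussed
	 * above. All names and fields are invented for illustration. */
	struct drm_virtgpu_madvise {
		__u32 bo_handle;	/* GEM handle of the buffer */
		__u32 madv;		/* WILLNEED or DONTNEED advice */
		__u32 retained;		/* out: 0 if backing pages were purged */
		__u32 pad;
	};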


>
> BR,
> -R
>
> > >
> > >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 7 ++-
> > >  include/uapi/drm/virtgpu_drm.h | 1 +
> > >  2 files changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> > > b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> > > index 69f1952f3144..92e1ba6b8078 100644
> > > --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> > > +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> > > @@ -36,7 +36,8 @@
> > >
> > >  #define VIRTGPU_BLOB_FLAG_USE_MASK (VIRTGPU_BLOB_FLAG_USE_MAPPABLE | \
> > > VIRTGPU_BLOB_FLAG_USE_SHAREABLE | \
> > > -   VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE)
> > > +   VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE | \
> > > +   VIRTGPU_BLOB_FLAG_USE_INTERNAL)
> > >
> > >  static int virtio_gpu_fence_event_create(struct drm_device *dev,
> > >  struct drm_file *file,
> > > @@ -662,6 +663,10 @@ static int verify_blob(struct virtio_gpu_device 
> > > *vgdev,
> > > params->size = rc_blob->size;
> > > params->blob = true;
> > > params->blob_flags = rc_blob->blob_flags;
> > > +
> > > +   /* USE_INTERNAL is local to guest kernel, don't pass to host: */
> > > +   params->blob_flags &= ~VIRTGPU_BLOB_FLAG_USE_INTERNAL;
> > > +
> > > return 0;
> > >  }
> > >
> > > diff --git a/include/uapi/drm/virtgpu_drm.h 
> > > b/include/uapi/drm/virtgpu_drm.h
> > > index 0512fde5e697..62b7483e5c60 100644
> > > --- a/include/uapi/drm/virtgpu_drm.h
> > > +++ b/include/uapi/drm/virtgpu_drm.h
> > > @@ -163,6 +163,7 @@ struct drm_virtgpu_resource_create_blob {
> > >  #define VIRTGPU_BLOB_FLAG_USE_MAPPABLE 0x0001
> > >  #define VIRTGPU_BLOB_FLAG_USE_SHAREABLE0x0002
> > >  #define VIRTGPU_BLOB_FLAG_USE_CROSS_DEVICE 0x0004
> > > +#define VIRTGPU_BLOB_FLAG_USE_INTERNAL 0x0008   /* not-mappable, 
> > > not-shareable */
> > > /* zero is invalid blob_mem */
> > > __u32 blob_mem;
> > > __u32 blob_flags;
> > > --
> > > 2.34.1
> > >


Re: [PATCH] tools/virtio: Test virtual address range detection

2022-02-22 Thread Matthew Wilcox
On Tue, Feb 22, 2022 at 07:58:33AM +, David Woodhouse wrote:
> On Tue, 2022-02-22 at 01:31 -0500, Michael S. Tsirkin wrote:
> > On Mon, Feb 21, 2022 at 05:18:48PM +, David Woodhouse wrote:
> > > 
> > > [dwoodhou@i7 virtio]$ sudo ~/virtio_test
> > > Detected virtual address range 0x1000-0x7000
> > > spurious wakeups: 0x0 started=0x10 completed=0x10
> > > 
> > > Although in some circumstances I also see a different build failure:
> > > 
> > > cc -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I 
> > > ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow 
> > > -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -include 
> > > ../../include/linux/kconfig.h   -c -o vringh_test.o vringh_test.c

Trying to test this myself ...

$ cd tools/virtio/
$ make
...
cc -lpthread  virtio_test.o virtio_ring.o   -o virtio_test
/usr/bin/ld: virtio_ring.o: in function `spin_lock':
/home/willy/kernel/folio/tools/virtio/./linux/spinlock.h:16: undefined 
reference to `pthread_spin_lock'

So this is not the only problem here?

> > > In file included from ./linux/uio.h:3,
> > >  from ./linux/../../../include/linux/vringh.h:15,
> > >  from ./linux/vringh.h:1,
> > >  from vringh_test.c:9:
> > > ./linux/../../../include/linux/uio.h:10:10: fatal error: 
> > > linux/mm_types.h: No such file or directory
> > >10 | #include 
> > >   |  ^~
> > > compilation terminated.
> > > make: *** [: vringh_test.o] Error 1
> > 
> > Which tree has this build failure? In mine linux/uio.h does not
> > include linux/mm_types.h.
> 
> Strictly it's
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/xen-evtchn-kernel
> but I'm sure my part isn't relevant; it's just v5.17-rc5.
> 
>  $ git blame include/linux/uio.h | grep mm_types.h
> d9c19d32d86fa (Matthew Wilcox (Oracle) 2021-10-18 10:39:06 -0400  10) 
> #include 
>  $ git describe --tags d9c19d32d86fa
> v5.16-rc4-37-gd9c19d32d86f

grr.  Originally, I had this doing a typebusting cast, but hch objected,
so I had to include mm_types.h.  This should fix it ...

$ git diff
diff --git a/tools/virtio/linux/mm_types.h b/tools/virtio/linux/mm_types.h
new file mode 100644
index ..3b0fc9bc5b8f
--- /dev/null
+++ b/tools/virtio/linux/mm_types.h
@@ -0,0 +1,3 @@
+struct folio {
+   struct page page;
+};

At least, it makes it compile for me.


Re: [PATCH] vhost: validate range size before adding to iotlb

2022-02-22 Thread Michael S. Tsirkin
On Tue, Feb 22, 2022 at 10:57:41PM +0530, Anirudh Rayabharam wrote:
> On Tue, Feb 22, 2022 at 10:02:29AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Feb 22, 2022 at 03:11:07PM +0800, Jason Wang wrote:
> > > On Tue, Feb 22, 2022 at 12:57 PM Anirudh Rayabharam  
> > > wrote:
> > > >
> > > > On Tue, Feb 22, 2022 at 10:50:20AM +0800, Jason Wang wrote:
> > > > > On Tue, Feb 22, 2022 at 3:53 AM Anirudh Rayabharam 
> > > > >  wrote:
> > > > > >
> > > > > > In vhost_iotlb_add_range_ctx(), validate the range size is non-zero
> > > > > > before proceeding with adding it to the iotlb.
> > > > > >
> > > > > > Range size can overflow to 0 when start is 0 and last is (2^64 - 1).
> > > > > > One instance where it can happen is when userspace sends an IOTLB
> > > > > > message with iova=size=uaddr=0 (vhost_process_iotlb_msg). So, an
> > > > > > entry with size = 0, start = 0, last = (2^64 - 1) ends up in the
> > > > > > iotlb. Next time a packet is sent, iotlb_access_ok() loops
> > > > > > indefinitely due to that erroneous entry:
> > > > > >
> > > > > > Call Trace:
> > > > > >  
> > > > > >  iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
> > > > > >  vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
> > > > > >  vhost_transport_do_send_pkt+0xe0/0xfd0 
> > > > > > drivers/vhost/vsock.c:104
> > > > > >  vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
> > > > > >  kthread+0x2e9/0x3a0 kernel/kthread.c:377
> > > > > >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > > > > >  
> > > > > >
> > > > > > Reported by syzbot at:
> > > > > > https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
> > > > > >
> > > > > > Reported-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > > > > > Tested-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > > > > > Signed-off-by: Anirudh Rayabharam 
> > > > > > ---
> > > > > >  drivers/vhost/iotlb.c | 6 --
> > > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
> > > > > > index 670d56c879e5..b9de74bd2f9c 100644
> > > > > > --- a/drivers/vhost/iotlb.c
> > > > > > +++ b/drivers/vhost/iotlb.c
> > > > > > @@ -53,8 +53,10 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb 
> > > > > > *iotlb,
> > > > > >   void *opaque)
> > > > > >  {
> > > > > > struct vhost_iotlb_map *map;
> > > > > > +   u64 size = last - start + 1;
> > > > > >
> > > > > > -   if (last < start)
> > > > > > +   // size can overflow to 0 when start is 0 and last is (2^64 
> > > > > > - 1).
> > > > > > +   if (last < start || size == 0)
> > > > > > return -EFAULT;
> > > > >
> > > > > I'd move this check to vhost_chr_iter_write(), then for the device who
> > > > > has its own msg handler (e.g vDPA) can benefit from it as well.
> > > >
> > > > Thanks for reviewing!
> > > >
> > > > I kept the check here thinking that all devices would benefit from it
> > > > because they would need to call vhost_iotlb_add_range() to add an entry
> > > > to the iotlb. Isn't that correct?
> > > 
> > > Correct for now but not for the future, it's not guaranteed that the
> > > per device iotlb message handler will use vhost iotlb.
> > > 
> > > But I agree that we probably don't need to care about it too much now.
> > > 
> > > > Do you see any other benefit in moving
> > > > it to vhost_chr_iter_write()?
> > > >
> > > > One concern I have is that if we move it out some future caller to
> > > > vhost_iotlb_add_range() might forget to handle this case.
> > > 
> > > Yes.
> > > 
> > > Rethinking the whole fix, we're basically rejecting the [0, ULONG_MAX]
> > > range, which seems a little bit odd.
> > 
> > Well, I guess ideally we'd split this up as two entries - this kind of
> > thing is after all one of the reasons we initially used first,last as
> > the API - as opposed to first,size.
> 
> IIUC, the APIs exposed to userspace accept first,size.

Some of them.


/* vhost vdpa IOVA range
 * @first: First address that can be mapped by vhost-vDPA
 * @last: Last address that can be mapped by vhost-vDPA
 */
struct vhost_vdpa_iova_range {
__u64 first;
__u64 last;
};

but

struct vhost_iotlb_msg {
__u64 iova;
__u64 size;
__u64 uaddr;
#define VHOST_ACCESS_RO  0x1
#define VHOST_ACCESS_WO  0x2
#define VHOST_ACCESS_RW  0x3
__u8 perm;
#define VHOST_IOTLB_MISS   1
#define VHOST_IOTLB_UPDATE 2
#define VHOST_IOTLB_INVALIDATE 3
#define VHOST_IOTLB_ACCESS_FAIL4
/*
 * VHOST_IOTLB_BATCH_BEGIN and VHOST_IOTLB_BATCH_END allow modifying
 * multiple mappings in one go: beginning with
 * VHOST_IOTLB_BATCH_BEGIN, followed by any number of
 * VHOST_IOTLB_UPDATE messages, and ending with VHOST_IOTLB_BATCH_END.
 * When one of these two values is used as the message type, the rest
 * of the fields in the message are ignored. There's no guarantee that
 * these changes take place automatically in the device.
 */

Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding

2022-02-22 Thread Jason Wang
On Wed, Feb 23, 2022 at 3:01 AM Eugenio Perez Martin
 wrote:
>
> On Tue, Feb 8, 2022 at 9:11 AM Jason Wang  wrote:
> >
> >
> > On 2022/2/2 1:08 AM, Eugenio Perez Martin wrote:
> > > On Sun, Jan 30, 2022 at 5:43 AM Jason Wang  wrote:
> > >>
> > >> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > >>> Initial version of shadow virtqueue that actually forward buffers. There
> > >>> is no iommu support at the moment, and that will be addressed in future
> > >>> patches of this series. Since all vhost-vdpa devices use forced IOMMU,
> > >>> this means that SVQ is not usable at this point of the series on any
> > >>> device.
> > >>>
> > >>> For simplicity it only supports modern devices, that expects vring
> > >>> in little endian, with split ring and no event idx or indirect
> > >>> descriptors. Support for them will not be added in this series.
> > >>>
> > >>> It reuses the VirtQueue code for the device part. The driver part is
> > >>> based on Linux's virtio_ring driver, but with stripped functionality
> > >>> and optimizations so it's easier to review.
> > >>>
> > >>> However, forwarding buffers have some particular pieces: One of the most
> > >>> unexpected ones is that a guest's buffer can expand through more than
> > >>> one descriptor in SVQ. While this is handled gracefully by qemu's
> > >>> emulated virtio devices, it may cause unexpected SVQ queue full. This
> > >>> patch also solves it by checking for this condition at both guest's
> > >>> kicks and device's calls. The code may be more elegant in the future if
> > >>> SVQ code runs in its own iocontext.
> > >>>
> > >>> Signed-off-by: Eugenio Pérez 
> > >>> ---
> > >>>hw/virtio/vhost-shadow-virtqueue.h |   2 +
> > >>>hw/virtio/vhost-shadow-virtqueue.c | 365 
> > >>> -
> > >>>hw/virtio/vhost-vdpa.c | 111 -
> > >>>3 files changed, 462 insertions(+), 16 deletions(-)
> > >>>
> > >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
> > >>> b/hw/virtio/vhost-shadow-virtqueue.h
> > >>> index 39aef5ffdf..19c934af49 100644
> > >>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> > >>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > >>> @@ -33,6 +33,8 @@ uint16_t vhost_svq_get_num(const VhostShadowVirtqueue 
> > >>> *svq);
> > >>>size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
> > >>>size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> > >>>
> > >>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> > >>> + VirtQueue *vq);
> > >>>void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > >>>
> > >>>VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> > >>> b/hw/virtio/vhost-shadow-virtqueue.c
> > >>> index 7c168075d7..a1a404f68f 100644
> > >>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> > >>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > >>> @@ -9,6 +9,8 @@
> > >>>
> > >>>#include "qemu/osdep.h"
> > >>>#include "hw/virtio/vhost-shadow-virtqueue.h"
> > >>> +#include "hw/virtio/vhost.h"
> > >>> +#include "hw/virtio/virtio-access.h"
> > >>>#include "standard-headers/linux/vhost_types.h"
> > >>>
> > >>>#include "qemu/error-report.h"
> > >>> @@ -36,6 +38,33 @@ typedef struct VhostShadowVirtqueue {
> > >>>
> > >>>/* Guest's call notifier, where SVQ calls guest. */
> > >>>EventNotifier svq_call;
> > >>> +
> > >>> +/* Virtio queue shadowing */
> > >>> +VirtQueue *vq;
> > >>> +
> > >>> +/* Virtio device */
> > >>> +VirtIODevice *vdev;
> > >>> +
> > >>> +/* Map for returning guest's descriptors */
> > >>> +VirtQueueElement **ring_id_maps;
> > >>> +
> > >>> +/* Next VirtQueue element that guest made available */
> > >>> +VirtQueueElement *next_guest_avail_elem;
> > >>> +
> > >>> +/* Next head to expose to device */
> > >>> +uint16_t avail_idx_shadow;
> > >>> +
> > >>> +/* Next free descriptor */
> > >>> +uint16_t free_head;
> > >>> +
> > >>> +/* Last seen used idx */
> > >>> +uint16_t shadow_used_idx;
> > >>> +
> > >>> +/* Next head to consume from device */
> > >>> +uint16_t last_used_idx;
> > >>> +
> > >>> +/* Cache for the exposed notification flag */
> > >>> +bool notification;
> > >>>} VhostShadowVirtqueue;
> > >>>
> > >>>#define INVALID_SVQ_KICK_FD -1
> > >>> @@ -148,30 +177,294 @@ bool vhost_svq_ack_guest_features(uint64_t 
> > >>> dev_features,
> > >>>return true;
> > >>>}
> > >>>
> > >>> -/* Forward guest notifications */
> > >>> -static void vhost_handle_guest_kick(EventNotifier *n)
> > >>> +/**
> > >>> + * Number of descriptors that SVQ can make available from the guest.
> > >>> + *
> > >>> + * @svq   The svq
> > >>> + */
> > >>> +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue 
> > >>> *svq)
> > >>>{
> > >>> -VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > >>> - sv

Re: [PATCH] vhost: validate range size before adding to iotlb

2022-02-22 Thread Jason Wang
On Tue, Feb 22, 2022 at 11:02 PM Michael S. Tsirkin  wrote:
>
> On Tue, Feb 22, 2022 at 03:11:07PM +0800, Jason Wang wrote:
> > On Tue, Feb 22, 2022 at 12:57 PM Anirudh Rayabharam  
> > wrote:
> > >
> > > On Tue, Feb 22, 2022 at 10:50:20AM +0800, Jason Wang wrote:
> > > > On Tue, Feb 22, 2022 at 3:53 AM Anirudh Rayabharam  
> > > > wrote:
> > > > >
> > > > > In vhost_iotlb_add_range_ctx(), validate the range size is non-zero
> > > > > before proceeding with adding it to the iotlb.
> > > > >
> > > > > Range size can overflow to 0 when start is 0 and last is (2^64 - 1).
> > > > > One instance where it can happen is when userspace sends an IOTLB
> > > > > message with iova=size=uaddr=0 (vhost_process_iotlb_msg). So, an
> > > > > entry with size = 0, start = 0, last = (2^64 - 1) ends up in the
> > > > > iotlb. Next time a packet is sent, iotlb_access_ok() loops
> > > > > indefinitely due to that erroneous entry:
> > > > >
> > > > > Call Trace:
> > > > >  
> > > > >  iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
> > > > >  vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
> > > > >  vhost_transport_do_send_pkt+0xe0/0xfd0 
> > > > > drivers/vhost/vsock.c:104
> > > > >  vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
> > > > >  kthread+0x2e9/0x3a0 kernel/kthread.c:377
> > > > >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > > > >  
> > > > >
> > > > > Reported by syzbot at:
> > > > > https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
> > > > >
> > > > > Reported-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > > > > Tested-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > > > > Signed-off-by: Anirudh Rayabharam 
> > > > > ---
> > > > >  drivers/vhost/iotlb.c | 6 --
> > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
> > > > > index 670d56c879e5..b9de74bd2f9c 100644
> > > > > --- a/drivers/vhost/iotlb.c
> > > > > +++ b/drivers/vhost/iotlb.c
> > > > > @@ -53,8 +53,10 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb 
> > > > > *iotlb,
> > > > >   void *opaque)
> > > > >  {
> > > > > struct vhost_iotlb_map *map;
> > > > > +   u64 size = last - start + 1;
> > > > >
> > > > > -   if (last < start)
> > > > > +   // size can overflow to 0 when start is 0 and last is (2^64 - 
> > > > > 1).
> > > > > +   if (last < start || size == 0)
> > > > > return -EFAULT;
> > > >
> > > > I'd move this check to vhost_chr_iter_write(), then for the device who
> > > > has its own msg handler (e.g vDPA) can benefit from it as well.
> > >
> > > Thanks for reviewing!
> > >
> > > I kept the check here thinking that all devices would benefit from it
> > > because they would need to call vhost_iotlb_add_range() to add an entry
> > > to the iotlb. Isn't that correct?
> >
> > Correct for now but not for the future, it's not guaranteed that the
> > per device iotlb message handler will use vhost iotlb.
> >
> > But I agree that we probably don't need to care about it too much now.
> >
> > > Do you see any other benefit in moving
> > > it to vhost_chr_iter_write()?
> > >
> > > One concern I have is that if we move it out some future caller to
> > > vhost_iotlb_add_range() might forget to handle this case.
> >
> > Yes.
> >
> > Rethinking the whole fix, we're basically rejecting the [0, ULONG_MAX]
> > range, which seems a little bit odd.
>
> Well, I guess ideally we'd split this up as two entries - this kind of
> thing is after all one of the reasons we initially used first,last as
> the API - as opposed to first,size.
>
> Anirudh, could you do it like this instead of rejecting?
>
>
> > I wonder if it's better to just remove
> > the map->size. Having a quick glance at the users, I don't see any
> > blocker for this.
> >
> > Thanks
>
> I think it's possible, but it won't solve the bug by itself, and we'd need
> to review and fix all users - with a high chance of introducing
> another regression. And I think there's value in fitting under the
> stable rule of 100 lines with context.
> So sure, but let's fix the bug first.

Ok, I agree.

Thanks

>
>
>
> > >
> > > Thanks!
> > >
> > > - Anirudh.
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > if (iotlb->limit &&
> > > > > @@ -69,7 +71,7 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb 
> > > > > *iotlb,
> > > > > return -ENOMEM;
> > > > >
> > > > > map->start = start;
> > > > > -   map->size = last - start + 1;
> > > > > +   map->size = size;
> > > > > map->last = last;
> > > > > map->addr = addr;
> > > > > map->perm = perm;
> > > > > --
> > > > > 2.35.1
> > > > >
> > > >
> > >
>
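
As an editorial aside, the "split this up as two entries" idea above could
look something like this hypothetical helper (a sketch, not a posted
patch), reusing the vhost_iotlb_add_range_ctx() signature visible in the
quoted diff:

	/* Hypothetical sketch: cover the full [0, U64_MAX] range with two
	 * entries so each map has a representable, non-zero size. */
	static int vhost_iotlb_add_full_range(struct vhost_iotlb *iotlb,
					      u64 addr, unsigned int perm,
					      void *opaque)
	{
		u64 mid = 1ULL << 63;
		int ret;

		ret = vhost_iotlb_add_range_ctx(iotlb, 0, mid - 1, addr,
						perm, opaque);
		if (ret)
			return ret;

		return vhost_iotlb_add_range_ctx(iotlb, mid, U64_MAX,
						 addr + mid, perm, opaque);
	}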


Re: [PATCH v2] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing

2022-02-22 Thread Jason Wang
On Tue, Feb 22, 2022 at 5:47 PM Stefano Garzarella  wrote:
>
> vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
> ownership. It expects current->mm to be valid.
>
> vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
> the user has not done close(), so when we are in do_exit(). In this
> case current->mm is invalid and we're releasing the device, so we
> should clean it anyway.
>
> Let's check the owner only when vhost_vsock_stop() is called
> by an ioctl.
>
> When invoked from release we cannot fail, so we don't check the return
> code of vhost_vsock_stop(). We need to stop vsock even if it's not
> the owner.
>
> Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> Cc: sta...@vger.kernel.org
> Reported-by: syzbot+1e3ea63db39f2b444...@syzkaller.appspotmail.com
> Reported-and-tested-by: syzbot+3140b17cb44a7b174...@syzkaller.appspotmail.com
> Signed-off-by: Stefano Garzarella 

Acked-by: Jason Wang 

> ---
> v2:
> - initialized `ret` in vhost_vsock_stop [Dan]
> - added comment about vhost_vsock_stop() calling in the code and an 
> explanation
>   in the commit message [MST]
>
> v1: 
> https://lore.kernel.org/virtualization/20220221114916.107045-1-sgarz...@redhat.com
> ---
>  drivers/vhost/vsock.c | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index d6ca1c7ad513..37f0b4274113 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -629,16 +629,18 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
> return ret;
>  }
>
> -static int vhost_vsock_stop(struct vhost_vsock *vsock)
> +static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
>  {
> size_t i;
> -   int ret;
> +   int ret = 0;
>
> mutex_lock(&vsock->dev.mutex);
>
> -   ret = vhost_dev_check_owner(&vsock->dev);
> -   if (ret)
> -   goto err;
> +   if (check_owner) {
> +   ret = vhost_dev_check_owner(&vsock->dev);
> +   if (ret)
> +   goto err;
> +   }
>
> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> struct vhost_virtqueue *vq = &vsock->vqs[i];
> @@ -753,7 +755,12 @@ static int vhost_vsock_dev_release(struct inode *inode, 
> struct file *file)
>  * inefficient.  Room for improvement here. */
> vsock_for_each_connected_socket(vhost_vsock_reset_orphans);
>
> -   vhost_vsock_stop(vsock);
> +   /* Don't check the owner, because we are in the release path, so we
> +* need to stop the vsock device in any case.
> +* vhost_vsock_stop() can not fail in this case, so we don't need to
> +* check the return code.
> +*/
> +   vhost_vsock_stop(vsock, false);
> vhost_vsock_flush(vsock);
> vhost_dev_stop(&vsock->dev);
>
> @@ -868,7 +875,7 @@ static long vhost_vsock_dev_ioctl(struct file *f, 
> unsigned int ioctl,
> if (start)
> return vhost_vsock_start(vsock);
> else
> -   return vhost_vsock_stop(vsock);
> +   return vhost_vsock_stop(vsock, true);
> case VHOST_GET_FEATURES:
> features = VHOST_VSOCK_FEATURES;
> if (copy_to_user(argp, &features, sizeof(features)))
> --
> 2.35.1
>



Re: [PATCH] vhost: use bvec_kmap_local in {get,put}u16_iotlb

2022-02-22 Thread Jason Wang
On Tue, Feb 22, 2022 at 11:49 PM Christoph Hellwig  wrote:
>
> Using local kmaps slightly reduces the chances of stray writes, and
> the bvec interface cleans up the code a little bit.
>
> Signed-off-by: Christoph Hellwig 

Acked-by: Jason Wang 

> ---
>  drivers/vhost/vringh.c | 18 --
>  1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 14e2043d76852..0f22a83fd09af 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1173,7 +1173,7 @@ static inline int getu16_iotlb(const struct vringh *vrh,
>u16 *val, const __virtio16 *p)
>  {
> struct bio_vec iov;
> -   void *kaddr, *from;
> +   void *kaddr;
> int ret;
>
> /* Atomic read is needed for getu16 */
> @@ -1182,10 +1182,9 @@ static inline int getu16_iotlb(const struct vringh 
> *vrh,
> if (ret < 0)
> return ret;
>
> -   kaddr = kmap_atomic(iov.bv_page);
> -   from = kaddr + iov.bv_offset;
> -   *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
> -   kunmap_atomic(kaddr);
> +   kaddr = bvec_kmap_local(&iov);
> +   *val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)kaddr));
> +   kunmap_local(kaddr);
>
> return 0;
>  }
> @@ -1194,7 +1193,7 @@ static inline int putu16_iotlb(const struct vringh *vrh,
>__virtio16 *p, u16 val)
>  {
> struct bio_vec iov;
> -   void *kaddr, *to;
> +   void *kaddr;
> int ret;
>
> /* Atomic write is needed for putu16 */
> @@ -1203,10 +1202,9 @@ static inline int putu16_iotlb(const struct vringh 
> *vrh,
> if (ret < 0)
> return ret;
>
> -   kaddr = kmap_atomic(iov.bv_page);
> -   to = kaddr + iov.bv_offset;
> -   WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
> -   kunmap_atomic(kaddr);
> +   kaddr = bvec_kmap_local(&iov);
> +   WRITE_ONCE(*(__virtio16 *)kaddr, cpu_to_vringh16(vrh, val));
> +   kunmap_local(kaddr);
>
> return 0;
>  }
> --
> 2.30.2
>



Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding

2022-02-22 Thread Jason Wang
On Tue, Feb 22, 2022 at 4:56 PM Eugenio Perez Martin
 wrote:
>
> On Tue, Feb 22, 2022 at 8:26 AM Jason Wang  wrote:
> >
> >
> > On 2022/2/21 4:15 PM, Eugenio Perez Martin wrote:
> > > On Mon, Feb 21, 2022 at 8:44 AM Jason Wang  wrote:
> > >>
> > >> On 2022/2/17 8:48 PM, Eugenio Perez Martin wrote:
> > >>> On Tue, Feb 8, 2022 at 9:16 AM Jason Wang  wrote:
> >  On 2022/2/1 7:25 PM, Eugenio Perez Martin wrote:
> > > On Sun, Jan 30, 2022 at 7:47 AM Jason Wang  
> > > wrote:
> > >> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > >>> @@ -272,6 +590,28 @@ void 
> > >>> vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int 
> > >>> svq_kick_fd)
> > >>>  void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >>>  {
> > >>>  event_notifier_set_handler(&svq->svq_kick, NULL);
> > >>> +g_autofree VirtQueueElement *next_avail_elem = NULL;
> > >>> +
> > >>> +if (!svq->vq) {
> > >>> +return;
> > >>> +}
> > >>> +
> > >>> +/* Send all pending used descriptors to guest */
> > >>> +vhost_svq_flush(svq, false);
> > >> Do we need to wait for all the pending descriptors to be completed 
> > >> here?
> > >>
> > > No, this function does not wait, it only completes the forwarding of
> > > the *used* descriptors.
> > >
> > > The best example is the net rx queue in my opinion. This call will
> > > check SVQ's vring used_idx and will forward the last used descriptors
> > > if any, but all available descriptors will remain as available for
> > > qemu's VQ code.
> > >
> > > To skip it would miss those last rx descriptors in migration.
> > >
> > > Thanks!
> >  So it's probably not the best place to ask. It's more about the
> >  inflight descriptors, so it should be TX instead of RX.
> > 
> >  I can imagine the migration last phase, we should stop the vhost-vDPA
> >  before calling vhost_svq_stop(). Then we should be fine regardless of
> >  inflight descriptors.
> > 
> > >>> I think I'm still missing something here.
> > >>>
> > >>> To be on the same page. Regarding tx this could cause repeated tx
> > >>> frames (one at source and other at destination), but never a missed
> > >>> buffer not transmitted. The "stop before" could be interpreted as "SVQ
> > >>> is not forwarding available buffers anymore". Would that work?
> > >>
> > >> Right, but this only work if
> > >>
> > >> 1) a flush to make sure TX DMA for inflight descriptors are all completed
> > >>
> > >> 2) just mark all inflight descriptor used
> > >>
> > > It currently trusts on the reverse: Buffers not marked as used (by the
> > > device) will be available in the destination, so expect
> > > retransmissions.
> >
> >
> > I may miss something but I think we do migrate last_avail_idx. So there
> > won't be a re-transmission, since we depend on qemu virtqueue code to
> > deal with vring base?
> >
>
> On stop, vhost_virtqueue_stop calls vhost_vdpa_get_vring_base. In SVQ
> mode, it returns last_used_idx. After that, vhost.c code set VirtQueue
> last_avail_idx == last_used_idx, and it's migrated after that if I'm
> not wrong.

Ok, I missed these details in the review. I suggest mentioning this in
the change log and adding a comment in vhost_vdpa_get_vring_base().

>
> vhost kernel migrates last_avail_idx, but it makes rx buffers
> available on-demand, unlike SVQ. So it does not need to unwind buffers
> or anything like that. Because of how SVQ works with the rx queue,
> this is not possible, since the destination will find no available
> buffers for rx. And for tx you already have described the scenario.
>
> In other words, we cannot see SVQ as a vhost device in that regard:
> SVQ looks for total drain (as "make all guest's buffers available for
> the device ASAP") vs the vhost device which can live with a lot of
> available ones and it will use them on demand. Same problem as
> masking. So the difference in behavior is justified in my opinion, and
> it can be improved in the future with the vdpa in-flight descriptors.
>
> If we restore the state that way in a virtio-net device, it will see
> the available ones as expected, not as in-flight.
>
> Another possibility is to transform all of these into in-flight ones,
> but I feel it would create problems. Can we migrate all rx queues as
> in-flight, with 0 bytes written? Is it worth it?

To clarify, for inflight I meant from the device point of view, that
is [last_used_idx, last_avail_idx).

So for RX and SVQ, it should be as simple as stopping forwarding buffers,
since last_used_idx should be the same as last_avail_idx in this case.
(Though technically the rx buffer might be modified by the NIC).
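
To make that window concrete, a tiny sketch (illustrative only; the
indices are free-running 16-bit values as in the virtio ring):

	/* Descriptors the device has seen as available but not yet marked
	 * used: the [last_used_idx, last_avail_idx) window mentioned above.
	 * Free-running u16 indices make the subtraction wrap correctly. */
	static inline uint16_t svq_inflight(uint16_t last_avail_idx,
					    uint16_t last_used_idx)
	{
		return (uint16_t)(last_avail_idx - last_used_idx);
	}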

> I didn't investigate
> that path too much, but I think the virtio-net emulated device does
> not support that at the moment. If I'm not wrong, we should copy
> something like the body of virtio_blk_load_device if we want to go
> that route.
>
> The current approach might be to

Re: [PATCH v1 1/6] virtio: rename vring_unmap_state_packed() to vring_unmap_extra_packed()

2022-02-22 Thread Jason Wang


On 2022/2/10 4:51 PM, Xuan Zhuo wrote:

The actual parameter handled by vring_unmap_state_packed() is the
vring_desc_extra, so this function should use "extra" instead of "state".

Signed-off-by: Xuan Zhuo 



Acked-by: Jason Wang 



---
  drivers/virtio/virtio_ring.c | 17 -
  1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 962f1477b1fa..7cf3ae057833 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -984,24 +984,24 @@ static struct virtqueue *vring_create_virtqueue_split(
   * Packed ring specific functions - *_packed().
   */
  
-static void vring_unmap_state_packed(const struct vring_virtqueue *vq,

-struct vring_desc_extra *state)
+static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
+struct vring_desc_extra *extra)
  {
u16 flags;
  
  	if (!vq->use_dma_api)

return;
  
-	flags = state->flags;

+   flags = extra->flags;
  
  	if (flags & VRING_DESC_F_INDIRECT) {

dma_unmap_single(vring_dma_dev(vq),
-state->addr, state->len,
+extra->addr, extra->len,
 (flags & VRING_DESC_F_WRITE) ?
 DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
dma_unmap_page(vring_dma_dev(vq),
-  state->addr, state->len,
+  extra->addr, extra->len,
   (flags & VRING_DESC_F_WRITE) ?
   DMA_FROM_DEVICE : DMA_TO_DEVICE);
}
@@ -1303,8 +1303,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
for (n = 0; n < total_sg; n++) {
if (i == err_idx)
break;
-   vring_unmap_state_packed(vq,
-&vq->packed.desc_extra[curr]);
+   vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr]);
curr = vq->packed.desc_extra[curr].next;
i++;
if (i >= vq->packed.vring.num)
@@ -1383,8 +1382,8 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
if (unlikely(vq->use_dma_api)) {
curr = id;
for (i = 0; i < state->num; i++) {
-   vring_unmap_state_packed(vq,
-   &vq->packed.desc_extra[curr]);
+   vring_unmap_extra_packed(vq,
+&vq->packed.desc_extra[curr]);
curr = vq->packed.desc_extra[curr].next;
}
}



Re: [PATCH v1 2/6] virtio: remove flags check for unmap split indirect desc

2022-02-22 Thread Jason Wang


On 2022/2/10 4:51 PM, Xuan Zhuo wrote:

When calling vring_unmap_one_split_indirect(), it will not encounter the
situation where the flags contain VRING_DESC_F_INDIRECT. So remove this
logic.

Signed-off-by: Xuan Zhuo 



Acked-by: Jason Wang 



---
  drivers/virtio/virtio_ring.c | 18 +-
  1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 7cf3ae057833..fadd0a7503e9 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -379,19 +379,11 @@ static void vring_unmap_one_split_indirect(const struct 
vring_virtqueue *vq,
  
  	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
  
-	if (flags & VRING_DESC_F_INDIRECT) {

-   dma_unmap_single(vring_dma_dev(vq),
-virtio64_to_cpu(vq->vq.vdev, desc->addr),
-virtio32_to_cpu(vq->vq.vdev, desc->len),
-(flags & VRING_DESC_F_WRITE) ?
-DMA_FROM_DEVICE : DMA_TO_DEVICE);
-   } else {
-   dma_unmap_page(vring_dma_dev(vq),
-  virtio64_to_cpu(vq->vq.vdev, desc->addr),
-  virtio32_to_cpu(vq->vq.vdev, desc->len),
-  (flags & VRING_DESC_F_WRITE) ?
-  DMA_FROM_DEVICE : DMA_TO_DEVICE);
-   }
+   dma_unmap_page(vring_dma_dev(vq),
+  virtio64_to_cpu(vq->vq.vdev, desc->addr),
+  virtio32_to_cpu(vq->vq.vdev, desc->len),
+  (flags & VRING_DESC_F_WRITE) ?
+  DMA_FROM_DEVICE : DMA_TO_DEVICE);
  }
  
  static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,



Re: [PATCH v1 3/6] virtio: remove flags check for unmap packed indirect desc

2022-02-22 Thread Jason Wang


On 2022/2/10 4:51 PM, Xuan Zhuo wrote:

When calling vring_unmap_desc_packed(), it will not encounter the
situation where the flags contain VRING_DESC_F_INDIRECT. So remove this
logic.



This doesn't seem correct.

There's a call from detach_buf_packed() that can work for indirect 
descriptors?


Thanks




Signed-off-by: Xuan Zhuo 
---
  drivers/virtio/virtio_ring.c | 18 +-
  1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index fadd0a7503e9..cfb028ca238e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1009,19 +1009,11 @@ static void vring_unmap_desc_packed(const struct 
vring_virtqueue *vq,
  
  	flags = le16_to_cpu(desc->flags);
  
-	if (flags & VRING_DESC_F_INDIRECT) {

-   dma_unmap_single(vring_dma_dev(vq),
-le64_to_cpu(desc->addr),
-le32_to_cpu(desc->len),
-(flags & VRING_DESC_F_WRITE) ?
-DMA_FROM_DEVICE : DMA_TO_DEVICE);
-   } else {
-   dma_unmap_page(vring_dma_dev(vq),
-  le64_to_cpu(desc->addr),
-  le32_to_cpu(desc->len),
-  (flags & VRING_DESC_F_WRITE) ?
-  DMA_FROM_DEVICE : DMA_TO_DEVICE);
-   }
+   dma_unmap_page(vring_dma_dev(vq),
+  le64_to_cpu(desc->addr),
+  le32_to_cpu(desc->len),
+  (flags & VRING_DESC_F_WRITE) ?
+  DMA_FROM_DEVICE : DMA_TO_DEVICE);
  }
  
  static struct vring_packed_desc *alloc_indirect_packed(unsigned int total_sg,



Re: [PATCH v1 0/6] virtio: support advance DMA

2022-02-22 Thread Jason Wang
On Tue, Feb 22, 2022 at 4:00 PM Xuan Zhuo  wrote:
>
> On Tue, 22 Feb 2022 12:02:14 +0800, Jason Wang  wrote:
> >
> > On 2022/2/21 7:23 PM, Xuan Zhuo wrote:
> > > On Mon, 21 Feb 2022 11:32:52 +0800, Jason Wang  
> > > wrote:
> > >> On Fri, Feb 18, 2022 at 5:00 PM Xuan Zhuo  
> > >> wrote:
> > >>> On Thu, 17 Feb 2022 15:19:44 +0800, Jason Wang  
> > >>> wrote:
> >  On Thu, Feb 10, 2022 at 4:51 PM Xuan Zhuo  
> >  wrote:
> > > virtqueue_add() only supports virtual addresses, dma is completed in
> > > virtqueue_add().
> > >
> > > In some scenarios (such as the AF_XDP scenario), DMA is completed in 
> > > advance, so
> > > it is necessary for us to support passing the DMA address to 
> > > virtqueue_add().
> >  I'd suggest renaming this feature to "unmanaged DMA".
> > >>> OK
> > >>>
> > > Record this predma information in extra->flags, which can be skipped 
> > > when
> > > executing dma unmap.
> >  Question still, can we use per-virtqueue flag instead of per
> >  descriptor flag? If my memory is correct, the answer is yes in the
> >  discussion for the previous version.
> > 
> > >>> Yes.
> > >>>
> > >>> per-virtqueue? I guess it should be per-submit.
> > >>>
> > >>> This patch set only adds a flag to desc_extra[head].flags, so that we 
> > >>> can know
> > >>> if we need to unmap dma when we detach.
> > >> I meant if we can manage to make it per virtqueue, there's no need to
> > >> maintain per buffer flag.
> > >>
> > > Rethinking this question, I feel there is no essential difference between 
> > > per
> > > virtqueue and per sgs.
> > >
> > > per virtqueue:
> > > 1. add buf:
> > > a. check vq->premapped for map every sg
> > > 2. detach:
> > > a. check vq->premaped for unmap
> > >
> > > per sgs:
> > > 1. add buf:
> > > a. check function parameter "premapped" for map every sg
> > > b. add flag to extra[head].flag
> > >
> > > 2. detach:
> > > a: check extra[head].flag for unmap
> > >
> > >
> > > Thanks.
> >
> >
> > Per-virtqueue is still a little bit easier at the first glance.
> >
> > Actually, per-sg has one advantage: it can be used without virtqueue
> > reset (to allow switching between the two modes). But I'm not sure
> > whether we had such requirements.
> >
> > I think to answer this question, we probably need a real use case (if we
> > can come up with a case that is more lightweight than AF_XDP, that would
> > be even better).
>
> Sadly, I didn't think of other scenarios. Hope someone can give a scenario.
>
> For per virtqueue, virtio-net will also switch to premapped. Because the tx
> queue is shared.
>
> But in the process of implementing this, I encountered a troublesome
> problem: we need to record the dma addresses in virtnet. For tx, since an
> skb contains multiple frags, there will be many dma addresses. Unmapping
> in virtnet will be more troublesome, because we have to recover these dma
> addresses.

Right, actually, we store the dma address in desc_extra, but exposing
it to the driver seems like an overkill.

>
> I think of two ways:
>
> 1. Let virtio return the addr of each desc when detached.
> 2. Allocate a block of memory for each sq/rq to hold the dma address.
>
> Thanks.

So it looks to me that having a per-buffer flag is ok. Let me go through
this series.

Thanks

>
> >
> > Thanks
> >
> >
> > >
> > >
> > >> So we know something that needs to be mapped by virtio core itself,
> > >> e.g the indirect page. Other than this, all the rest could be
> > >> pre-mapped.
> > >>
> > >> For vnet header, it could be mapped by virtio-net which could be still
> > >> treated as pre mapped DMA since it's not the virtio ring code.
> > >>
> > >> Anything I miss here?
> > >>
> > >> Thanks
> > >>
> > >>
> > >>> Thanks.
> > >>>
> >  Thanks
> > 
> > > v1:
> > > 1. All sgs requested at one time are required to be unified 
> > > PREDMA, and several
> > >of them are not supported to be PREDMA
> > > 2. virtio_dma_map() is removed from this patch set and will be 
> > > submitted
> > >together with the next time AF_XDP supports virtio dma
> > > 3. Added patch #2 #3 to remove the check for flags when 
> > > performing unmap
> > >indirect desc
> > >
> > > Xuan Zhuo (6):
> > >virtio: rename vring_unmap_state_packed() to
> > >  vring_unmap_extra_packed()
> > >virtio: remove flags check for unmap split indirect desc
> > >virtio: remove flags check for unmap packed indirect desc
> > >virtio: virtqueue_add() support predma
> > >virtio: split: virtqueue_add_split() support dma address
> > >virtio: packed: virtqueue_add_packed() support dma address
> > >
> > >   drivers/virtio/virtio_ring.c | 199 
> > > ++-
> > >   1 file changed, 126 insertions(+), 73 deletions(-)
> > >
> > > --
> > > 2

Re: [PATCH v1 4/6] virtio: virtqueue_add() support predma

2022-02-22 Thread Jason Wang


On 2022/2/10 4:51 PM, Xuan Zhuo wrote:

virtqueue_add() adds a parameter, predma.

Signed-off-by: Xuan Zhuo 
---
  drivers/virtio/virtio_ring.c | 11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cfb028ca238e..cf9d118668f1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1780,7 +1780,8 @@ static inline int virtqueue_add(struct virtqueue *_vq,
unsigned int in_sgs,
void *data,
void *ctx,
-   gfp_t gfp)
+   gfp_t gfp,
+   bool predma)



sg is assumed to use a dma address, so I wonder whether "sg_is_phys" is a
better name?


Thanks



  {
struct vring_virtqueue *vq = to_vvq(_vq);
  
@@ -1821,7 +1822,7 @@ int virtqueue_add_sgs(struct virtqueue *_vq,

total_sg++;
}
return virtqueue_add(_vq, sgs, total_sg, out_sgs, in_sgs,
-data, NULL, gfp);
+data, NULL, gfp, false);
  }
  EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
  
@@ -1843,7 +1844,7 @@ int virtqueue_add_outbuf(struct virtqueue *vq,

 void *data,
 gfp_t gfp)
  {
-   return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, gfp);
+   return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, gfp, false);
  }
  EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
  
@@ -1865,7 +1866,7 @@ int virtqueue_add_inbuf(struct virtqueue *vq,

void *data,
gfp_t gfp)
  {
-   return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, gfp);
+   return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, gfp, false);
  }
  EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
  
@@ -1889,7 +1890,7 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,

void *ctx,
gfp_t gfp)
  {
-   return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, gfp);
+   return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, gfp, false);
  }
  EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
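
For illustration, once the predma parameter above is plumbed through, a
premapped-aware wrapper could look like the hypothetical sketch below (the
name is invented, mirroring the existing virtqueue_add_outbuf(); it is not
part of the posted series):

	/* Hypothetical convenience wrapper: identical to
	 * virtqueue_add_outbuf() except that predma = true tells the core
	 * to skip the DMA map on add and the unmap on detach. */
	int virtqueue_add_outbuf_premapped(struct virtqueue *vq,
					   struct scatterlist *sg,
					   unsigned int num,
					   void *data, gfp_t gfp)
	{
		return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, gfp, true);
	}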
  



Re: [PATCH v1 4/6] virtio: virtqueue_add() support predma

2022-02-22 Thread Jason Wang
On Wed, Feb 23, 2022 at 11:01 AM Jason Wang  wrote:
>
>
> On 2022/2/10 4:51 PM, Xuan Zhuo wrote:
> > virtqueue_add() adds a parameter, predma.
> >
> > Signed-off-by: Xuan Zhuo 
> > ---
> >   drivers/virtio/virtio_ring.c | 11 ++-
> >   1 file changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index cfb028ca238e..cf9d118668f1 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1780,7 +1780,8 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> >   unsigned int in_sgs,
> >   void *data,
> >   void *ctx,
> > - gfp_t gfp)
> > + gfp_t gfp,
> > + bool predma)
>
>
> sg is assumed to use a dma address, so I wonder whether "sg_is_phys" is a
> better name?

Spoke too fast, I was wrong here. I think we should be consistent
here, so "premapped" would be better.

Thanks

>
> Thanks
>
>
> >   {
> >   struct vring_virtqueue *vq = to_vvq(_vq);
> >
> > @@ -1821,7 +1822,7 @@ int virtqueue_add_sgs(struct virtqueue *_vq,
> >   total_sg++;
> >   }
> >   return virtqueue_add(_vq, sgs, total_sg, out_sgs, in_sgs,
> > -  data, NULL, gfp);
> > +  data, NULL, gfp, false);
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
> >
> > @@ -1843,7 +1844,7 @@ int virtqueue_add_outbuf(struct virtqueue *vq,
> >void *data,
> >gfp_t gfp)
> >   {
> > - return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, gfp);
> > + return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, gfp, false);
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
> >
> > @@ -1865,7 +1866,7 @@ int virtqueue_add_inbuf(struct virtqueue *vq,
> >   void *data,
> >   gfp_t gfp)
> >   {
> > - return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, gfp);
> > + return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, gfp, false);
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
> >
> > @@ -1889,7 +1890,7 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
> >   void *ctx,
> >   gfp_t gfp)
> >   {
> > - return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, gfp);
> > + return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, gfp, false);
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
> >


Re: [PATCH] virtio_ring: avoid reading flag from the descriptor ring

2022-02-22 Thread Jason Wang
On Mon, Nov 8, 2021 at 4:13 PM Jason Wang  wrote:
>
> Commit 72b5e8958738 ("virtio-ring: store DMA metadata in desc_extra
> for split virtqueue") tries to make it possible for the driver to not
> read from the descriptor ring to prevent the device from corrupting
> the descriptor ring. But it still reads the descriptor flag from the
> descriptor ring during buffer detach.
>
> This patch fixes that by always storing the descriptor flag no matter
> whether the DMA API is used, so we can avoid reading the descriptor flag
> from the descriptor ring. This eliminates the possibility of an unexpected
> next descriptor caused by a wrong flag (e.g. the next flag).
>
> Signed-off-by: Jason Wang 

Michael, any comment for this?

Thanks

> ---
>  drivers/virtio/virtio_ring.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 00f64f2f8b72..28734f4e57d3 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -583,7 +583,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
> }
> /* Last one doesn't continue. */
> desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> -   if (!indirect && vq->use_dma_api)
> +   if (!indirect)
> vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags 
> &=
> ~VRING_DESC_F_NEXT;
>
> @@ -713,7 +713,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
> unsigned int head,
> /* Put back on free list: unmap first-level descriptors and find end 
> */
> i = head;
>
> -   while (vq->split.vring.desc[i].flags & nextflag) {
> +   while (vq->split.desc_extra[i].flags & nextflag) {
> vring_unmap_one_split(vq, i);
> i = vq->split.desc_extra[i].next;
> vq->vq.num_free++;
> --
> 2.25.1
>



Re: [PATCH v1 5/6] virtio: split: virtqueue_add_split() support dma address

2022-02-22 Thread Jason Wang


On 2022/2/10 4:51 PM, Xuan Zhuo wrote:

virtqueue_add_split() only supports virtual addresses; DMA mapping is
done inside virtqueue_add_split().

In some scenarios (such as the AF_XDP scenario), the memory is allocated
and DMA is completed in advance, so it is necessary for us to support
passing the DMA address to virtqueue_add_split().

Record this predma information in extra->flags, so the buffer can be
skipped when executing the DMA unmap.

Signed-off-by: Xuan Zhuo 
---
  drivers/virtio/virtio_ring.c | 62 
  1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cf9d118668f1..d32c0bf6016f 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -66,6 +66,9 @@
  #define LAST_ADD_TIME_INVALID(vq)
  #endif
  
+/* This means the buffer DMA was mapped in advance. Only used by vring_desc_extra */

+#define VIRTIO_DESC_F_PREDMA (1 << 15)



I suggest using a new field in desc_extra to avoid conflicts with future
virtio extensions.
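
A sketch of that suggestion (illustrative only; the existing fields match
the upstream struct, while the new "premapped" field is hypothetical):

	/* Keep premapped state in its own field of the driver-private
	 * desc_extra instead of reusing a flags bit that a future virtio
	 * spec extension might claim. */
	struct vring_desc_extra {
		dma_addr_t addr;	/* Descriptor DMA addr. */
		u32 len;		/* Descriptor length. */
		u16 flags;		/* Descriptor flags. */
		u16 next;		/* The next desc state in a list. */
		bool premapped;		/* Buffer was DMA-mapped by the driver. */
	};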




+
  struct vring_desc_state_split {
void *data; /* Data for callback. */
struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
@@ -387,7 +390,7 @@ static void vring_unmap_one_split_indirect(const struct 
vring_virtqueue *vq,
  }
  
  static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,

- unsigned int i)
+ unsigned int i, bool predma)
  {
struct vring_desc_extra *extra = vq->split.desc_extra;
u16 flags;
@@ -404,6 +407,9 @@ static unsigned int vring_unmap_one_split(const struct 
vring_virtqueue *vq,
 (flags & VRING_DESC_F_WRITE) ?
 DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+   if (predma)
+   goto out;
+
dma_unmap_page(vring_dma_dev(vq),
   extra[i].addr,
   extra[i].len,
@@ -474,7 +480,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
  unsigned int in_sgs,
  void *data,
  void *ctx,
- gfp_t gfp)
+ gfp_t gfp,
+ bool predma)
  {
struct vring_virtqueue *vq = to_vvq(_vq);
struct scatterlist *sg;
@@ -535,9 +542,16 @@ static inline int virtqueue_add_split(struct virtqueue 
*_vq,
  
  	for (n = 0; n < out_sgs; n++) {

for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_TO_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
+   dma_addr_t addr;
+
+   if (predma) {
+   addr = sg_dma_address(sg);
+
+   } else {
+   addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
+   if (vring_mapping_error(vq, addr))
+   goto unmap_release;
+   }
  
  			prev = i;

/* Note that we trust indirect descriptor
@@ -550,9 +564,16 @@ static inline int virtqueue_add_split(struct virtqueue 
*_vq,
}
for (; n < (out_sgs + in_sgs); n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
+   dma_addr_t addr;
+
+   if (predma) {
+   addr = sg_dma_address(sg);
+
+   } else {
+   addr = vring_map_one_sg(vq, sg, 
DMA_FROM_DEVICE);
+   if (vring_mapping_error(vq, addr))
+   goto unmap_release;
+   }
  
  			prev = i;

/* Note that we trust indirect descriptor
@@ -602,6 +623,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
else
vq->split.desc_state[head].indir_desc = ctx;
  
+	if (predma)

+   vq->split.desc_extra[head].flags |= VIRTIO_DESC_F_PREDMA;
+
/* Put entry in available array (but don't update avail->idx until they
 * do sync). */
avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
@@ -626,6 +650,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
return 0;
  
  unmap_release:

+   if (predma)
+   goto skip_unmap;
+



Nit: we probably need a better name for the label. How about "unmap_free"?

Thanks



  

Re: [PATCH v1 6/6] virtio: packed: virtqueue_add_packed() support dma address

2022-02-22 Thread Jason Wang


On 2022/2/10 4:51 PM, Xuan Zhuo wrote:

virtqueue_add_packed() only supports virtual addresses; DMA mapping is
done inside virtqueue_add_packed().

In some scenarios (such as the AF_XDP scenario), the memory is allocated
and DMA is completed in advance, so it is necessary for us to support
passing the DMA address to virtqueue_add_packed().

Record this predma information in extra->flags, so the buffer can be
skipped when executing the DMA unmap.

Signed-off-by: Xuan Zhuo 



Except for comments similar to those on the split version, the rest looks good.

Thanks



---
  drivers/virtio/virtio_ring.c | 79 ++--
  1 file changed, 57 insertions(+), 22 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d32c0bf6016f..b8c7697e925d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1011,7 +1011,8 @@ static struct virtqueue *vring_create_virtqueue_split(
   */
  
  static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,

-struct vring_desc_extra *extra)
+struct vring_desc_extra *extra,
+bool predma)
  {
u16 flags;
  
@@ -1026,6 +1027,9 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,

 (flags & VRING_DESC_F_WRITE) ?
 DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+   if (predma)
+   return;
+
dma_unmap_page(vring_dma_dev(vq),
   extra->addr, extra->len,
   (flags & VRING_DESC_F_WRITE) ?
@@ -1073,7 +1077,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 unsigned int out_sgs,
 unsigned int in_sgs,
 void *data,
-gfp_t gfp)
+gfp_t gfp,
+bool predma)
  {
struct vring_packed_desc *desc;
struct scatterlist *sg;
@@ -1099,10 +1104,15 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
  
  	for (n = 0; n < out_sgs + in_sgs; n++) {

for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-   DMA_TO_DEVICE : DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
+   if (predma) {
+   addr = sg_dma_address(sg);
+
+   } else {
+   addr = vring_map_one_sg(vq, sg, n < out_sgs ?
+   DMA_TO_DEVICE : DMA_FROM_DEVICE);
+   if (vring_mapping_error(vq, addr))
+   goto unmap_release;
+   }
  
  			desc[i].flags = cpu_to_le16(n < out_sgs ?

0 : VRING_DESC_F_WRITE);
@@ -1132,6 +1142,9 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
  vq->packed.avail_used_flags;
}
  
+	if (predma)

+   vq->packed.desc_extra[id].flags |= VIRTIO_DESC_F_PREDMA;
+
/*
 * A driver MUST NOT make the first descriptor in the list
 * available before all subsequent descriptors comprising
@@ -1170,10 +1183,11 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
return 0;
  
  unmap_release:

-   err_idx = i;
-
-   for (i = 0; i < err_idx; i++)
-   vring_unmap_desc_packed(vq, &desc[i]);
+   if (!predma) {
+   err_idx = i;
+   for (i = 0; i < err_idx; i++)
+   vring_unmap_desc_packed(vq, &desc[i]);
+   }
  
  	kfree(desc);
  
@@ -1188,7 +1202,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,

   unsigned int in_sgs,
   void *data,
   void *ctx,
-  gfp_t gfp)
+  gfp_t gfp,
+  bool predma)
  {
struct vring_virtqueue *vq = to_vvq(_vq);
struct vring_packed_desc *desc;
@@ -1214,7 +1229,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
  
  	if (virtqueue_use_indirect(_vq, total_sg)) {

err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
-   in_sgs, data, gfp);
+   in_sgs, data, gfp, predma);
if (err != -ENOMEM) {

Re: [PATCH v1 0/6] virtio: support advance DMA

2022-02-22 Thread Jason Wang
On Wed, Feb 23, 2022 at 10:58 AM Jason Wang  wrote:
>
> On Tue, Feb 22, 2022 at 4:00 PM Xuan Zhuo  wrote:
> >
> > On Tue, 22 Feb 2022 12:02:14 +0800, Jason Wang  wrote:
> > >
> > > On 2022/2/21 7:23 PM, Xuan Zhuo wrote:
> > > > On Mon, 21 Feb 2022 11:32:52 +0800, Jason Wang  
> > > > wrote:
> > > >> On Fri, Feb 18, 2022 at 5:00 PM Xuan Zhuo  
> > > >> wrote:
> > > >>> On Thu, 17 Feb 2022 15:19:44 +0800, Jason Wang  
> > > >>> wrote:
> > >  On Thu, Feb 10, 2022 at 4:51 PM Xuan Zhuo 
> > >   wrote:
> > > > virtqueue_add() only supports virtual addresses; DMA mapping is done in
> > > > virtqueue_add().
> > > >
> > > > In some scenarios (such as AF_XDP), DMA is completed in advance, so it
> > > > is necessary for us to support passing the DMA address to
> > > > virtqueue_add().
> > >  I'd suggest renaming this feature to "unmanaged DMA".
> > > >>> OK
> > > >>>
> > > > Record this predma state in extra->flags so that the DMA unmap can
> > > > be skipped when detaching.
> > >  Question still: can we use a per-virtqueue flag instead of a
> > >  per-descriptor flag? If my memory is correct, the answer was yes in
> > >  the discussion of the previous version.
> > > 
> > > >>> Yes.
> > > >>>
> > > >>> per-virtqueue? I guess it should be per-submit.
> > > >>>
> > > >>> This patch set only adds a flag to desc_extra[head].flags, so that
> > > >>> we can know whether we need to unmap the DMA when we detach.
> > > >> I meant that if we can manage to make it per-virtqueue, there's no
> > > >> need to maintain a per-buffer flag.
> > > >>
> > > > Rethinking this question, I feel there is no essential difference
> > > > between per-virtqueue and per-sgs.
> > > >
> > > > per virtqueue:
> > > > 1. add buf:
> > > > a. check vq->premapped for map every sg
> > > > 2. detach:
> > > > a. check vq->premapped for unmap
> > > >
> > > > per sgs:
> > > > 1. add buf:
> > > > a. check function parameter "premapped" for map every sg
> > > > b. add flag to extra[head].flag
> > > >
> > > > 2. detach:
> > > > a. check extra[head].flag for unmap
> > > >
> > > >
> > > > Thanks.
> > >
> > >
> > > Per-virtqueue still looks a little bit easier at first glance.
> > >
> > > Actually, per-sg has one advantage: it can be used without a virtqueue
> > > reset (to allow switching between the two modes). But I'm not sure
> > > whether we have such a requirement.
> > >
> > > I think to answer this question, we probably need a real use case (if we
> > > can come up with a case that is more lightweight than AF_XDP, that would
> > > be even better).
> >
> > Sadly, I can't think of other scenarios. Hopefully someone can suggest one.
> >
> > For per-virtqueue, virtio-net would also have to switch to premapped,
> > because the tx queue is shared.
> >
> > But in the process of implementing this, I ran into a troublesome
> > problem: we need to record the DMA addresses in virtio-net. For tx, since
> > an skb contains multiple frags, there will be many DMA addresses, and
> > unmapping in virtio-net becomes more troublesome because we have to
> > recover all of those DMA addresses.
>
> Right, actually, we store the dma address in desc_extra, but exposing
> it to the driver seems like overkill.
>
> >
> > I can think of two ways:
> >
> > 1. Let virtio return the DMA addr of each desc when it is detached.
> > 2. Allocate a block of memory for each sq/rq to hold the DMA addresses.
> >
> > Thanks.
>
> So it looks to me that having a per-buffer flag is OK. Let me go through
> this series.
>

OK, so the series looks good overall, but we need a user. I wonder if we
can convert XDP to use it as an example (and AF_XDP on top).
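
To make the per-buffer option concrete: the flag written at add time is
the only state consulted at detach, so no per-virtqueue mode (and no
virtqueue reset to switch modes) is needed. A sketch combining the
patches discussed in this thread:

/* Sketch: detach-time check of the per-buffer flag (split ring). */
static void detach_buf_split_sketch(struct vring_virtqueue *vq,
				    unsigned int head)
{
	/* Set by virtqueue_add_split() when the caller premapped the sgs. */
	bool premapped = vq->split.desc_extra[head].flags & VIRTIO_DESC_F_PREDMA;
	unsigned int i = head;

	while (vq->split.desc_extra[i].flags & VRING_DESC_F_NEXT) {
		/* vring_unmap_one_split() returns early when premapped. */
		vring_unmap_one_split(vq, i, premapped);
		i = vq->split.desc_extra[i].next;
		vq->vq.num_free++;
	}
}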

Thanks

> Thanks
>
> >
> > >
> > > Thanks
> > >
> > >
> > > >
> > > >
> > > >> So we know something that needs to be mapped by virtio core itself,
> > > >> e.g. the indirect page. Other than this, all the rest could be
> > > >> pre-mapped.
> > > >>
> > > >> For the vnet header, it could be mapped by virtio-net, which can
> > > >> still be treated as premapped DMA since it's not the virtio ring code.
> > > >>
> > > >> Anything I'm missing here?
> > > >>
> > > >> Thanks
> > > >>
> > > >>
> > > >>> Thanks.
> > > >>>
> > >  Thanks
> > > 
> > > > v1:
> > > > 1. All sgs submitted in one call must uniformly be PREDMA; mixing
> > > >    PREDMA and non-PREDMA sgs is not supported
> > > > 2. virtio_dma_map() is removed from this patch set and will be
> > > >    submitted together with the AF_XDP virtio DMA support
> > > > 3. Added patches #2 and #3 to remove the flags check when unmapping
> > > >    indirect descs
> > > >
> > > > Xuan Zhuo (6):
> > > >virtio: rename vring_unmap_state_packed() to
> > > >  vring_unmap_extra_packed()
> > > >virtio: remove flags check for unmap split

Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ

2022-02-22 Thread Jason Wang
On Tue, Feb 22, 2022 at 4:06 PM Eugenio Perez Martin
 wrote:
>
> On Tue, Feb 22, 2022 at 8:41 AM Jason Wang  wrote:
> >
> >
> > On 2022/2/17 4:22 PM, Eugenio Perez Martin wrote:
> > > On Thu, Feb 17, 2022 at 7:02 AM Jason Wang  wrote:
> > >> On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
> > >>  wrote:
> > >>> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang  wrote:
> > 
> >  On 2022/2/1 7:45 PM, Eugenio Perez Martin wrote:
> > > On Sun, Jan 30, 2022 at 7:50 AM Jason Wang  
> > > wrote:
> > >> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > >>> SVQ is able to log the dirty bits by itself, so let's use it to not
> > >>> block migration.
> > >>>
> > >>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if 
> > >>> SVQ is
> > >>> enabled. Even if the device supports it, the reports would be 
> > >>> nonsense
> > >>> because SVQ memory is in the qemu region.
> > >>>
> > >>> The log region is still allocated. Future changes might skip that, 
> > >>> but
> > >>> this series is already long enough.
> > >>>
> > >>> Signed-off-by: Eugenio Pérez 
> > >>> ---
> > >>> hw/virtio/vhost-vdpa.c | 20 
> > >>> 1 file changed, 20 insertions(+)
> > >>>
> > >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>> index fb0a338baa..75090d65e8 100644
> > >>> --- a/hw/virtio/vhost-vdpa.c
> > >>> +++ b/hw/virtio/vhost-vdpa.c
> > >>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct 
> > >>> vhost_dev *dev, uint64_t *features)
> > >>> if (ret == 0 && v->shadow_vqs_enabled) {
> > >>> /* Filter only features that SVQ can offer to guest */
> > >>> vhost_svq_valid_guest_features(features);
> > >>> +
> > >>> +/* Add SVQ logging capabilities */
> > >>> +*features |= BIT_ULL(VHOST_F_LOG_ALL);
> > >>> }
> > >>>
> > >>> return ret;
> > >>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct 
> > >>> vhost_dev *dev,
> > >>>
> > >>> if (v->shadow_vqs_enabled) {
> > >>> uint64_t dev_features, svq_features, acked_features;
> > >>> +uint8_t status = 0;
> > >>> bool ok;
> > >>>
> > >>> +ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> > >>> +if (unlikely(ret)) {
> > >>> +return ret;
> > >>> +}
> > >>> +
> > >>> +if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > >>> +/*
> > >>> + * vhost is trying to enable or disable _F_LOG, and 
> > >>> the device
> > >>> + * would report wrong dirty pages. SVQ handles it.
> > >>> + */
> > >> I fail to understand this comment; I'd think there's no way to
> > >> disable dirty page tracking for SVQ.
> > >>
> > > vhost_log_global_{start,stop} are called at the beginning and end of
> > > migration. To inform the device that it should start logging, they set
> > > or clear VHOST_F_LOG_ALL at vhost_dev_set_log.
> > 
> >  Yes, but for SVQ we can't disable dirty page tracking, can we? The
> >  only thing to do is ignore or filter out F_LOG_ALL and pretend it is
> >  enabled or disabled.
> > 
> > >>> Yes, that's what this patch does.
> > >>>
> > > While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> > > vhost does not block migration. Maybe we need to look for another way
> > > to do this?
> > 
> >  I'm fine with filtering since it's much simpler, but I fail to
> >  understand why we need to check DRIVER_OK.
> > 
> > >>> OK, maybe I can make that part clearer.
> > >>>
> > >>> Since both operations use vhost_vdpa_set_features, we must filter only
> > >>> the one that actually sets or removes VHOST_F_LOG_ALL, without
> > >>> affecting other features.
> > >>>
> > >>> In practice, that means not forwarding the set_features call after
> > >>> DRIVER_OK. The device is not expecting it anymore.
> > >> I wonder what happens if we don't do this.
> > >>
> > > If we simply delete the check, vhost_dev_set_features will return an
> > > error, failing the start of migration. More on this below.
> >
> >
> > Ok.
> >
> >
> > >
> > >> So kernel had this check:
> > >>
> > >>  /*
> > >>   * It's not allowed to change the features after they have
> > >>   * been negotiated.
> > >>   */
> > >> if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
> > >>  return -EBUSY;
> > >>
> > >> So is it FEATURES_OK actually?
> > >>
> > > Yes, FEATURES_OK seems more appropriate actually so I will switch to
> > > it for the next version.
> > >
> > > But it should be functionally equivalent, since
> > > vhost.c:vhost_dev_start sets both and the setting of _F_LOG_ALL cannot
> > > be concurrent with it.
> >
> >
> >
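
A sketch of the direction agreed above, gating on FEATURES_OK rather
than DRIVER_OK when filtering the VHOST_F_LOG_ALL toggle (illustrative,
not the respun patch):

    /* In vhost_vdpa_set_features(), with shadow virtqueues enabled. */
    uint8_t status = 0;
    int ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
    if (unlikely(ret)) {
        return ret;
    }

    if (status & VIRTIO_CONFIG_S_FEATURES_OK) {
        /*
         * Features were already negotiated; the only bit vhost may be
         * toggling at this point is VHOST_F_LOG_ALL, and SVQ tracks
         * dirty pages itself, so do not forward the change.
         */
        return 0;
    }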

Re: [RFC PATCH 1/5] uapi/linux/if_tun.h: Added new ioctl for tun/tap.

2022-02-22 Thread Jason Wang
On Tue, Feb 22, 2022 at 9:28 PM Andrew Melnichenko  wrote:
>
> Hi all,
>
> On Wed, Feb 9, 2022 at 6:26 AM Jason Wang  wrote:
> >
> >
> > On 2022/1/25 4:46 PM, Andrew Melnychenko wrote:
> > > Added TUNGETSUPPORTEDOFFLOADS, which should allow
> > > getting the bits of supported offloads.
> >
> >
> > So we haven't used dedicated ioctls in the past; instead, we just probe
> > by checking the return value of TUNSETOFFLOAD.
> >
> > E.g qemu has the following codes:
> >
> > int tap_probe_has_ufo(int fd)
> > {
> >  unsigned offload;
> >
> >  offload = TUN_F_CSUM | TUN_F_UFO;
> >
> >  if (ioctl(fd, TUNSETOFFLOAD, offload) < 0)
> >  return 0;
> >
> >  return 1;
> > }
> >
> > Any reason we can't keep using that?
> >
> > Thanks
> >
>
> Well, even in this example, to check the UFO feature we try to set it.
> What if we don't want to "enable" UFO and/or do not want to change its state?

So at least QEMU doesn't have such a requirement, since during the
probe the virtual networking backend is not even started.

> I think it's a good idea to have the ability to get supported offloads
> without changing device behavior.

Do you see a real user for this?

Btw, we still need to probe this new ioctl anyway.
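
For what it's worth, a userspace probe of the proposed ioctl could fall
back to the classic set-probe on older kernels. A sketch, assuming the
RFC's TUNGETSUPPORTEDOFFLOADS definition is available in the headers:

#include <sys/ioctl.h>
#include <linux/if_tun.h>

/* Query offloads read-only when the new ioctl exists, else probe by
 * attempting TUNSETOFFLOAD as QEMU does today. */
static unsigned int tap_supported_offloads(int fd)
{
    unsigned int offloads = 0;

    if (ioctl(fd, TUNGETSUPPORTEDOFFLOADS, &offloads) == 0)
        return offloads;

    if (ioctl(fd, TUNSETOFFLOAD, TUN_F_CSUM | TUN_F_UFO) == 0)
        offloads = TUN_F_CSUM | TUN_F_UFO;

    return offloads;
}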

Thanks

>
> >
> > > Added 2 additional offloads for USO (IPv4 & IPv6).
> > > Separate offloads are required for Windows VM guests;
> > > e.g. Windows may set USO rx only for IPv4.
> > >
> > > Signed-off-by: Andrew Melnychenko 
> > > ---
> > >   include/uapi/linux/if_tun.h | 3 +++
> > >   1 file changed, 3 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> > > index 454ae31b93c7..07680fae6e18 100644
> > > --- a/include/uapi/linux/if_tun.h
> > > +++ b/include/uapi/linux/if_tun.h
> > > @@ -61,6 +61,7 @@
> > >   #define TUNSETFILTEREBPF _IOR('T', 225, int)
> > >   #define TUNSETCARRIER _IOW('T', 226, int)
> > >   #define TUNGETDEVNETNS _IO('T', 227)
> > > +#define TUNGETSUPPORTEDOFFLOADS _IOR('T', 228, unsigned int)
> > >
> > >   /* TUNSETIFF ifr flags */
> > >   #define IFF_TUN 0x0001
> > > @@ -88,6 +89,8 @@
> > >   #define TUN_F_TSO6  0x04/* I can handle TSO for IPv6 packets */
> > >   #define TUN_F_TSO_ECN   0x08/* I can handle TSO with ECN bits. 
> > > */
> > >   #define TUN_F_UFO   0x10/* I can handle UFO packets */
> > > +#define TUN_F_USO4   0x20/* I can handle USO for IPv4 packets */
> > > +#define TUN_F_USO6   0x40/* I can handle USO for IPv6 packets */
> > >
> > >   /* Protocol info prepended to the packets (when IFF_NO_PI is not set) */
> > >   #define TUN_PKT_STRIP   0x0001
> >
>


Re: [PATCH] virtio_ring: avoid reading flag from the descriptor ring

2022-02-22 Thread Michael S. Tsirkin
On Wed, Feb 23, 2022 at 11:19:03AM +0800, Jason Wang wrote:
> On Mon, Nov 8, 2021 at 4:13 PM Jason Wang  wrote:
> >
> > Commit 72b5e8958738 ("virtio-ring: store DMA metadata in desc_extra
> > for split virtqueue") tries to make it possible for the driver to not
> > read from the descriptor ring to prevent the device from corrupting
> > the descriptor ring. But it still reads the descriptor flags from the
> > descriptor ring during buffer detach.
> >
> > This patch fixes that by always storing the descriptor flags, no matter
> > whether the DMA API is used, so we can avoid reading the flags from the
> > descriptor ring. This eliminates the possibility of an unexpected next
> > descriptor caused by a wrong flag (e.g. the next flag).
> >
> > Signed-off-by: Jason Wang 
> 
> Michael, any comment for this?
> 
> Thanks

I don't exactly see why we should care about the non-DMA-API case; it
seems cleaner not to poke at the array one extra time.

> > ---
> >  drivers/virtio/virtio_ring.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 00f64f2f8b72..28734f4e57d3 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -583,7 +583,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > }
> > /* Last one doesn't continue. */
> > desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > -   if (!indirect && vq->use_dma_api)
> > +   if (!indirect)
> > vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
> > ~VRING_DESC_F_NEXT;
> >
> > @@ -713,7 +713,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > /* Put back on free list: unmap first-level descriptors and find end */
> > i = head;
> >
> > -   while (vq->split.vring.desc[i].flags & nextflag) {
> > +   while (vq->split.desc_extra[i].flags & nextflag) {
> > vring_unmap_one_split(vq, i);
> > i = vq->split.desc_extra[i].next;
> > vq->vq.num_free++;
> > --
> > 2.25.1
> >



Re: [PATCH] virtio_ring: avoid reading flag from the descriptor ring

2022-02-22 Thread Jason Wang
On Wed, Feb 23, 2022 at 3:08 PM Michael S. Tsirkin  wrote:
>
> On Wed, Feb 23, 2022 at 11:19:03AM +0800, Jason Wang wrote:
> > On Mon, Nov 8, 2021 at 4:13 PM Jason Wang  wrote:
> > >
> > > Commit 72b5e8958738 ("virtio-ring: store DMA metadata in desc_extra
> > > for split virtqueue") tries to make it possible for the driver to not
> > > read from the descriptor ring to prevent the device from corrupting
> > > the descriptor ring. But it still reads the descriptor flags from the
> > > descriptor ring during buffer detach.
> > >
> > > This patch fixes that by always storing the descriptor flags, no matter
> > > whether the DMA API is used, so we can avoid reading the flags from the
> > > descriptor ring. This eliminates the possibility of an unexpected next
> > > descriptor caused by a wrong flag (e.g. the next flag).
> > >
> > > Signed-off-by: Jason Wang 
> >
> > Michael, any comment for this?
> >
> > Thanks
>
> I don't exactly see why we should care about the non-DMA-API case; it
> seems cleaner not to poke at the array one extra time.

I think the answer is that we have any special care about the DMA API
for all other places that are using desc_extra.

Thanks


>
> > > ---
> > >  drivers/virtio/virtio_ring.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 00f64f2f8b72..28734f4e57d3 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -583,7 +583,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > }
> > > /* Last one doesn't continue. */
> > > desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > -   if (!indirect && vq->use_dma_api)
> > > +   if (!indirect)
> > > vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
> > > ~VRING_DESC_F_NEXT;
> > >
> > > @@ -713,7 +713,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > /* Put back on free list: unmap first-level descriptors and find end */
> > > i = head;
> > >
> > > -   while (vq->split.vring.desc[i].flags & nextflag) {
> > > +   while (vq->split.desc_extra[i].flags & nextflag) {
> > > vring_unmap_one_split(vq, i);
> > > i = vq->split.desc_extra[i].next;
> > > vq->vq.num_free++;
> > > --
> > > 2.25.1
> > >
>



Re: [PATCH] virtio_ring: avoid reading flag from the descriptor ring

2022-02-22 Thread Jason Wang
On Wed, Feb 23, 2022 at 3:34 PM Jason Wang  wrote:
>
> On Wed, Feb 23, 2022 at 3:08 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Feb 23, 2022 at 11:19:03AM +0800, Jason Wang wrote:
> > > On Mon, Nov 8, 2021 at 4:13 PM Jason Wang  wrote:
> > > >
> > > > Commit 72b5e8958738 ("virtio-ring: store DMA metadata in desc_extra
> > > > for split virtqueue") tries to make it possible for the driver to not
> > > > read from the descriptor ring to prevent the device from corrupting
> > > > the descriptor ring. But it still reads the descriptor flags from the
> > > > descriptor ring during buffer detach.
> > > >
> > > > This patch fixes that by always storing the descriptor flags, no matter
> > > > whether the DMA API is used, so we can avoid reading the flags from the
> > > > descriptor ring. This eliminates the possibility of an unexpected next
> > > > descriptor caused by a wrong flag (e.g. the next flag).
> > > >
> > > > Signed-off-by: Jason Wang 
> > >
> > > Michael, any comment for this?
> > >
> > > Thanks
> >
> > I don't exactly see why we should care about the non-DMA-API case; it
> > seems cleaner not to poke at the array one extra time.
>
> I think the answer is that we have any special care about the DMA API

I meant "we haven't had" actually.

Thanks

> for all other places that are using desc_extra.
>
> Thanks
>
>
> >
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 00f64f2f8b72..28734f4e57d3 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -583,7 +583,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > }
> > > > /* Last one doesn't continue. */
> > > > desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > > -   if (!indirect && vq->use_dma_api)
> > > > +   if (!indirect)
> > > > vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
> > > > ~VRING_DESC_F_NEXT;
> > > >
> > > > @@ -713,7 +713,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > > /* Put back on free list: unmap first-level descriptors and find end */
> > > > i = head;
> > > >
> > > > -   while (vq->split.vring.desc[i].flags & nextflag) {
> > > > +   while (vq->split.desc_extra[i].flags & nextflag) {
> > > > vring_unmap_one_split(vq, i);
> > > > i = vq->split.desc_extra[i].next;
> > > > vq->vq.num_free++;
> > > > --
> > > > 2.25.1
> > > >
> >
