[PATCH] virtio_vdpa: build affinity masks conditionally

2023-08-11 Thread Jason Wang
We try to build the affinity masks via create_affinity_masks()
unconditionally, which may lead to several issues:

- the affinity mask is not used for a parent without affinity support
  (only VDUSE supports affinity at the moment)
- the logic of create_affinity_masks() might not work for devices
  other than block. For example, it is not rare for a networking
  device to have more queues than CPUs. Such a case breaks the current
  affinity logic, which is based on group_cpus_evenly(); that function
  assumes the number of CPUs is not less than the number of groups,
  and otherwise triggers a warning[1]:

  if (ret >= 0)
          WARN_ON(nr_present + nr_others < numgrps);

Fix this by building the affinity masks only when:

- the driver passes an affinity descriptor; a driver like virtio-blk
  makes sure to limit the number of queues so it does not exceed the
  number of CPUs
- the parent supports the affinity setting config ops

This helps to avoid the warning. More optimizations could be done on
top.
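
To make the failure mode concrete, here is a minimal userspace sketch
of the precondition the warning enforces (hypothetical helper and
numbers, not the kernel implementation):

    #include <stdio.h>

    /* Sketch only: group_cpus_evenly() splits ncpus CPUs into ngrps
     * non-empty groups, which is only possible when ngrps <= ncpus. */
    static int grouping_possible(unsigned int ncpus, unsigned int ngrps)
    {
        return ngrps <= ncpus;
    }

    int main(void)
    {
        printf("48 CPUs, 48 groups: %s\n",
               grouping_possible(48, 48) ? "ok" : "would WARN");
        printf("48 CPUs, 65 groups: %s\n",
               grouping_possible(48, 65) ? "ok" : "would WARN");
        return 0;
    }

A multiqueue virtio-net device can easily ask for more virtqueues than
the guest has CPUs, which is how the path in the trace below was hit.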

[1]
[  682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 group_cpus_evenly+0x1aa/0x1c0
[  682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79
[  682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[  682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0
[  682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 1b c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb c2 <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc
[  682.146679] RSP: 0018:c9000215f498 EFLAGS: 00010293
[  682.146682] RAX: 0001f1e0 RBX: 0041 RCX: 
[  682.146684] RDX: 888109922058 RSI: 0041 RDI: 0030
[  682.146686] RBP: 888109922058 R08: c9000215f498 R09: c9000215f4a0
[  682.146687] R10: 000198d0 R11: 0030 R12: 888107e02800
[  682.146689] R13: 0030 R14: 0030 R15: 0041
[  682.146692] FS:  7fef52315740() GS:88823738() knlGS:
[  682.146695] CS:  0010 DS:  ES:  CR0: 80050033
[  682.146696] CR2: 7fef52509000 CR3: 000110dbc004 CR4: 00370ee0
[  682.146698] DR0:  DR1:  DR2: 
[  682.146700] DR3:  DR6: fffe0ff0 DR7: 0400
[  682.146701] Call Trace:
[  682.146703]  
[  682.146705]  ? __warn+0x7b/0x130
[  682.146709]  ? group_cpus_evenly+0x1aa/0x1c0
[  682.146712]  ? report_bug+0x1c8/0x1e0
[  682.146717]  ? handle_bug+0x3c/0x70
[  682.146721]  ? exc_invalid_op+0x14/0x70
[  682.146723]  ? asm_exc_invalid_op+0x16/0x20
[  682.146727]  ? group_cpus_evenly+0x1aa/0x1c0
[  682.146729]  ? group_cpus_evenly+0x15c/0x1c0
[  682.146731]  create_affinity_masks+0xaf/0x1a0
[  682.146735]  virtio_vdpa_find_vqs+0x83/0x1d0
[  682.146738]  ? __pfx_default_calc_sets+0x10/0x10
[  682.146742]  virtnet_find_vqs+0x1f0/0x370
[  682.146747]  virtnet_probe+0x501/0xcd0
[  682.146749]  ? vp_modern_get_status+0x12/0x20
[  682.146751]  ? get_cap_addr.isra.0+0x10/0xc0
[  682.146754]  virtio_dev_probe+0x1af/0x260
[  682.146759]  really_probe+0x1a5/0x410

Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity spreading mechanism")
Signed-off-by: Jason Wang 
---
 drivers/virtio/virtio_vdpa.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index 961161da5900..06ce6d8c2e00 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -366,11 +366,14 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
 	struct irq_affinity default_affd = { 0 };
 	struct cpumask *masks;
 	struct vdpa_callback cb;
+	bool has_affinity = desc && ops->set_vq_affinity;
 	int i, err, queue_idx = 0;
 
-	masks = create_affinity_masks(nvqs, desc ? desc : &default_affd);
-	if (!masks)
-		return -ENOMEM;
+	if (has_affinity) {
+		masks = create_affinity_masks(nvqs, desc ? desc : &default_affd);
+		if (!masks)
+			return -ENOMEM;
+	}
 
 	for (i = 0; i < nvqs; ++i) {
 		if (!names[i]) {
@@ -386,20 +389,22 @@ static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
 			goto err_setup_vq;
 		}
 
-		if (ops->set_vq_affinity)
+		if (has_affinity)
 			ops->set_vq_affinity(vdpa, i, &masks[i]);
 	}
 
 	cb.callback = virtio_vdpa_config_cb;
 	cb.private = vd_dev;
 	ops->set_config_cb(vdpa, &cb);
-	kfree(masks);
+	if (has_affinity)
+		kfree(masks);
 
 	return 0;
 
 err_setup_vq:
 	virtio_vdpa_del_vqs(vdev);
-	kfree(masks);
+	if (has_affinity)
+		kfree(masks);

Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-08-11 Thread Jason Wang
On Fri, Aug 11, 2023 at 1:42 PM Michael S. Tsirkin  wrote:
>
> On Fri, Aug 11, 2023 at 10:23:15AM +0800, Jason Wang wrote:
> > On Fri, Aug 11, 2023 at 3:41 AM Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Aug 08, 2023 at 10:30:56AM +0800, Jason Wang wrote:
> > > > On Mon, Jul 31, 2023 at 2:30 PM Jason Wang  wrote:
> > > > >
> > > > > On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote:
> > > > > > > > They really shouldn't - any NIC that takes forever to
> > > > > > > > program will create issues in the networking stack.
> > > > > > >
> > > > > > > Unfortunately, it's not rare as the device/cvq could be 
> > > > > > > implemented
> > > > > > > via firmware or software.
> > > > > >
> > > > > > Currently that means one either has sane firmware with a scheduler
> > > > > > that can meet deadlines, or loses the ability to report errors back.
> > > > > >
> > > > > > > > But if they do they can always set this flag too.
> > > > > > >
> > > > > > > This may have false negatives and may confuse the management.
> > > > > > >
> > > > > > > Maybe we can extend the networking core to allow some device 
> > > > > > > specific
> > > > > > > configurations to be done with device specific lock without rtnl. 
> > > > > > > For
> > > > > > > example, split the set_channels to
> > > > > > >
> > > > > > > pre_set_channels
> > > > > > > set_channels
> > > > > > > post_set_channels
> > > > > > >
> > > > > > > The device specific part could be done in pre and post without a 
> > > > > > > rtnl lock?
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > >
> > > > > > Would the benefit be that errors can be reported to userspace then?
> > > > > > Then maybe.  I think you will have to show how this works for at 
> > > > > > least
> > > > > > one card besides virtio.
> > > > >
> > > > > Even for virtio, this does not seem easy, as e.g. the
> > > > > virtnet_send_command() and netif_set_real_num_tx_queues() need to
> > > > > appear to be atomic to the networking core.
> > > > >
> > > > > I wonder if we can re-consider the way of a timeout here and choose a
> > > > > sane value as a start.
> > > >
> > > > Michael, any more input on this?
> > > >
> > > > Thanks
> > >
> > > I think this is just mission creep. We are trying to fix
> > > vduse - let's do that for starters.
> > >
> > > Recovering from firmware timeouts is far from trivial and
> > > just assuming that just because it timed out it will not
> > > access memory is just as likely to cause memory corruption
> > > with worse results than an infinite spin.
> >
> > Yes, this might require support not only in the driver
> >
> > >
> > > I propose we fix this for vduse and assume hardware/firmware
> > > is well behaved.
> >
> > One major case is the re-connection; in that case it might take much
> > longer than the kernel virtio-net driver expects.
> > So can we have a timeout in VDUSE and trap CVQ, so that VDUSE can
> > return and fail early?
>
> Ugh more mission creep. not at all my point. vduse should cache
> values in the driver,

What do you mean by values here? The cvq command?

Thanks

> until someone manages to change
> net core to be more friendly to userspace devices.
>
> >
> > > Or maybe not well behaved firmware will
> > > set the flag losing error reporting ability.
> >
> > This might be hard since it means not only the set but also the get is
> > unreliable.
> >
> > Thanks
>
> /me shrugs
>
>
>
> > >
> > >
> > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > MST
> > > > > >
> > >
>
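
As a compilable sketch of the pre/post split floated earlier in this
thread (hypothetical ops structure and names; nothing like this exists
in net core today):

    /* Device-specific work moves into pre/post hooks that run under a
     * device lock, outside the rtnl lock; names are invented. */
    struct net_device;
    struct ethtool_channels;

    struct set_channels_split_ops {
        /* slow device-specific preparation, no rtnl held */
        int (*pre_set_channels)(struct net_device *dev,
                                struct ethtool_channels *ch);
        /* the minimal core update that must appear atomic, under rtnl */
        int (*set_channels)(struct net_device *dev,
                            struct ethtool_channels *ch);
        /* device-specific commit or cleanup, no rtnl held */
        void (*post_set_channels)(struct net_device *dev,
                                  struct ethtool_channels *ch);
    };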


Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-08-11 Thread Michael S. Tsirkin
On Fri, Aug 11, 2023 at 05:18:51PM +0800, Jason Wang wrote:
> On Fri, Aug 11, 2023 at 1:42 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Aug 11, 2023 at 10:23:15AM +0800, Jason Wang wrote:
> > > On Fri, Aug 11, 2023 at 3:41 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Tue, Aug 08, 2023 at 10:30:56AM +0800, Jason Wang wrote:
> > > > > On Mon, Jul 31, 2023 at 2:30 PM Jason Wang  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote:
> > > > > > > > > They really shouldn't - any NIC that takes forever to
> > > > > > > > > program will create issues in the networking stack.
> > > > > > > >
> > > > > > > > Unfortunately, it's not rare as the device/cvq could be 
> > > > > > > > implemented
> > > > > > > > via firmware or software.
> > > > > > >
> > > > > > > Currently that means one either has sane firmware with a scheduler
> > > > > > > that can meet deadlines, or loses the ability to report errors back.
> > > > > > >
> > > > > > > > > But if they do they can always set this flag too.
> > > > > > > >
> > > > > > > > This may have false negatives and may confuse the management.
> > > > > > > >
> > > > > > > > Maybe we can extend the networking core to allow some device 
> > > > > > > > specific
> > > > > > > > configurations to be done with device specific lock without 
> > > > > > > > rtnl. For
> > > > > > > > example, split the set_channels to
> > > > > > > >
> > > > > > > > pre_set_channels
> > > > > > > > set_channels
> > > > > > > > post_set_channels
> > > > > > > >
> > > > > > > > The device specific part could be done in pre and post without 
> > > > > > > > a rtnl lock?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > > Would the benefit be that errors can be reported to userspace 
> > > > > > > then?
> > > > > > > Then maybe.  I think you will have to show how this works for at 
> > > > > > > least
> > > > > > > one card besides virtio.
> > > > > >
> > > > > > Even for virtio, this does not seem easy, as e.g. the
> > > > > > virtnet_send_command() and netif_set_real_num_tx_queues() need to
> > > > > > appear to be atomic to the networking core.
> > > > > >
> > > > > > I wonder if we can re-consider the way of a timeout here and choose 
> > > > > > a
> > > > > > sane value as a start.
> > > > >
> > > > > Michael, any more input on this?
> > > > >
> > > > > Thanks
> > > >
> > > > I think this is just mission creep. We are trying to fix
> > > > vduse - let's do that for starters.
> > > >
> > > > Recovering from firmware timeouts is far from trivial and
> > > > just assuming that just because it timed out it will not
> > > > access memory is just as likely to cause memory corruption
> > > > with worse results than an infinite spin.
> > >
> > > Yes, this might require support not only in the driver
> > >
> > > >
> > > > I propose we fix this for vduse and assume hardware/firmware
> > > > is well behaved.
> > >
> > > One major case is the re-connection; in that case it might take much
> > > longer than the kernel virtio-net driver expects.
> > > So can we have a timeout in VDUSE and trap CVQ, so that VDUSE can
> > > return and fail early?
> >
> > Ugh more mission creep. not at all my point. vduse should cache
> > values in the driver,
> 
> What do you mean by values here? The cvq command?
> 
> Thanks

The card status generally.

> > until someone manages to change
> > net core to be more friendly to userspace devices.
> >
> > >
> > > > Or maybe not well behaved firmware will
> > > > set the flag losing error reporting ability.
> > >
> > > This might be hard since it means not only the set but also the get is
> > > unreliable.
> > >
> > > Thanks
> >
> > /me shrugs
> >
> >
> >
> > > >
> > > >
> > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > MST
> > > > > > >
> > > >
> >
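
For reference, a userspace analog of the command-waiting loop this
series touches; the sched_yield() plays the role of the cond_resched()
the patch adds, and the deadline models the hypothetical timeout
debated above, which the posted patch does not implement:

    #include <sched.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    /* Poll for completion, yielding each iteration; give up after the
     * (hypothetical) deadline instead of spinning forever. */
    static bool wait_polled(bool (*done)(void), double timeout_s)
    {
        struct timespec start, now;

        clock_gettime(CLOCK_MONOTONIC, &start);
        while (!done()) {
            sched_yield(); /* ~ cond_resched() in the kernel loop */
            clock_gettime(CLOCK_MONOTONIC, &now);
            if ((now.tv_sec - start.tv_sec) +
                (now.tv_nsec - start.tv_nsec) / 1e9 > timeout_s)
                return false; /* ~ returning -ETIMEDOUT */
        }
        return true;
    }

    static bool already_done(void) { return true; }

    int main(void)
    {
        printf("completed: %d\n", wait_polled(already_done, 1.0));
        return 0;
    }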


Re: [PATCH] virtio_vdpa: build affinity masks conditionally

2023-08-11 Thread Michael S. Tsirkin
On Fri, Aug 11, 2023 at 05:15:39AM -0400, Jason Wang wrote:
> We try to build the affinity masks via create_affinity_masks()
> unconditionally, which may lead to several issues:
> 
> - the affinity mask is not used for a parent without affinity support
>   (only VDUSE supports affinity at the moment)
> - the logic of create_affinity_masks() might not work for devices
>   other than block. For example, it is not rare for a networking
>   device to have more queues than CPUs. Such a case breaks the
>   current affinity logic, which is based on group_cpus_evenly(); that
>   function assumes the number of CPUs is not less than the number of
>   groups, and otherwise triggers a warning[1]:
> 
>   if (ret >= 0)
>           WARN_ON(nr_present + nr_others < numgrps);
> 
> Fix this by building the affinity masks only when:
> 
> - the driver passes an affinity descriptor; a driver like virtio-blk
>   makes sure to limit the number of queues so it does not exceed the
>   number of CPUs
> - the parent supports the affinity setting config ops
> 
> This helps to avoid the warning. More optimizations could be done on
> top.
> 
> [1]
> [  682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 
> group_cpus_evenly+0x1aa/0x1c0
> [  682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79
> [  682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> [  682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0
> [  682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 1b c4 
> 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb c2 <0f> 0b 
> eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc
> [  682.146679] RSP: 0018:c9000215f498 EFLAGS: 00010293
> [  682.146682] RAX: 0001f1e0 RBX: 0041 RCX: 
> 
> [  682.146684] RDX: 888109922058 RSI: 0041 RDI: 
> 0030
> [  682.146686] RBP: 888109922058 R08: c9000215f498 R09: 
> c9000215f4a0
> [  682.146687] R10: 000198d0 R11: 0030 R12: 
> 888107e02800
> [  682.146689] R13: 0030 R14: 0030 R15: 
> 0041
> [  682.146692] FS:  7fef52315740() GS:88823738() 
> knlGS:
> [  682.146695] CS:  0010 DS:  ES:  CR0: 80050033
> [  682.146696] CR2: 7fef52509000 CR3: 000110dbc004 CR4: 
> 00370ee0
> [  682.146698] DR0:  DR1:  DR2: 
> 
> [  682.146700] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [  682.146701] Call Trace:
> [  682.146703]  
> [  682.146705]  ? __warn+0x7b/0x130
> [  682.146709]  ? group_cpus_evenly+0x1aa/0x1c0
> [  682.146712]  ? report_bug+0x1c8/0x1e0
> [  682.146717]  ? handle_bug+0x3c/0x70
> [  682.146721]  ? exc_invalid_op+0x14/0x70
> [  682.146723]  ? asm_exc_invalid_op+0x16/0x20
> [  682.146727]  ? group_cpus_evenly+0x1aa/0x1c0
> [  682.146729]  ? group_cpus_evenly+0x15c/0x1c0
> [  682.146731]  create_affinity_masks+0xaf/0x1a0
> [  682.146735]  virtio_vdpa_find_vqs+0x83/0x1d0
> [  682.146738]  ? __pfx_default_calc_sets+0x10/0x10
> [  682.146742]  virtnet_find_vqs+0x1f0/0x370
> [  682.146747]  virtnet_probe+0x501/0xcd0
> [  682.146749]  ? vp_modern_get_status+0x12/0x20
> [  682.146751]  ? get_cap_addr.isra.0+0x10/0xc0
> [  682.146754]  virtio_dev_probe+0x1af/0x260
> [  682.146759]  really_probe+0x1a5/0x410
> 
> Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity spreading 
> mechanism")
> Signed-off-by: Jason Wang 

This won't fix the case where block has more queues than CPUs though,
will it?

> ---
>  drivers/virtio/virtio_vdpa.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> index 961161da5900..06ce6d8c2e00 100644
> --- a/drivers/virtio/virtio_vdpa.c
> +++ b/drivers/virtio/virtio_vdpa.c
> @@ -366,11 +366,14 @@ static int virtio_vdpa_find_vqs(struct virtio_device 
> *vdev, unsigned int nvqs,
>   struct irq_affinity default_affd = { 0 };
>   struct cpumask *masks;
>   struct vdpa_callback cb;
> + bool has_affinity = desc && ops->set_vq_affinity;
>   int i, err, queue_idx = 0;
>  
> - masks = create_affinity_masks(nvqs, desc ? desc : &default_affd);
> - if (!masks)
> - return -ENOMEM;
> + if (has_affinity) {
> + masks = create_affinity_masks(nvqs, desc ? desc : 
> &default_affd);
> + if (!masks)
> + return -ENOMEM;
> + }
>  
>   for (i = 0; i < nvqs; ++i) {
>   if (!names[i]) {
> @@ -386,20 +389,22 @@ static int virtio_vdpa_find_vqs(struct virtio_device 
> *vdev, unsigned int nvqs,
>   goto err_setup_vq;
>   }
>  
> - if (ops->set_vq_affinity)
> + if (has_affinity)
>   ops->set_vq_affinity(vdpa, i, &masks[i]);

Re: [PATCH] virtio_vdpa: build affinity masks conditionally

2023-08-11 Thread Jason Wang
On Fri, Aug 11, 2023 at 5:25 PM Michael S. Tsirkin  wrote:
>
> On Fri, Aug 11, 2023 at 05:15:39AM -0400, Jason Wang wrote:
> > We try to build the affinity masks via create_affinity_masks()
> > unconditionally, which may lead to several issues:
> >
> > - the affinity mask is not used for a parent without affinity support
> >   (only VDUSE supports affinity at the moment)
> > - the logic of create_affinity_masks() might not work for devices
> >   other than block. For example, it is not rare for a networking
> >   device to have more queues than CPUs. Such a case breaks the
> >   current affinity logic, which is based on group_cpus_evenly(); that
> >   function assumes the number of CPUs is not less than the number of
> >   groups, and otherwise triggers a warning[1]:
> >
> >   if (ret >= 0)
> >           WARN_ON(nr_present + nr_others < numgrps);
> >
> > Fix this by building the affinity masks only when:
> >
> > - the driver passes an affinity descriptor; a driver like virtio-blk
> >   makes sure to limit the number of queues so it does not exceed the
> >   number of CPUs
> > - the parent supports the affinity setting config ops
> >
> > This helps to avoid the warning. More optimizations could be done on
> > top.
> >
> > [1]
> > [  682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 
> > group_cpus_evenly+0x1aa/0x1c0
> > [  682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79
> > [  682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > [  682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0
> > [  682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 1b 
> > c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb c2 
> > <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc
> > [  682.146679] RSP: 0018:c9000215f498 EFLAGS: 00010293
> > [  682.146682] RAX: 0001f1e0 RBX: 0041 RCX: 
> > 
> > [  682.146684] RDX: 888109922058 RSI: 0041 RDI: 
> > 0030
> > [  682.146686] RBP: 888109922058 R08: c9000215f498 R09: 
> > c9000215f4a0
> > [  682.146687] R10: 000198d0 R11: 0030 R12: 
> > 888107e02800
> > [  682.146689] R13: 0030 R14: 0030 R15: 
> > 0041
> > [  682.146692] FS:  7fef52315740() GS:88823738() 
> > knlGS:
> > [  682.146695] CS:  0010 DS:  ES:  CR0: 80050033
> > [  682.146696] CR2: 7fef52509000 CR3: 000110dbc004 CR4: 
> > 00370ee0
> > [  682.146698] DR0:  DR1:  DR2: 
> > 
> > [  682.146700] DR3:  DR6: fffe0ff0 DR7: 
> > 0400
> > [  682.146701] Call Trace:
> > [  682.146703]  
> > [  682.146705]  ? __warn+0x7b/0x130
> > [  682.146709]  ? group_cpus_evenly+0x1aa/0x1c0
> > [  682.146712]  ? report_bug+0x1c8/0x1e0
> > [  682.146717]  ? handle_bug+0x3c/0x70
> > [  682.146721]  ? exc_invalid_op+0x14/0x70
> > [  682.146723]  ? asm_exc_invalid_op+0x16/0x20
> > [  682.146727]  ? group_cpus_evenly+0x1aa/0x1c0
> > [  682.146729]  ? group_cpus_evenly+0x15c/0x1c0
> > [  682.146731]  create_affinity_masks+0xaf/0x1a0
> > [  682.146735]  virtio_vdpa_find_vqs+0x83/0x1d0
> > [  682.146738]  ? __pfx_default_calc_sets+0x10/0x10
> > [  682.146742]  virtnet_find_vqs+0x1f0/0x370
> > [  682.146747]  virtnet_probe+0x501/0xcd0
> > [  682.146749]  ? vp_modern_get_status+0x12/0x20
> > [  682.146751]  ? get_cap_addr.isra.0+0x10/0xc0
> > [  682.146754]  virtio_dev_probe+0x1af/0x260
> > [  682.146759]  really_probe+0x1a5/0x410
> >
> > Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity spreading 
> > mechanism")
> > Signed-off-by: Jason Wang 
>
> This won't fix the case where block has more queues than CPUs though,
> will it?

Block will limit the number of queues during init_vq():

num_vqs = min_t(unsigned int,
                min_not_zero(num_request_queues, nr_cpu_ids),
                num_vqs);
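
As a self-contained illustration of that clamp (the macros below are
simplified stand-ins for the kernel's min_t()/min_not_zero() from
include/linux/minmax.h, and the numbers are hypothetical):

    #include <stdio.h>

    #define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
    #define min_not_zero(a, b) \
        ((a) == 0 ? (b) : ((b) == 0 ? (a) : min_t(unsigned int, (a), (b))))

    int main(void)
    {
        unsigned int nr_cpu_ids = 48;         /* hypothetical guest */
        unsigned int num_request_queues = 0;  /* module param unset */
        unsigned int num_vqs = 65;            /* offered by the device */

        /* With the param unset, the clamp falls back to nr_cpu_ids. */
        num_vqs = min_t(unsigned int,
                        min_not_zero(num_request_queues, nr_cpu_ids),
                        num_vqs);
        printf("num_vqs = %u\n", num_vqs);    /* prints 48 */
        return 0;
    }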


Thanks

>
> > ---
> >  drivers/virtio/virtio_vdpa.c | 17 +++--
> >  1 file changed, 11 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> > index 961161da5900..06ce6d8c2e00 100644
> > --- a/drivers/virtio/virtio_vdpa.c
> > +++ b/drivers/virtio/virtio_vdpa.c
> > @@ -366,11 +366,14 @@ static int virtio_vdpa_find_vqs(struct virtio_device 
> > *vdev, unsigned int nvqs,
> >   struct irq_affinity default_affd = { 0 };
> >   struct cpumask *masks;
> >   struct vdpa_callback cb;
> > + bool has_affinity = desc && ops->set_vq_affinity;
> >   int i, err, queue_idx = 0;
> >
> > - masks = create_affinity_masks(nvqs, desc ? desc : &default_affd);
> > - if (!masks)
> > - return -ENOMEM;
> > + if (has_affinity) {
> > + masks = create_affinity_masks(nvqs, desc ? desc : 
> > &default_affd

Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-08-11 Thread Jason Wang
On Fri, Aug 11, 2023 at 5:21 PM Michael S. Tsirkin  wrote:
>
> On Fri, Aug 11, 2023 at 05:18:51PM +0800, Jason Wang wrote:
> > On Fri, Aug 11, 2023 at 1:42 PM Michael S. Tsirkin  wrote:
> > >
> > > On Fri, Aug 11, 2023 at 10:23:15AM +0800, Jason Wang wrote:
> > > > On Fri, Aug 11, 2023 at 3:41 AM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Tue, Aug 08, 2023 at 10:30:56AM +0800, Jason Wang wrote:
> > > > > > On Mon, Jul 31, 2023 at 2:30 PM Jason Wang  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote:
> > > > > > > > > > They really shouldn't - any NIC that takes forever to
> > > > > > > > > > program will create issues in the networking stack.
> > > > > > > > >
> > > > > > > > > Unfortunately, it's not rare as the device/cvq could be 
> > > > > > > > > implemented
> > > > > > > > > via firmware or software.
> > > > > > > >
> > > > > > > > Currently that means one either has sane firmware with a
> > > > > > > > scheduler that can meet deadlines, or loses the ability to
> > > > > > > > report errors back.
> > > > > > > >
> > > > > > > > > > But if they do they can always set this flag too.
> > > > > > > > >
> > > > > > > > > This may have false negatives and may confuse the management.
> > > > > > > > >
> > > > > > > > > Maybe we can extend the networking core to allow some device 
> > > > > > > > > specific
> > > > > > > > > configurations to be done with device specific lock without 
> > > > > > > > > rtnl. For
> > > > > > > > > example, split the set_channels to
> > > > > > > > >
> > > > > > > > > pre_set_channels
> > > > > > > > > set_channels
> > > > > > > > > post_set_channels
> > > > > > > > >
> > > > > > > > > The device specific part could be done in pre and post 
> > > > > > > > > without a rtnl lock?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > >
> > > > > > > > Would the benefit be that errors can be reported to userspace 
> > > > > > > > then?
> > > > > > > > Then maybe.  I think you will have to show how this works for 
> > > > > > > > at least
> > > > > > > > one card besides virtio.
> > > > > > >
> > > > > > > Even for virtio, this does not seem easy, as e.g. the
> > > > > > > virtnet_send_command() and netif_set_real_num_tx_queues() need to
> > > > > > > appear to be atomic to the networking core.
> > > > > > >
> > > > > > > I wonder if we can re-consider the way of a timeout here and 
> > > > > > > choose a
> > > > > > > sane value as a start.
> > > > > >
> > > > > > Michael, any more input on this?
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > I think this is just mission creep. We are trying to fix
> > > > > vduse - let's do that for starters.
> > > > >
> > > > > Recovering from firmware timeouts is far from trivial and
> > > > > just assuming that just because it timed out it will not
> > > > > access memory is just as likely to cause memory corruption
> > > > > with worse results than an infinite spin.
> > > >
> > > > Yes, this might require support not only in the driver
> > > >
> > > > >
> > > > > I propose we fix this for vduse and assume hardware/firmware
> > > > > is well behaved.
> > > >
> > > > One major case is the re-connection; in that case it might take much
> > > > longer than the kernel virtio-net driver expects.
> > > > So can we have a timeout in VDUSE and trap CVQ, so that VDUSE can
> > > > return and fail early?
> > >
> > > Ugh more mission creep. not at all my point. vduse should cache
> > > values in the driver,
> >
> > What do you mean by values here? The cvq command?
> >
> > Thanks
>
> The card status generally.

Just to make sure I understand here. The CVQ needs to be processed by
the userspace now. How could we cache the status?

Thanks

>
> > > until someone manages to change
> > > net core to be more friendly to userspace devices.
> > >
> > > >
> > > > > Or maybe not well behaved firmware will
> > > > > set the flag losing error reporting ability.
> > > >
> > > > This might be hard since it means not only the set but also the get is
> > > > unreliable.
> > > >
> > > > Thanks
> > >
> > > /me shrugs
> > >
> > >
> > >
> > > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > MST
> > > > > > > >
> > > > >
> > >
>


Re: [PATCH] virtio_vdpa: build affinity masks conditionally

2023-08-11 Thread Michael S. Tsirkin
On Fri, Aug 11, 2023 at 05:41:44PM +0800, Jason Wang wrote:
> On Fri, Aug 11, 2023 at 5:25 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Aug 11, 2023 at 05:15:39AM -0400, Jason Wang wrote:
> > > We try to build the affinity masks via create_affinity_masks()
> > > unconditionally, which may lead to several issues:
> > >
> > > - the affinity mask is not used for a parent without affinity support
> > >   (only VDUSE supports affinity at the moment)
> > > - the logic of create_affinity_masks() might not work for devices
> > >   other than block. For example, it is not rare for a networking
> > >   device to have more queues than CPUs. Such a case breaks the
> > >   current affinity logic, which is based on group_cpus_evenly(); that
> > >   function assumes the number of CPUs is not less than the number of
> > >   groups, and otherwise triggers a warning[1]:
> > >
> > >   if (ret >= 0)
> > >           WARN_ON(nr_present + nr_others < numgrps);
> > >
> > > Fix this by building the affinity masks only when:
> > >
> > > - the driver passes an affinity descriptor; a driver like virtio-blk
> > >   makes sure to limit the number of queues so it does not exceed the
> > >   number of CPUs
> > > - the parent supports the affinity setting config ops
> > >
> > > This helps to avoid the warning. More optimizations could be done on
> > > top.
> > >
> > > [1]
> > > [  682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 
> > > group_cpus_evenly+0x1aa/0x1c0
> > > [  682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79
> > > [  682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > > [  682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0
> > > [  682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 
> > > 1b c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb 
> > > c2 <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc
> > > [  682.146679] RSP: 0018:c9000215f498 EFLAGS: 00010293
> > > [  682.146682] RAX: 0001f1e0 RBX: 0041 RCX: 
> > > 
> > > [  682.146684] RDX: 888109922058 RSI: 0041 RDI: 
> > > 0030
> > > [  682.146686] RBP: 888109922058 R08: c9000215f498 R09: 
> > > c9000215f4a0
> > > [  682.146687] R10: 000198d0 R11: 0030 R12: 
> > > 888107e02800
> > > [  682.146689] R13: 0030 R14: 0030 R15: 
> > > 0041
> > > [  682.146692] FS:  7fef52315740() GS:88823738() 
> > > knlGS:
> > > [  682.146695] CS:  0010 DS:  ES:  CR0: 80050033
> > > [  682.146696] CR2: 7fef52509000 CR3: 000110dbc004 CR4: 
> > > 00370ee0
> > > [  682.146698] DR0:  DR1:  DR2: 
> > > 
> > > [  682.146700] DR3:  DR6: fffe0ff0 DR7: 
> > > 0400
> > > [  682.146701] Call Trace:
> > > [  682.146703]  
> > > [  682.146705]  ? __warn+0x7b/0x130
> > > [  682.146709]  ? group_cpus_evenly+0x1aa/0x1c0
> > > [  682.146712]  ? report_bug+0x1c8/0x1e0
> > > [  682.146717]  ? handle_bug+0x3c/0x70
> > > [  682.146721]  ? exc_invalid_op+0x14/0x70
> > > [  682.146723]  ? asm_exc_invalid_op+0x16/0x20
> > > [  682.146727]  ? group_cpus_evenly+0x1aa/0x1c0
> > > [  682.146729]  ? group_cpus_evenly+0x15c/0x1c0
> > > [  682.146731]  create_affinity_masks+0xaf/0x1a0
> > > [  682.146735]  virtio_vdpa_find_vqs+0x83/0x1d0
> > > [  682.146738]  ? __pfx_default_calc_sets+0x10/0x10
> > > [  682.146742]  virtnet_find_vqs+0x1f0/0x370
> > > [  682.146747]  virtnet_probe+0x501/0xcd0
> > > [  682.146749]  ? vp_modern_get_status+0x12/0x20
> > > [  682.146751]  ? get_cap_addr.isra.0+0x10/0xc0
> > > [  682.146754]  virtio_dev_probe+0x1af/0x260
> > > [  682.146759]  really_probe+0x1a5/0x410
> > >
> > > Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity spreading 
> > > mechanism")
> > > Signed-off-by: Jason Wang 
> >
> > This won't fix the case where block has more queues than CPUs though,
> > will it?
> 
> Block will limit the number of queues during init_vq():
> 
> num_vqs = min_t(unsigned int,
> min_not_zero(num_request_queues, nr_cpu_ids),
> num_vqs);
> 
> 
> Thanks

Good point. This doesn't play well with cpu hotplug but that is not new.

> >
> > > ---
> > >  drivers/virtio/virtio_vdpa.c | 17 +++--
> > >  1 file changed, 11 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> > > index 961161da5900..06ce6d8c2e00 100644
> > > --- a/drivers/virtio/virtio_vdpa.c
> > > +++ b/drivers/virtio/virtio_vdpa.c
> > > @@ -366,11 +366,14 @@ static int virtio_vdpa_find_vqs(struct 
> > > virtio_device *vdev, unsigned int nvqs,
> > >   struct irq_affinity default_affd = { 0 };
> > >   struct cpumask *masks;
> > >   struct vdpa_callback cb;
> >

Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-08-11 Thread Michael S. Tsirkin
On Fri, Aug 11, 2023 at 05:43:25PM +0800, Jason Wang wrote:
> On Fri, Aug 11, 2023 at 5:21 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Aug 11, 2023 at 05:18:51PM +0800, Jason Wang wrote:
> > > On Fri, Aug 11, 2023 at 1:42 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Fri, Aug 11, 2023 at 10:23:15AM +0800, Jason Wang wrote:
> > > > > On Fri, Aug 11, 2023 at 3:41 AM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Aug 08, 2023 at 10:30:56AM +0800, Jason Wang wrote:
> > > > > > > On Mon, Jul 31, 2023 at 2:30 PM Jason Wang  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote:
> > > > > > > > > > > They really shouldn't - any NIC that takes forever to
> > > > > > > > > > > program will create issues in the networking stack.
> > > > > > > > > >
> > > > > > > > > > Unfortunately, it's not rare as the device/cvq could be 
> > > > > > > > > > implemented
> > > > > > > > > > via firmware or software.
> > > > > > > > >
> > > > > > > > > Currently that means one either has sane firmware with a
> > > > > > > > > scheduler that can meet deadlines, or loses the ability to
> > > > > > > > > report errors back.
> > > > > > > > >
> > > > > > > > > > > But if they do they can always set this flag too.
> > > > > > > > > >
> > > > > > > > > > This may have false negatives and may confuse the 
> > > > > > > > > > management.
> > > > > > > > > >
> > > > > > > > > > Maybe we can extend the networking core to allow some 
> > > > > > > > > > device specific
> > > > > > > > > > configurations to be done with device specific lock without 
> > > > > > > > > > rtnl. For
> > > > > > > > > > example, split the set_channels to
> > > > > > > > > >
> > > > > > > > > > pre_set_channels
> > > > > > > > > > set_channels
> > > > > > > > > > post_set_channels
> > > > > > > > > >
> > > > > > > > > > The device specific part could be done in pre and post 
> > > > > > > > > > without a rtnl lock?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Would the benefit be that errors can be reported to userspace 
> > > > > > > > > then?
> > > > > > > > > Then maybe.  I think you will have to show how this works for 
> > > > > > > > > at least
> > > > > > > > > one card besides virtio.
> > > > > > > >
> > > > > > > > Even for virtio, this does not seem easy, as e.g. the
> > > > > > > > virtnet_send_command() and netif_set_real_num_tx_queues() need 
> > > > > > > > to
> > > > > > > > appear to be atomic to the networking core.
> > > > > > > >
> > > > > > > > I wonder if we can re-consider the way of a timeout here and 
> > > > > > > > choose a
> > > > > > > > sane value as a start.
> > > > > > >
> > > > > > > Michael, any more input on this?
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > I think this is just mission creep. We are trying to fix
> > > > > > vduse - let's do that for starters.
> > > > > >
> > > > > > Recovering from firmware timeouts is far from trivial and
> > > > > > just assuming that just because it timed out it will not
> > > > > > access memory is just as likely to cause memory corruption
> > > > > > with worse results than an infinite spin.
> > > > >
> > > > > Yes, this might require support not only in the driver
> > > > >
> > > > > >
> > > > > > I propose we fix this for vduse and assume hardware/firmware
> > > > > > is well behaved.
> > > > >
> > > > > One major case is the re-connection; in that case it might take much
> > > > > longer than the kernel virtio-net driver expects.
> > > > > So can we have a timeout in VDUSE and trap CVQ, so that VDUSE can
> > > > > return and fail early?
> > > >
> > > > Ugh more mission creep. not at all my point. vduse should cache
> > > > values in the driver,
> > >
> > > What do you mean by values here? The cvq command?
> > >
> > > Thanks
> >
> > The card status generally.
> 
> Just to make sure I understand here. The CVQ needs to be processed by
> the userspace now. How could we cache the status?
> 
> Thanks

vduse will have to process it in kernel.
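
A minimal sketch of what such in-kernel caching could look like (the
struct and helpers are invented for illustration; the real VDUSE
interface may differ): the kernel side records what the driver last
wrote and serves reads from that copy instead of a userspace round
trip.

    #include <stdint.h>

    /* Hypothetical cached device state kept on the kernel side. */
    struct vduse_cached_state {
        uint8_t  status;   /* last status the driver wrote */
        uint64_t features; /* negotiated feature bits */
    };

    /* A "set" updates the cache (and is also forwarded to userspace). */
    static inline void vduse_set_status_cached(struct vduse_cached_state *s,
                                               uint8_t status)
    {
        s->status = status;
    }

    /* A "get" is answered from the cache, never blocking on userspace. */
    static inline uint8_t
    vduse_get_status_cached(const struct vduse_cached_state *s)
    {
        return s->status;
    }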

> >
> > > > until someone manages to change
> > > > net core to be more friendly to userspace devices.
> > > >
> > > > >
> > > > > > Or maybe not well behaved firmware will
> > > > > > set the flag losing error reporting ability.
> > > > >
> > > > > This might be hard since it means not only the set but also the get is
> > > > > unreliable.
> > > > >
> > > > > Thanks
> > > >
> > > > /me shrugs
> > > >
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > MST
> > > > > > > > >
> > > > > >
> > > >
> >


Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-08-11 Thread Jason Wang
On Fri, Aug 11, 2023 at 5:51 PM Michael S. Tsirkin  wrote:
>
> On Fri, Aug 11, 2023 at 05:43:25PM +0800, Jason Wang wrote:
> > On Fri, Aug 11, 2023 at 5:21 PM Michael S. Tsirkin  wrote:
> > >
> > > On Fri, Aug 11, 2023 at 05:18:51PM +0800, Jason Wang wrote:
> > > > On Fri, Aug 11, 2023 at 1:42 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Fri, Aug 11, 2023 at 10:23:15AM +0800, Jason Wang wrote:
> > > > > > On Fri, Aug 11, 2023 at 3:41 AM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Aug 08, 2023 at 10:30:56AM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Jul 31, 2023 at 2:30 PM Jason Wang 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote:
> > > > > > > > > > > > They really shouldn't - any NIC that takes forever to
> > > > > > > > > > > > program will create issues in the networking stack.
> > > > > > > > > > >
> > > > > > > > > > > Unfortunately, it's not rare as the device/cvq could be 
> > > > > > > > > > > implemented
> > > > > > > > > > > via firmware or software.
> > > > > > > > > >
> > > > > > > > > > Currently that means one either has sane firmware with a
> > > > > > > > > > scheduler that can meet deadlines, or loses the ability to
> > > > > > > > > > report errors back.
> > > > > > > > > >
> > > > > > > > > > > > But if they do they can always set this flag too.
> > > > > > > > > > >
> > > > > > > > > > > This may have false negatives and may confuse the 
> > > > > > > > > > > management.
> > > > > > > > > > >
> > > > > > > > > > > Maybe we can extend the networking core to allow some 
> > > > > > > > > > > device specific
> > > > > > > > > > > configurations to be done with device specific lock 
> > > > > > > > > > > without rtnl. For
> > > > > > > > > > > example, split the set_channels to
> > > > > > > > > > >
> > > > > > > > > > > pre_set_channels
> > > > > > > > > > > set_channels
> > > > > > > > > > > post_set_channels
> > > > > > > > > > >
> > > > > > > > > > > The device specific part could be done in pre and post 
> > > > > > > > > > > without a rtnl lock?
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Would the benefit be that errors can be reported to 
> > > > > > > > > > userspace then?
> > > > > > > > > > Then maybe.  I think you will have to show how this works 
> > > > > > > > > > for at least
> > > > > > > > > > one card besides virtio.
> > > > > > > > >
> > > > > > > > > Even for virtio, this does not seem easy, as e.g. the
> > > > > > > > > virtnet_send_command() and netif_set_real_num_tx_queues() 
> > > > > > > > > need to
> > > > > > > > > appear to be atomic to the networking core.
> > > > > > > > >
> > > > > > > > > I wonder if we can re-consider the way of a timeout here and 
> > > > > > > > > choose a
> > > > > > > > > sane value as a start.
> > > > > > > >
> > > > > > > > Michael, any more input on this?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > I think this is just mission creep. We are trying to fix
> > > > > > > vduse - let's do that for starters.
> > > > > > >
> > > > > > > Recovering from firmware timeouts is far from trivial and
> > > > > > > just assuming that just because it timed out it will not
> > > > > > > access memory is just as likely to cause memory corruption
> > > > > > > with worse results than an infinite spin.
> > > > > >
> > > > > > Yes, this might require support not only in the driver
> > > > > >
> > > > > > >
> > > > > > > I propose we fix this for vduse and assume hardware/firmware
> > > > > > > is well behaved.
> > > > > >
> > > > > > One major case is the re-connection; in that case it might take much
> > > > > > longer than the kernel virtio-net driver expects.
> > > > > > So can we have a timeout in VDUSE and trap CVQ, so that VDUSE can
> > > > > > return and fail early?
> > > > >
> > > > > Ugh more mission creep. not at all my point. vduse should cache
> > > > > values in the driver,
> > > >
> > > > What do you mean by values here? The cvq command?
> > > >
> > > > Thanks
> > >
> > > The card status generally.
> >
> > Just to make sure I understand here. The CVQ needs to be processed by
> > the userspace now. How could we cache the status?
> >
> > Thanks
>
> vduse will have to process it in kernel.

Right, that's my understanding (trap CVQ).

Thanks

>
> > >
> > > > > until someone manages to change
> > > > > net core to be more friendly to userspace devices.
> > > > >
> > > > > >
> > > > > > > Or maybe not well behaved firmware will
> > > > > > > set the flag losing error reporting ability.
> > > > > >
> > > > > > This might be hard since it means not only the set but also the get 
> > > > > > is
> > > > > > unreliable.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > /me shrugs
> > > > >
> > > > >
> > > > >
> > > > > > >
> > >

Re: [PATCH] virtio_vdpa: build affinity masks conditionally

2023-08-11 Thread Jason Wang
On Fri, Aug 11, 2023 at 5:48 PM Michael S. Tsirkin  wrote:
>
> On Fri, Aug 11, 2023 at 05:41:44PM +0800, Jason Wang wrote:
> > On Fri, Aug 11, 2023 at 5:25 PM Michael S. Tsirkin  wrote:
> > >
> > > On Fri, Aug 11, 2023 at 05:15:39AM -0400, Jason Wang wrote:
> > > > We try to build the affinity masks via create_affinity_masks()
> > > > unconditionally, which may lead to several issues:
> > > >
> > > > - the affinity mask is not used for a parent without affinity support
> > > >   (only VDUSE supports affinity at the moment)
> > > > - the logic of create_affinity_masks() might not work for devices
> > > >   other than block. For example, it is not rare for a networking
> > > >   device to have more queues than CPUs. Such a case breaks the
> > > >   current affinity logic, which is based on group_cpus_evenly(); that
> > > >   function assumes the number of CPUs is not less than the number of
> > > >   groups, and otherwise triggers a warning[1]:
> > > >
> > > >   if (ret >= 0)
> > > >           WARN_ON(nr_present + nr_others < numgrps);
> > > >
> > > > Fix this by building the affinity masks only when:
> > > >
> > > > - the driver passes an affinity descriptor; a driver like virtio-blk
> > > >   makes sure to limit the number of queues so it does not exceed the
> > > >   number of CPUs
> > > > - the parent supports the affinity setting config ops
> > > >
> > > > This helps to avoid the warning. More optimizations could be done on
> > > > top.
> > > >
> > > > [1]
> > > > [  682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 
> > > > group_cpus_evenly+0x1aa/0x1c0
> > > > [  682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ 
> > > > #79
> > > > [  682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > > BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > > > [  682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0
> > > > [  682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 
> > > > 1b c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb 
> > > > c2 <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc
> > > > [  682.146679] RSP: 0018:c9000215f498 EFLAGS: 00010293
> > > > [  682.146682] RAX: 0001f1e0 RBX: 0041 RCX: 
> > > > 
> > > > [  682.146684] RDX: 888109922058 RSI: 0041 RDI: 
> > > > 0030
> > > > [  682.146686] RBP: 888109922058 R08: c9000215f498 R09: 
> > > > c9000215f4a0
> > > > [  682.146687] R10: 000198d0 R11: 0030 R12: 
> > > > 888107e02800
> > > > [  682.146689] R13: 0030 R14: 0030 R15: 
> > > > 0041
> > > > [  682.146692] FS:  7fef52315740() GS:88823738() 
> > > > knlGS:
> > > > [  682.146695] CS:  0010 DS:  ES:  CR0: 80050033
> > > > [  682.146696] CR2: 7fef52509000 CR3: 000110dbc004 CR4: 
> > > > 00370ee0
> > > > [  682.146698] DR0:  DR1:  DR2: 
> > > > 
> > > > [  682.146700] DR3:  DR6: fffe0ff0 DR7: 
> > > > 0400
> > > > [  682.146701] Call Trace:
> > > > [  682.146703]  
> > > > [  682.146705]  ? __warn+0x7b/0x130
> > > > [  682.146709]  ? group_cpus_evenly+0x1aa/0x1c0
> > > > [  682.146712]  ? report_bug+0x1c8/0x1e0
> > > > [  682.146717]  ? handle_bug+0x3c/0x70
> > > > [  682.146721]  ? exc_invalid_op+0x14/0x70
> > > > [  682.146723]  ? asm_exc_invalid_op+0x16/0x20
> > > > [  682.146727]  ? group_cpus_evenly+0x1aa/0x1c0
> > > > [  682.146729]  ? group_cpus_evenly+0x15c/0x1c0
> > > > [  682.146731]  create_affinity_masks+0xaf/0x1a0
> > > > [  682.146735]  virtio_vdpa_find_vqs+0x83/0x1d0
> > > > [  682.146738]  ? __pfx_default_calc_sets+0x10/0x10
> > > > [  682.146742]  virtnet_find_vqs+0x1f0/0x370
> > > > [  682.146747]  virtnet_probe+0x501/0xcd0
> > > > [  682.146749]  ? vp_modern_get_status+0x12/0x20
> > > > [  682.146751]  ? get_cap_addr.isra.0+0x10/0xc0
> > > > [  682.146754]  virtio_dev_probe+0x1af/0x260
> > > > [  682.146759]  really_probe+0x1a5/0x410
> > > >
> > > > Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity spreading 
> > > > mechanism")
> > > > Signed-off-by: Jason Wang 
> > >
> > > This won't fix the case where block has more queues than CPUs though,
> > > will it?
> >
> > Block will limit the number of queues during init_vq():
> >
> > num_vqs = min_t(unsigned int,
> > min_not_zero(num_request_queues, nr_cpu_ids),
> > num_vqs);
> >
> >
> > Thanks
>
> Good point. This doesn't play well with cpu hotplug but that is not new.

Yes.

Thanks

>
> > >
> > > > ---
> > > >  drivers/virtio/virtio_vdpa.c | 17 +++--
> > > >  1 file changed, 11 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> > > > index 961161da5900..06ce6d8c2e00 100644
> > > > --- a/drivers/virtio/virtio_vdpa

Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-08-11 Thread Michael S. Tsirkin
On Fri, Aug 11, 2023 at 05:54:15PM +0800, Jason Wang wrote:
> On Fri, Aug 11, 2023 at 5:51 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Aug 11, 2023 at 05:43:25PM +0800, Jason Wang wrote:
> > > On Fri, Aug 11, 2023 at 5:21 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Fri, Aug 11, 2023 at 05:18:51PM +0800, Jason Wang wrote:
> > > > > On Fri, Aug 11, 2023 at 1:42 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Aug 11, 2023 at 10:23:15AM +0800, Jason Wang wrote:
> > > > > > > On Fri, Aug 11, 2023 at 3:41 AM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Aug 08, 2023 at 10:30:56AM +0800, Jason Wang wrote:
> > > > > > > > > On Mon, Jul 31, 2023 at 2:30 PM Jason Wang 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > They really shouldn't - any NIC that takes forever to
> > > > > > > > > > > > > program will create issues in the networking stack.
> > > > > > > > > > > >
> > > > > > > > > > > > Unfortunately, it's not rare as the device/cvq could be 
> > > > > > > > > > > > implemented
> > > > > > > > > > > > via firmware or software.
> > > > > > > > > > >
> > > > > > > > > > > Currently that means one either has sane firmware with a
> > > > > > > > > > > scheduler that can meet deadlines, or loses the ability to
> > > > > > > > > > > report errors back.
> > > > > > > > > > >
> > > > > > > > > > > > > But if they do they can always set this flag too.
> > > > > > > > > > > >
> > > > > > > > > > > > This may have false negatives and may confuse the 
> > > > > > > > > > > > management.
> > > > > > > > > > > >
> > > > > > > > > > > > Maybe we can extend the networking core to allow some 
> > > > > > > > > > > > device specific
> > > > > > > > > > > > configurations to be done with device specific lock 
> > > > > > > > > > > > without rtnl. For
> > > > > > > > > > > > example, split the set_channels to
> > > > > > > > > > > >
> > > > > > > > > > > > pre_set_channels
> > > > > > > > > > > > set_channels
> > > > > > > > > > > > post_set_channels
> > > > > > > > > > > >
> > > > > > > > > > > > The device specific part could be done in pre and post 
> > > > > > > > > > > > without a rtnl lock?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Would the benefit be that errors can be reported to 
> > > > > > > > > > > userspace then?
> > > > > > > > > > > Then maybe.  I think you will have to show how this works 
> > > > > > > > > > > for at least
> > > > > > > > > > > one card besides virtio.
> > > > > > > > > >
> > > > > > > > > > Even for virtio, this does not seem easy, as e.g. the
> > > > > > > > > > virtnet_send_command() and netif_set_real_num_tx_queues() 
> > > > > > > > > > need to
> > > > > > > > > > appear to be atomic to the networking core.
> > > > > > > > > >
> > > > > > > > > > I wonder if we can re-consider the way of a timeout here 
> > > > > > > > > > and choose a
> > > > > > > > > > sane value as a start.
> > > > > > > > >
> > > > > > > > > Michael, any more input on this?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > I think this is just mission creep. We are trying to fix
> > > > > > > > vduse - let's do that for starters.
> > > > > > > >
> > > > > > > > Recovering from firmware timeouts is far from trivial and
> > > > > > > > just assuming that just because it timed out it will not
> > > > > > > > access memory is just as likely to cause memory corruption
> > > > > > > > with worse results than an infinite spin.
> > > > > > >
> > > > > > > Yes, this might require support not only in the driver
> > > > > > >
> > > > > > > >
> > > > > > > > I propose we fix this for vduse and assume hardware/firmware
> > > > > > > > is well behaved.
> > > > > > >
> > > > > > > One major case is the re-connection; in that case it might take much
> > > > > > > longer than the kernel virtio-net driver expects.
> > > > > > > So can we have a timeout in VDUSE and trap CVQ, so that VDUSE can
> > > > > > > return and fail early?
> > > > > >
> > > > > > Ugh more mission creep. not at all my point. vduse should cache
> > > > > > values in the driver,
> > > > >
> > > > > What do you mean by values here? The cvq command?
> > > > >
> > > > > Thanks
> > > >
> > > > The card status generally.
> > >
> > > Just to make sure I understand here. The CVQ needs to be processed by
> > > the userspace now. How could we cache the status?
> > >
> > > Thanks
> >
> > vduse will have to process it in kernel.
> 
> Right, that's my understanding (trap CVQ).
> 
> Thanks

oh this is what you mean by trap. ok. I don't see a need
for a timeout then though.


> >
> > > >
> > > > > > until someone manages to change
> >

[PATCH -next] drm/virtio: Remove unused function declarations

2023-08-11 Thread Yue Haibing via Virtualization
Commit dc5698e80cf7 ("Add virtio gpu driver.") declared but never
implemented virtio_gpu_attach_status_page()/virtio_gpu_detach_status_page().
Also, commit 62fb7a5e1096 ("virtio-gpu: add 3d/virgl support")
declared but never implemented virtio_gpu_fence_ack() and
virtio_gpu_dequeue_fence_func().
Commit c84adb304c10 ("drm/virtio: Support virtgpu exported resources")
declared but never implemented virtgpu_gem_prime_get_uuid().

Signed-off-by: Yue Haibing 
---
 drivers/gpu/drm/virtio/virtgpu_drv.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 4126c384286b..8513b671f871 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -344,8 +344,6 @@ void virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
 			      struct virtio_gpu_object *obj,
 			      struct virtio_gpu_mem_entry *ents,
 			      unsigned int nents);
-int virtio_gpu_attach_status_page(struct virtio_gpu_device *vgdev);
-int virtio_gpu_detach_status_page(struct virtio_gpu_device *vgdev);
 void virtio_gpu_cursor_ping(struct virtio_gpu_device *vgdev,
 			    struct virtio_gpu_output *output);
 int virtio_gpu_cmd_get_display_info(struct virtio_gpu_device *vgdev);
@@ -394,11 +392,8 @@ virtio_gpu_cmd_resource_create_3d(struct virtio_gpu_device *vgdev,
 				  struct virtio_gpu_fence *fence);
 void virtio_gpu_ctrl_ack(struct virtqueue *vq);
 void virtio_gpu_cursor_ack(struct virtqueue *vq);
-void virtio_gpu_fence_ack(struct virtqueue *vq);
 void virtio_gpu_dequeue_ctrl_func(struct work_struct *work);
 void virtio_gpu_dequeue_cursor_func(struct work_struct *work);
-void virtio_gpu_dequeue_fence_func(struct work_struct *work);
-
 void virtio_gpu_notify(struct virtio_gpu_device *vgdev);
 
 int
@@ -465,8 +460,6 @@ struct dma_buf *virtgpu_gem_prime_export(struct drm_gem_object *obj,
 					 int flags);
 struct drm_gem_object *virtgpu_gem_prime_import(struct drm_device *dev,
 						struct dma_buf *buf);
-int virtgpu_gem_prime_get_uuid(struct drm_gem_object *obj,
-			       uuid_t *uuid);
 struct drm_gem_object *virtgpu_gem_prime_import_sg_table(
 	struct drm_device *dev, struct dma_buf_attachment *attach,
 	struct sg_table *sgt);
-- 
2.34.1



Re: [PATCH net-next 6/8] virtio-net: support rx netdim

2023-08-11 Thread kernel test robot
Hi Heng,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:
https://github.com/intel-lab-lkp/linux/commits/Heng-Qi/virtio-net-initially-change-the-value-of-tx-frames/20230811-150529
base:   net-next/main
patch link:
https://lore.kernel.org/r/20230811065512.22190-7-hengqi%40linux.alibaba.com
patch subject: [PATCH net-next 6/8] virtio-net: support rx netdim
config: i386-randconfig-i011-20230811 
(https://download.01.org/0day-ci/archive/20230811/202308112234.awbppmuv-...@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce: 
(https://download.01.org/0day-ci/archive/20230811/202308112234.awbppmuv-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202308112234.awbppmuv-...@intel.com/

All errors (new ones prefixed by >>, old ones prefixed by <<):

WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-kworld-315u.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-kworld-pc150u.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-kworld-plus-tv-analog.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-leadtek-y04g0051.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-lme2510.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-manli.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-mecool-kiii-pro.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-mecool-kii-pro.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-medion-x10.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-minix-neo.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-msi-digivox-iii.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-msi-digivox-ii.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-msi-tvanywhere.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-msi-tvanywhere-plus.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-nebula.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-nec-terratec-cinergy-xs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-norwood.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-npgtech.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-odroid.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pctv-sedna.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pine64.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pinnacle-color.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pinnacle-grey.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pinnacle-pctv-hd.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pixelview-002t.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pixelview-mk12.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pixelview-new.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pixelview.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-powercolor-real-angel.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-proteus-2309.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-purpletv.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pv951.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-rc6-mce.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-real-audio-220-32-keys.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-reddo.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-snapstream-firefly.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-streamzap.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-su3000.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-tanix-tx3mini.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-tanix-tx5max.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-tbs-nec.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-technisat-ts35.o
WARNING: modp

Re: [PATCH net-next 7/8] virtio-net: support tx netdim

2023-08-11 Thread kernel test robot
Hi Heng,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url: https://github.com/intel-lab-lkp/linux/commits/Heng-Qi/virtio-net-initially-change-the-value-of-tx-frames/20230811-150529
base: net-next/main
patch link: https://lore.kernel.org/r/20230811065512.22190-8-hengqi%40linux.alibaba.com
patch subject: [PATCH net-next 7/8] virtio-net: support tx netdim
config: i386-randconfig-i011-20230811 (https://download.01.org/0day-ci/archive/20230811/202308112350.gtajkzog-...@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230811/202308112350.gtajkzog-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot 
| Closes: https://lore.kernel.org/oe-kbuild-all/202308112350.gtajkzog-...@intel.com/

All errors (new ones prefixed by >>, old ones prefixed by <<):

WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-kworld-315u.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-kworld-pc150u.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-kworld-plus-tv-analog.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-leadtek-y04g0051.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-lme2510.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-manli.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-mecool-kiii-pro.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-mecool-kii-pro.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-medion-x10.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-minix-neo.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-msi-digivox-iii.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-msi-digivox-ii.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-msi-tvanywhere.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-msi-tvanywhere-plus.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-nebula.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-nec-terratec-cinergy-xs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-norwood.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-npgtech.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-odroid.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pctv-sedna.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pine64.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pinnacle-color.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pinnacle-grey.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pinnacle-pctv-hd.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pixelview-002t.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pixelview-mk12.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pixelview-new.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pixelview.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-powercolor-real-angel.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-proteus-2309.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-purpletv.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-pv951.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-rc6-mce.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-real-audio-220-32-keys.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-reddo.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-snapstream-firefly.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-streamzap.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-su3000.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-tanix-tx3mini.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-tanix-tx5max.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-tbs-nec.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/rc/keymaps/rc-technisat-ts35.o
WARNING: modp

Re: [PATCH V10 04/19] riscv: qspinlock: Add basic queued_spinlock support

2023-08-11 Thread Waiman Long



On 8/2/23 12:46, guo...@kernel.org wrote:

diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
new file mode 100644
index 000000000000..c644a92d4548
--- /dev/null
+++ b/arch/riscv/include/asm/spinlock.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_RISCV_SPINLOCK_H
+#define __ASM_RISCV_SPINLOCK_H
+
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#define _Q_PENDING_LOOPS   (1 << 9)
+#endif
+
+#ifdef CONFIG_QUEUED_SPINLOCKS

You can merge the two "#ifdef CONFIG_QUEUED_SPINLOCKS" blocks into a single
one to avoid the duplication.

+#include <asm/qspinlock.h>
+#include <asm/qrwlock.h>
+#else
+#include <asm-generic/spinlock.h>
+#endif
+
+#endif /* __ASM_RISCV_SPINLOCK_H */

Cheers,
Longman
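
For reference, the merged form of the two blocks would read like this (a
sketch assembled only from the hunk quoted above):

#ifdef CONFIG_QUEUED_SPINLOCKS
#define _Q_PENDING_LOOPS   (1 << 9)
#include <asm/qspinlock.h>
#include <asm/qrwlock.h>
#else
#include <asm-generic/spinlock.h>
#endif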




Re: [PATCH V10 05/19] riscv: qspinlock: Introduce combo spinlock

2023-08-11 Thread Waiman Long

On 8/2/23 12:46, guo...@kernel.org wrote:

From: Guo Ren 

The combo spinlock can support both queued and ticket locks in one Linux
image, selecting between them at boot time via the errata mechanism. Here is
the function size (bytes) comparison table:

TYPE                    : COMBO | TICKET | QUEUED
arch_spin_lock          : 106   | 60     | 50
arch_spin_unlock        : 54    | 36     | 26
arch_spin_trylock       : 110   | 72     | 54
arch_spin_is_locked     : 48    | 34     | 20
arch_spin_is_contended  : 56    | 40     | 24
arch_spin_value_unlocked: 48    | 34     | 24

One example, the disassembly of the combo arch_spin_unlock:

0x8000409c <+14>:    nop                 # detour slot
0x800040a0 <+18>:    fence   rw,w        # queued spinlock start
0x800040a4 <+22>:    sb      zero,0(a4)  # queued spinlock end
0x800040a8 <+26>:    ld      s0,8(sp)
0x800040aa <+28>:    addi    sp,sp,16
0x800040ac <+30>:    ret
0x800040ae <+32>:    lw      a5,0(a4)    # ticket spinlock start
0x800040b0 <+34>:    sext.w  a5,a5
0x800040b2 <+36>:    fence   rw,w
0x800040b6 <+40>:    addiw   a5,a5,1
0x800040b8 <+42>:    slli    a5,a5,0x30
0x800040ba <+44>:    srli    a5,a5,0x30
0x800040bc <+46>:    sh      a5,0(a4)    # ticket spinlock end
0x800040c0 <+50>:    ld      s0,8(sp)
0x800040c2 <+52>:    addi    sp,sp,16
0x800040c4 <+54>:    ret

The qspinlock is smaller and faster than the ticket lock when everything stays
on the fast path, and the combo spinlock provides a single compatible Linux
image for processors with different micro-architectural designs (weak vs.
strict forward-progress guarantees).

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
---
  arch/riscv/Kconfig|  9 +++-
  arch/riscv/include/asm/hwcap.h|  1 +
  arch/riscv/include/asm/spinlock.h | 87 ++-
  arch/riscv/kernel/cpufeature.c| 10 
  4 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e89a3bea3dc1..119e774a3dcf 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -440,7 +440,7 @@ config NODES_SHIFT
  
  choice

prompt "RISC-V spinlock type"
-   default RISCV_TICKET_SPINLOCKS
+   default RISCV_COMBO_SPINLOCKS
  
  config RISCV_TICKET_SPINLOCKS

bool "Using ticket spinlock"
@@ -452,6 +452,13 @@ config RISCV_QUEUED_SPINLOCKS
	help
	  Make sure your micro arch LL/SC has a strong forward progress
	  guarantee. Otherwise, stay at ticket-lock.
+
+config RISCV_COMBO_SPINLOCKS
+   bool "Using combo spinlock"
+   depends on SMP && MMU
+   select ARCH_USE_QUEUED_SPINLOCKS
+   help
+ Select queued spinlock or ticket-lock via errata.
  endchoice
  
  config RISCV_ALTERNATIVE

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index f041bfa7f6a0..08ae75a694c2 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -54,6 +54,7 @@
  #define RISCV_ISA_EXT_ZIFENCEI41
  #define RISCV_ISA_EXT_ZIHPM   42
  
+#define RISCV_ISA_EXT_XTICKETLOCK	63

  #define RISCV_ISA_EXT_MAX 64
  #define RISCV_ISA_EXT_NAME_LEN_MAX32
  
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
index c644a92d4548..9eb3ad31e564 100644
--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -7,11 +7,94 @@
  #define _Q_PENDING_LOOPS  (1 << 9)
  #endif
  


I see why you separated the _Q_PENDING_LOOPS out.



+#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
+#include 
+
+#undef arch_spin_is_locked
+#undef arch_spin_is_contended
+#undef arch_spin_value_unlocked
+#undef arch_spin_lock
+#undef arch_spin_trylock
+#undef arch_spin_unlock
+
+#include 
+#include 
+
+#undef arch_spin_is_locked
+#undef arch_spin_is_contended
+#undef arch_spin_value_unlocked
+#undef arch_spin_lock
+#undef arch_spin_trylock
+#undef arch_spin_unlock
Perhaps you can add a macro like __no_arch_spinlock_redefine to disable
the various arch_spin_* definitions in qspinlock.h and ticket_spinlock.h.
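
As a rough illustration of that idea (a hypothetical sketch: the macro name
comes only from the suggestion above, nothing like it exists in mainline, and
the header paths and guarded defines are assumptions):

/* In the qspinlock/ticket_spinlock headers, guard the shorthand defines: */
#ifndef __no_arch_spinlock_redefine
#define arch_spin_lock(l)      queued_spin_lock(l)
#define arch_spin_trylock(l)   queued_spin_trylock(l)
#define arch_spin_unlock(l)    queued_spin_unlock(l)
#endif

/* The combo header could then include both implementations cleanly: */
#define __no_arch_spinlock_redefine
#include <asm/ticket_spinlock.h>
#include <asm/qspinlock.h>

and dispatch explicitly to queued_spin_*() or ticket_spin_*(), with no
#undef block needed.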

+
+#define COMBO_DETOUR                           \
+	asm_volatile_goto(ALTERNATIVE(          \
+		"nop",                          \
+		"j %l[ticket_spin_lock]",       \
+		0,                              \
+		RISCV_ISA_EXT_XTICKETLOCK,      \
+		CONFIG_RISCV_COMBO_SPINLOCKS)   \
+		: : : : ticket_spin_lock);
+
+static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+   COMBO_DETOUR
+   queued_spin_lock(lock);
+   return;
+ticket_spin_lock:
+   ticket_spin_lock(lock);
+}
+
+static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
+{
+   COMBO_DETOUR
+   return queued_spin_trylock(lock);
+ticket_spin_lock:
+   return ticket_spin_trylock(lock);
+}
+
+static __always_inline void

Re: [PATCH V10 18/19] locking/qspinlock: Move pv_ops into x86 directory

2023-08-11 Thread Waiman Long

On 8/2/23 12:47, guo...@kernel.org wrote:

From: Guo Ren 

The pv_ops belongs to the x86 custom infrastructure; moving it cleans up
cna_configure_spin_lock_slowpath() with standard code. This is preparation
for riscv to support the CNA qspinlock.


The CNA qspinlock has not been merged into mainline yet. I would suggest you
drop the last two patches for now. Of course, you can provide benchmark
data to strengthen the case for including the CNA qspinlock patch in
mainline.


Cheers,
Longman



Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-08-11 Thread Mike Christie
On 8/10/23 1:57 PM, Michael S. Tsirkin wrote:
> On Sat, Jul 22, 2023 at 11:03:29PM -0500, michael.chris...@oracle.com wrote:
>> On 7/20/23 8:06 AM, Michael S. Tsirkin wrote:
>>> On Thu, Feb 02, 2023 at 05:25:17PM -0600, Mike Christie wrote:
 For vhost workers we use the kthread API which inherits its values from
 and checks against the kthreadd thread. This results in the wrong RLIMITs
 being checked, so while tools like libvirt try to control the number of
 threads based on the nproc rlimit setting we can end up creating more
 threads than the user wanted.

 This patch has us use the vhost_task helpers which will inherit its
 values/checks from the thread that owns the device similar to if we did
 a clone in userspace. The vhost threads will now be counted in the nproc
 rlimits. And we get features like cgroups and mm sharing automatically,
 so we can remove those calls.

 Signed-off-by: Mike Christie 
 Acked-by: Michael S. Tsirkin 
>>>
>>>
>>> Hi Mike,
>>> So this seems to have caused a measurable regression in networking
>>> performance (about 30%). Take a look here; there's a zip file
>>> with detailed measurements attached:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=603
>>>
>>>
>>> Could you take a look please?
>>> You can also ask reporter questions there assuming you
>>> have or can create a (free) account.
>>>
>>
>> Sorry for the late reply. I just got home from vacation.
>>
>> The account creation link seems to be down. I keep getting an
>> "unable to establish SMTP connection to bz-exim-prod port 25" error.
>>
>> Can you give me Quan's email?
>>
>> I think I can replicate the problem. I just need some extra info from Quan:
>>
>> 1. Just double check that they are using RHEL 9 on the host running the VMs.
>> 2. The kernel config
>> 3. Any tuning that was done. Is tuned running in the guest and/or the host
>> running the VMs, and what profile is being used in each?
>> 4. Number of vCPUs and virtqueues being used.
>> 5. Can they dump the contents of:
>>
>> /sys/kernel/debug/sched
>>
>> and
>>
>> sysctl  -a
>>
>> on the host running the VMs.
>>
>> 6. With the 6.4 kernel, can they also run a quick test and tell me if they
>> set the scheduler to batch:
>>
>> ps -T -o comm,pid,tid $QEMU_THREAD
>>
>> then for each vhost thread do:
>>
>> chrt -b -p 0 $VHOST_THREAD
>>
>> Does that end up increasing perf? When I do this I see throughput go up by
>> around 50% vs 6.3 when the session count was 16 or more (16 was the number
>> of vCPUs and virtqueues per net device in the VM). Note that I'm not saying
>> that is a fix. It's just a difference I noticed when running some other
>> tests.
> 
> 
> Mike, I'm unsure what to do at this point. Regressions are not nice,
> but if the kernel is released with the new userspace API we won't
> be able to revert. So what's the plan?
> 

I'm sort of stumped. I still can't replicate the problem out of the box; 6.3
and 6.4 perform the same for me. I've tried your setup and settings with
different combos of things like tuned and irqbalance.

I can sort of force the issue. In 6.4, the vhost thread inherits its settings
from the parent thread. In 6.3, the vhost thread inherited from kthreadd and
we would then reset the sched settings. So in 6.4, if I just tune the parent
differently, I can cause different performance. If we want the 6.3 behavior,
we can do the patch below.

However, I don't think you guys are hitting this because you are just running
qemu from the normal shell and were not doing anything fancy with the sched
settings.


diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index da35e5b7f047..f2c2638d1106 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -2,6 +2,7 @@
 /*
  * Copyright (C) 2021 Oracle Corporation
  */
+#include <uapi/linux/sched/types.h>
 #include <linux/slab.h>
 #include <linux/completion.h>
 #include <linux/sched/task.h>
@@ -22,9 +23,16 @@ struct vhost_task {
 
 static int vhost_task_fn(void *data)
 {
+   static const struct sched_param param = { .sched_priority = 0 };
struct vhost_task *vtsk = data;
bool dead = false;
 
+   /*
+* Don't inherit the parent's sched info, so we maintain compat from
+* when we used kthreads and it reset this info.
+*/
+   sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
+
for (;;) {
bool did_work;
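
For anyone reproducing this, here is a minimal userspace sketch (not part of
the patch; it uses only standard glibc calls) to confirm which policy a given
vhost thread ends up with after the chrt experiment above:

#define _GNU_SOURCE /* for SCHED_BATCH in <sched.h> */
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <vhost-tid>\n", argv[0]);
		return 1;
	}

	/* sched_getscheduler() accepts a TID and returns its policy. */
	int policy = sched_getscheduler(atoi(argv[1]));

	if (policy < 0) {
		perror("sched_getscheduler");
		return 1;
	}
	printf("policy: %s\n",
	       policy == SCHED_BATCH ? "SCHED_BATCH" :
	       policy == SCHED_OTHER ? "SCHED_OTHER (SCHED_NORMAL)" :
	       "other");
	return 0;
}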
 








Re: [PATCH V10 18/19] locking/qspinlock: Move pv_ops into x86 directory

2023-08-11 Thread Waiman Long


On 8/11/23 20:24, Guo Ren wrote:

On Sat, Aug 12, 2023 at 4:42 AM Waiman Long  wrote:

On 8/2/23 12:47, guo...@kernel.org wrote:

From: Guo Ren 

The pv_ops belongs to the x86 custom infrastructure; moving it cleans up
cna_configure_spin_lock_slowpath() with standard code. This is preparation
for riscv to support the CNA qspinlock.

The CNA qspinlock has not been merged into mainline yet. I would suggest you
drop the last two patches for now. Of course, you can provide benchmark
data to strengthen the case for including the CNA qspinlock patch in
mainline.

Yes, that was lazy of me; I will separate paravirt and CNA from this series.


paravirt is OK, it is just that CNA hasn't been merged yet.

Cheers,
Longman








