Re: [PATCH] vhost: fix end of range for access_ok

2017-08-22 Thread Koichiro Den
On Mon, 2017-08-21 at 22:45 +0300, Michael S. Tsirkin wrote:
> During access_ok checks, addr increases as we iterate over the data
> structure, so addr + len - 1 will point beyond the end of the region we
> are translating.  Harmless, since we then verify that the region covers
> addr, but let's not waste CPU cycles.
> 
> Reported-by: Koichiro Den 
> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> Lightly tested, would appreciate an ack from reporter.
> 
>  drivers/vhost/vhost.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index e4613a3..ecd70e4 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1176,7 +1176,7 @@ static int iotlb_access_ok(struct vhost_virtqueue *vq,
>  {
>   const struct vhost_umem_node *node;
>   struct vhost_umem *umem = vq->iotlb;
> - u64 s = 0, size, orig_addr = addr;
> + u64 s = 0, size, orig_addr = addr, last = addr + len - 1;
>  
>   if (vhost_vq_meta_fetch(vq, addr, len, type))
>   return true;
> @@ -1184,7 +1184,7 @@ static int iotlb_access_ok(struct vhost_virtqueue *vq,
>   while (len > s) {
>   node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
>      addr,
> -    addr + len - 1);
> +    last);
>   if (node == NULL || node->start > addr) {
>       vhost_iotlb_miss(vq, addr, access);
>   return false;

Michael, thank you for this one.

Acked-by: Koichiro Den 



Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-08-13 Thread Koichiro Den
Thanks for your comments, Michael and Jason. And I'm sorry about the late
response; to be honest, I am on summer vacation until next Tuesday.

I noticed that what I wrote was not sufficient. Regardless of whether the
caching mechanism exists, the "event" could legitimately lie anywhere outside
the latest interval that vhost_notify checks it against, meaning that if it is
outside the interval we cannot distinguish whether it lags behind or leads.
And this seems to conform to the spec. Thanks for pointing me to the spec. The
corner case I am pointing out here is:
(0) event idx feature turned on + TX napi turned off
-> (1) guest-side TX traffic bursting occurs and delayed callback set
-> (2) some interruption triggers skb_xmit_done
-> (3) guest-side modest traffic makes the interval proceed to an unbounded
extent without updating "event", since NO_INTERRUPT continues to be set on its
shadow flag.

IMHO, if you plan to make TX napi the only choice, doing this sort of
optimisation beforehand seems likely to be in vain.

So, if the non-TX-napi case continues to coexist, what I would like to suggest
is not just what I sketched in my last email, but something like bounding the
maximum staleness of "event" to at most vq.num, with the caching suggestion on
top of that. Otherwise, I guess I should not repost my last email, since it
would make matters too complicated even though it will soon be removed once TX
napi becomes the only choice.

Thanks!


On Wed, 2017-08-09 at 07:37 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 09, 2017 at 10:38:10AM +0800, Jason Wang wrote:
> > I don't think the current code can work well if vq.num is greater than
> > 2^15, since all the cached indices are u16. This looks like a bug which
> > needs to be fixed.
> 
> That's a limitation of virtio 1.0.
> 
> > > * else if the interval of vq.num is [2^15, 2^16):
> > > the logic in the original patch (809ecb9bca6a9) suffices
> > > * else (= less than 2^15) (optional):
> > > checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
> > > would suffice.
> > > 
> > > Am I missing something, or is this irrelevant?
> 
> Could you pls repost the suggestion copying virtio-dev mailing list
> (subscriber only, sorry about that, but host/guest ABI discussions
> need to copy that list)?
> 
> > Looks not, I think this may work. Let me do some test.
> > 
> > Thanks
> 
> I think that at this point it's prudent to add a feature bit,
> as the virtio spec does not require that the event index never move back.
> 



Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-08-13 Thread Koichiro Den
Sorry, I mistakenly focused on the NET case; please disregard it. I will make
any relevant suggestions in patch form. Thanks.

On Sun, 2017-08-13 at 23:11 +0900, Koichiro Den wrote:
> Thanks for your comments, Michael and Jason. And I'm sorry about the late
> response; to be honest, I am on summer vacation until next Tuesday.
> 
> I noticed that what I wrote was not sufficient. Regardless of whether the
> caching mechanism exists, the "event" could legitimately lie anywhere
> outside the latest interval that vhost_notify checks it against, meaning
> that if it is outside the interval we cannot distinguish whether it lags
> behind or leads. And this seems to conform to the spec. Thanks for
> pointing me to the spec. The corner case I am pointing out here is:
> (0) event idx feature turned on + TX napi turned off
> -> (1) guest-side TX traffic bursting occurs and delayed callback set
> -> (2) some interruption triggers skb_xmit_done
> -> (3) guest-side modest traffic makes the interval proceed to an
> unbounded extent without updating "event", since NO_INTERRUPT continues
> to be set on its shadow flag.
> 
> IMHO, if you plan to make TX napi the only choice, doing this sort of
> optimisation beforehand seems likely to be in vain.
> 
> So, if the non-TX-napi case continues to coexist, what I would like to
> suggest is not just what I sketched in my last email, but something like
> bounding the maximum staleness of "event" to at most vq.num, with the
> caching suggestion on top of that. Otherwise, I guess I should not repost
> my last email, since it would make matters too complicated even though it
> will soon be removed once TX napi becomes the only choice.
> 
> Thanks!
> 
> 
> On Wed, 2017-08-09 at 07:37 +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 09, 2017 at 10:38:10AM +0800, Jason Wang wrote:
> > > I don't think the current code can work well if vq.num is greater than
> > > 2^15, since all the cached indices are u16. This looks like a bug which
> > > needs to be fixed.
> > 
> > That's a limitation of virtio 1.0.
> > 
> > > > * else if the interval of vq.num is [2^15, 2^16):
> > > > the logic in the original patch (809ecb9bca6a9) suffices
> > > > * else (= less than 2^15) (optional):
> > > > checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
> > > > would suffice.
> > > > 
> > > > Am I missing something, or is this irrelevant?
> > 
> > Could you pls repost the suggestion copying virtio-dev mailing list
> > (subscriber only, sorry about that, but host/guest ABI discussions
> > need to copy that list)?
> > 
> > > Looks not, I think this may work. Let me do some test.
> > > 
> > > Thanks
> > 
> > I think that at this point it's prudent to add a feature bit,
> > as the virtio spec does not require that the event index never move back.
> > 



Re: [PATCH] virtio_net: drain unconsumed tx completions if any before dql_reset

2024-11-27 Thread Koichiro Den
On Thu, Nov 28, 2024 at 10:57:01AM +0800, Jason Wang wrote:
> On Wed, Nov 27, 2024 at 12:08 PM Koichiro Den
>  wrote:
> >
> > On Wed, Nov 27, 2024 at 11:24:15AM +0800, Jason Wang wrote:
> > > On Tue, Nov 26, 2024 at 12:44 PM Koichiro Den
> > >  wrote:
> > > >
> > > > On Tue, Nov 26, 2024 at 11:50:17AM +0800, Jason Wang wrote:
> > > > > On Tue, Nov 26, 2024 at 10:42 AM Koichiro Den
> > > > >  wrote:
> > > > > >
> > > > > > When virtnet_close is followed by virtnet_open, there is a slight chance
> > > > > > that some TX completions remain unconsumed. Those are handled during the
> > > > > > first NAPI poll, but since dql_reset occurs just beforehand, it can lead
> > > > > > to a crash [1].
> > > > > >
> > > > > > This issue can be reproduced by running: `while :; do ip l set DEV down;
> > > > > > ip l set DEV up; done` under heavy network TX load from inside of the
> > > > > > machine.
> > > > > >
> > > > > > To fix this, drain unconsumed TX completions if any before dql_reset,
> > > > > > allowing BQL to start cleanly.
> > > > > >
> > > > > > [ cut here ]
> > > > > > kernel BUG at lib/dynamic_queue_limits.c:99!
> > > > > > Oops: invalid opcode:  [#1] PREEMPT SMP NOPTI
> > > > > > CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: GN 
> > > > > > 6.12.0net-next_main+ #2
> > > > > > Tainted: [N]=TEST
> > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
> > > > > > BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > > > > > RIP: 0010:dql_completed+0x26b/0x290
> > > > > > Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 
> > > > > > ff 0d
> > > > > > 4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 
> > > > > > 01
> > > > > > d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
> > > > > > RSP: 0018:c92b0d08 EFLAGS: 00010297
> > > > > > RAX:  RBX: 888102398c80 RCX: 80190009
> > > > > > RDX:  RSI: 006a RDI: 
> > > > > > RBP: 888102398c00 R08:  R09: 
> > > > > > R10: 00ca R11: 00015681 R12: 0001
> > > > > > R13: c92b0d68 R14: 8885e000 R15: 8881107aca40
> > > > > > FS:  7f41ded69500() GS:888667dc()
> > > > > > knlGS:
> > > > > > CS:  0010 DS:  ES:  CR0: 80050033
> > > > > > CR2: 556ccc2dc1a0 CR3: 000104fd8003 CR4: 00772ef0
> > > > > > PKRU: 5554
> > > > > > Call Trace:
> > > > > >  
> > > > > >  ? die+0x32/0x80
> > > > > >  ? do_trap+0xd9/0x100
> > > > > >  ? dql_completed+0x26b/0x290
> > > > > >  ? dql_completed+0x26b/0x290
> > > > > >  ? do_error_trap+0x6d/0xb0
> > > > > >  ? dql_completed+0x26b/0x290
> > > > > >  ? exc_invalid_op+0x4c/0x60
> > > > > >  ? dql_completed+0x26b/0x290
> > > > > >  ? asm_exc_invalid_op+0x16/0x20
> > > > > >  ? dql_completed+0x26b/0x290
> > > > > >  __free_old_xmit+0xff/0x170 [virtio_net]
> > > > > >  free_old_xmit+0x54/0xc0 [virtio_net]
> > > > > >  virtnet_poll+0xf4/0xe30 [virtio_net]
> > > > > >  ? __update_load_avg_cfs_rq+0x264/0x2d0
> > > > > >  ? update_curr+0x35/0x260
> > > > > >  ? reweight_entity+0x1be/0x260
> > > > > >  __napi_poll.constprop.0+0x28/0x1c0
> > > > > >  net_rx_action+0x329/0x420
> > > > > >  ? enqueue_hrtimer+0x35/0x90
> > > > > >  ? trace_hardirqs_on+0x1d/0x80
> > > > > >  ? kvm_sched_clock_read+0xd/0x20
> > > > > >  ? sched_clock+0xc/0x30
> > > > > >  ? kvm_sched_clock_read+0xd/0x20
> > > > > >  ? sched_clock+0xc/0x30
> >

Re: [PATCH] virtio_net: drain unconsumed tx completions if any before dql_reset

2024-11-26 Thread Koichiro Den
On Wed, Nov 27, 2024 at 11:24:15AM +0800, Jason Wang wrote:
> On Tue, Nov 26, 2024 at 12:44 PM Koichiro Den
>  wrote:
> >
> > On Tue, Nov 26, 2024 at 11:50:17AM +0800, Jason Wang wrote:
> > > On Tue, Nov 26, 2024 at 10:42 AM Koichiro Den
> > >  wrote:
> > > >
> > > > When virtnet_close is followed by virtnet_open, there is a slight chance
> > > > that some TX completions remain unconsumed. Those are handled during the
> > > > first NAPI poll, but since dql_reset occurs just beforehand, it can lead
> > > > to a crash [1].
> > > >
> > > > This issue can be reproduced by running: `while :; do ip l set DEV down;
> > > > ip l set DEV up; done` under heavy network TX load from inside of the
> > > > machine.
> > > >
> > > > To fix this, drain unconsumed TX completions if any before dql_reset,
> > > > allowing BQL to start cleanly.
> > > >
> > > > [ cut here ]
> > > > kernel BUG at lib/dynamic_queue_limits.c:99!
> > > > Oops: invalid opcode:  [#1] PREEMPT SMP NOPTI
> > > > CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: GN 6.12.0net-next_main+ #2
> > > > Tainted: [N]=TEST
> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
> > > > BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > > > RIP: 0010:dql_completed+0x26b/0x290
> > > > Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
> > > > 4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
> > > > d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
> > > > RSP: 0018:c92b0d08 EFLAGS: 00010297
> > > > RAX:  RBX: 888102398c80 RCX: 80190009
> > > > RDX:  RSI: 006a RDI: 
> > > > RBP: 888102398c00 R08:  R09: 
> > > > R10: 00ca R11: 00015681 R12: 0001
> > > > R13: c92b0d68 R14: 8885e000 R15: 8881107aca40
> > > > FS:  7f41ded69500() GS:888667dc()
> > > > knlGS:
> > > > CS:  0010 DS:  ES:  CR0: 80050033
> > > > CR2: 556ccc2dc1a0 CR3: 000104fd8003 CR4: 00772ef0
> > > > PKRU: 5554
> > > > Call Trace:
> > > >  
> > > >  ? die+0x32/0x80
> > > >  ? do_trap+0xd9/0x100
> > > >  ? dql_completed+0x26b/0x290
> > > >  ? dql_completed+0x26b/0x290
> > > >  ? do_error_trap+0x6d/0xb0
> > > >  ? dql_completed+0x26b/0x290
> > > >  ? exc_invalid_op+0x4c/0x60
> > > >  ? dql_completed+0x26b/0x290
> > > >  ? asm_exc_invalid_op+0x16/0x20
> > > >  ? dql_completed+0x26b/0x290
> > > >  __free_old_xmit+0xff/0x170 [virtio_net]
> > > >  free_old_xmit+0x54/0xc0 [virtio_net]
> > > >  virtnet_poll+0xf4/0xe30 [virtio_net]
> > > >  ? __update_load_avg_cfs_rq+0x264/0x2d0
> > > >  ? update_curr+0x35/0x260
> > > >  ? reweight_entity+0x1be/0x260
> > > >  __napi_poll.constprop.0+0x28/0x1c0
> > > >  net_rx_action+0x329/0x420
> > > >  ? enqueue_hrtimer+0x35/0x90
> > > >  ? trace_hardirqs_on+0x1d/0x80
> > > >  ? kvm_sched_clock_read+0xd/0x20
> > > >  ? sched_clock+0xc/0x30
> > > >  ? kvm_sched_clock_read+0xd/0x20
> > > >  ? sched_clock+0xc/0x30
> > > >  ? sched_clock_cpu+0xd/0x1a0
> > > >  handle_softirqs+0x138/0x3e0
> > > >  do_softirq.part.0+0x89/0xc0
> > > >  
> > > >  
> > > >  __local_bh_enable_ip+0xa7/0xb0
> > > >  virtnet_open+0xc8/0x310 [virtio_net]
> > > >  __dev_open+0xfa/0x1b0
> > > >  __dev_change_flags+0x1de/0x250
> > > >  dev_change_flags+0x22/0x60
> > > >  do_setlink.isra.0+0x2df/0x10b0
> > > >  ? rtnetlink_rcv_msg+0x34f/0x3f0
> > > >  ? netlink_rcv_skb+0x54/0x100
> > > >  ? netlink_unicast+0x23e/0x390
> > > >  ? netlink_sendmsg+0x21e/0x490
> > > >  ? sys_sendmsg+0x31b/0x350
> > > >  ? avc_has_perm_noaudit+0x67/0xf0
> > > >  ? cred_has_capability.isra.0+0x75/0x110
> > > >  ? __nla_validate_parse+0x5f/0xee0
> > > >  ? __pfx___probestub_irq_enable+0x3/0x10
> > > >  ? __create_object+0x5e/0x90
> > > >  ? security_capable+0x3b/0x70
> > > >  

Re: [PATCH] virtio_net: drain unconsumed tx completions if any before dql_reset

2024-11-30 Thread Koichiro Den
On Fri, Nov 29, 2024 at 10:18:04AM +0800, Jason Wang wrote:
> On Thu, Nov 28, 2024 at 12:25 PM Koichiro Den
>  wrote:
> >
> > On Thu, Nov 28, 2024 at 10:57:01AM +0800, Jason Wang wrote:
> > > On Wed, Nov 27, 2024 at 12:08 PM Koichiro Den
> > >  wrote:
> > > >
> > > > On Wed, Nov 27, 2024 at 11:24:15AM +0800, Jason Wang wrote:
> > > > > On Tue, Nov 26, 2024 at 12:44 PM Koichiro Den
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Nov 26, 2024 at 11:50:17AM +0800, Jason Wang wrote:
> > > > > > > On Tue, Nov 26, 2024 at 10:42 AM Koichiro Den
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > When virtnet_close is followed by virtnet_open, there is a slight chance
> > > > > > > > that some TX completions remain unconsumed. Those are handled during the
> > > > > > > > first NAPI poll, but since dql_reset occurs just beforehand, it can lead
> > > > > > > > to a crash [1].
> > > > > > > >
> > > > > > > > This issue can be reproduced by running: `while :; do ip l set DEV down;
> > > > > > > > ip l set DEV up; done` under heavy network TX load from inside of the
> > > > > > > > machine.
> > > > > > > >
> > > > > > > > To fix this, drain unconsumed TX completions if any before dql_reset,
> > > > > > > > allowing BQL to start cleanly.
> > > > > > > >
> > > > > > > > [ cut here ]
> > > > > > > > kernel BUG at lib/dynamic_queue_limits.c:99!
> > > > > > > > Oops: invalid opcode:  [#1] PREEMPT SMP NOPTI
> > > > > > > > CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: GN 
> > > > > > > > 6.12.0net-next_main+ #2
> > > > > > > > Tainted: [N]=TEST
> > > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
> > > > > > > > BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > > > > > > > RIP: 0010:dql_completed+0x26b/0x290
> > > > > > > > Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 
> > > > > > > > 65 ff 0d
> > > > > > > > 4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff 
> > > > > > > > <0f> 0b 01
> > > > > > > > d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
> > > > > > > > RSP: 0018:c92b0d08 EFLAGS: 00010297
> > > > > > > > RAX:  RBX: 888102398c80 RCX: 
> > > > > > > > 80190009
> > > > > > > > RDX:  RSI: 006a RDI: 
> > > > > > > > 
> > > > > > > > RBP: 888102398c00 R08:  R09: 
> > > > > > > > 
> > > > > > > > R10: 00ca R11: 00015681 R12: 
> > > > > > > > 0001
> > > > > > > > R13: c92b0d68 R14: 8885e000 R15: 
> > > > > > > > 8881107aca40
> > > > > > > > FS:  7f41ded69500() GS:888667dc()
> > > > > > > > knlGS:
> > > > > > > > CS:  0010 DS:  ES:  CR0: 80050033
> > > > > > > > CR2: 556ccc2dc1a0 CR3: 000104fd8003 CR4: 
> > > > > > > > 00772ef0
> > > > > > > > PKRU: 5554
> > > > > > > > Call Trace:
> > > > > > > >  
> > > > > > > >  ? die+0x32/0x80
> > > > > > > >  ? do_trap+0xd9/0x100
> > > > > > > >  ? dql_completed+0x26b/0x290
> > > > > > > >  ? dql_completed+0x26b/0x290
> > > > > > > >  ? do_error_trap+0x6d/0xb0
> > > > > > > >  ? dql_completed+0x26b/0x290
> > > > > > > >  ? exc_invalid_op+0x4c/0x60
> > > >

[PATCH] virtio_net: drain unconsumed tx completions if any before dql_reset

2024-11-25 Thread Koichiro Den
When virtnet_close is followed by virtnet_open, there is a slight chance
that some TX completions remain unconsumed. Those are handled during the
first NAPI poll, but since dql_reset occurs just beforehand, it can lead
to a crash [1].

This issue can be reproduced by running: `while :; do ip l set DEV down;
ip l set DEV up; done` under heavy network TX load from inside of the
machine.

To fix this, drain unconsumed TX completions if any before dql_reset,
allowing BQL to start cleanly.

[ cut here ]
kernel BUG at lib/dynamic_queue_limits.c:99!
Oops: invalid opcode:  [#1] PREEMPT SMP NOPTI
CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: GN 6.12.0net-next_main+ #2
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:dql_completed+0x26b/0x290
Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
RSP: 0018:c92b0d08 EFLAGS: 00010297
RAX:  RBX: 888102398c80 RCX: 80190009
RDX:  RSI: 006a RDI: 
RBP: 888102398c00 R08:  R09: 
R10: 00ca R11: 00015681 R12: 0001
R13: c92b0d68 R14: 8885e000 R15: 8881107aca40
FS:  7f41ded69500() GS:888667dc()
knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 556ccc2dc1a0 CR3: 000104fd8003 CR4: 00772ef0
PKRU: 5554
Call Trace:
 
 ? die+0x32/0x80
 ? do_trap+0xd9/0x100
 ? dql_completed+0x26b/0x290
 ? dql_completed+0x26b/0x290
 ? do_error_trap+0x6d/0xb0
 ? dql_completed+0x26b/0x290
 ? exc_invalid_op+0x4c/0x60
 ? dql_completed+0x26b/0x290
 ? asm_exc_invalid_op+0x16/0x20
 ? dql_completed+0x26b/0x290
 __free_old_xmit+0xff/0x170 [virtio_net]
 free_old_xmit+0x54/0xc0 [virtio_net]
 virtnet_poll+0xf4/0xe30 [virtio_net]
 ? __update_load_avg_cfs_rq+0x264/0x2d0
 ? update_curr+0x35/0x260
 ? reweight_entity+0x1be/0x260
 __napi_poll.constprop.0+0x28/0x1c0
 net_rx_action+0x329/0x420
 ? enqueue_hrtimer+0x35/0x90
 ? trace_hardirqs_on+0x1d/0x80
 ? kvm_sched_clock_read+0xd/0x20
 ? sched_clock+0xc/0x30
 ? kvm_sched_clock_read+0xd/0x20
 ? sched_clock+0xc/0x30
 ? sched_clock_cpu+0xd/0x1a0
 handle_softirqs+0x138/0x3e0
 do_softirq.part.0+0x89/0xc0
 
 
 __local_bh_enable_ip+0xa7/0xb0
 virtnet_open+0xc8/0x310 [virtio_net]
 __dev_open+0xfa/0x1b0
 __dev_change_flags+0x1de/0x250
 dev_change_flags+0x22/0x60
 do_setlink.isra.0+0x2df/0x10b0
 ? rtnetlink_rcv_msg+0x34f/0x3f0
 ? netlink_rcv_skb+0x54/0x100
 ? netlink_unicast+0x23e/0x390
 ? netlink_sendmsg+0x21e/0x490
 ? sys_sendmsg+0x31b/0x350
 ? avc_has_perm_noaudit+0x67/0xf0
 ? cred_has_capability.isra.0+0x75/0x110
 ? __nla_validate_parse+0x5f/0xee0
 ? __pfx___probestub_irq_enable+0x3/0x10
 ? __create_object+0x5e/0x90
 ? security_capable+0x3b/0x70
 rtnl_newlink+0x784/0xaf0
 ? avc_has_perm_noaudit+0x67/0xf0
 ? cred_has_capability.isra.0+0x75/0x110
 ? stack_depot_save_flags+0x24/0x6d0
 ? __pfx_rtnl_newlink+0x10/0x10
 rtnetlink_rcv_msg+0x34f/0x3f0
 ? do_syscall_64+0x6c/0x180
 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
 ? __pfx_rtnetlink_rcv_msg+0x10/0x10
 netlink_rcv_skb+0x54/0x100
 netlink_unicast+0x23e/0x390
 netlink_sendmsg+0x21e/0x490
 sys_sendmsg+0x31b/0x350
 ? copy_msghdr_from_user+0x6d/0xa0
 ___sys_sendmsg+0x86/0xd0
 ? __pte_offset_map+0x17/0x160
 ? preempt_count_add+0x69/0xa0
 ? __call_rcu_common.constprop.0+0x147/0x610
 ? preempt_count_add+0x69/0xa0
 ? preempt_count_add+0x69/0xa0
 ? _raw_spin_trylock+0x13/0x60
 ? trace_hardirqs_on+0x1d/0x80
 __sys_sendmsg+0x66/0xc0
 do_syscall_64+0x6c/0x180
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f41defe5b34
Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00
f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:7ffe5336ecc8 EFLAGS: 0202 ORIG_RAX: 002e
RAX: ffda RBX: 0003 RCX: 7f41defe5b34
RDX:  RSI: 7ffe5336ed30 RDI: 0003
RBP: 7ffe5336eda0 R08: 0010 R09: 0001
R10: 7ffe5336f6f9 R11: 0202 R12: 0003
R13: 67452259 R14: 556ccc28b040 R15: 
 
[...]
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc:  # v6.11+
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 37 +
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 64c87bb48a41..3e36c0470600 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -513,7 +513,7 @@

Re: [PATCH] virtio_net: drain unconsumed tx completions if any before dql_reset

2024-11-25 Thread Koichiro Den
On Tue, Nov 26, 2024 at 11:50:17AM +0800, Jason Wang wrote:
> On Tue, Nov 26, 2024 at 10:42 AM Koichiro Den
>  wrote:
> >
> > When virtnet_close is followed by virtnet_open, there is a slight chance
> > that some TX completions remain unconsumed. Those are handled during the
> > first NAPI poll, but since dql_reset occurs just beforehand, it can lead
> > to a crash [1].
> >
> > This issue can be reproduced by running: `while :; do ip l set DEV down;
> > ip l set DEV up; done` under heavy network TX load from inside of the
> > machine.
> >
> > To fix this, drain unconsumed TX completions if any before dql_reset,
> > allowing BQL to start cleanly.
> >
> > [ cut here ]
> > kernel BUG at lib/dynamic_queue_limits.c:99!
> > Oops: invalid opcode:  [#1] PREEMPT SMP NOPTI
> > CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: GN 6.12.0net-next_main+ #2
> > Tainted: [N]=TEST
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
> > BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > RIP: 0010:dql_completed+0x26b/0x290
> > Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
> > 4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
> > d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
> > RSP: 0018:c92b0d08 EFLAGS: 00010297
> > RAX:  RBX: 888102398c80 RCX: 80190009
> > RDX:  RSI: 006a RDI: 
> > RBP: 888102398c00 R08:  R09: 
> > R10: 00ca R11: 00015681 R12: 0001
> > R13: c92b0d68 R14: 8885e000 R15: 8881107aca40
> > FS:  7f41ded69500() GS:888667dc()
> > knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 556ccc2dc1a0 CR3: 000104fd8003 CR4: 00772ef0
> > PKRU: 5554
> > Call Trace:
> >  
> >  ? die+0x32/0x80
> >  ? do_trap+0xd9/0x100
> >  ? dql_completed+0x26b/0x290
> >  ? dql_completed+0x26b/0x290
> >  ? do_error_trap+0x6d/0xb0
> >  ? dql_completed+0x26b/0x290
> >  ? exc_invalid_op+0x4c/0x60
> >  ? dql_completed+0x26b/0x290
> >  ? asm_exc_invalid_op+0x16/0x20
> >  ? dql_completed+0x26b/0x290
> >  __free_old_xmit+0xff/0x170 [virtio_net]
> >  free_old_xmit+0x54/0xc0 [virtio_net]
> >  virtnet_poll+0xf4/0xe30 [virtio_net]
> >  ? __update_load_avg_cfs_rq+0x264/0x2d0
> >  ? update_curr+0x35/0x260
> >  ? reweight_entity+0x1be/0x260
> >  __napi_poll.constprop.0+0x28/0x1c0
> >  net_rx_action+0x329/0x420
> >  ? enqueue_hrtimer+0x35/0x90
> >  ? trace_hardirqs_on+0x1d/0x80
> >  ? kvm_sched_clock_read+0xd/0x20
> >  ? sched_clock+0xc/0x30
> >  ? kvm_sched_clock_read+0xd/0x20
> >  ? sched_clock+0xc/0x30
> >  ? sched_clock_cpu+0xd/0x1a0
> >  handle_softirqs+0x138/0x3e0
> >  do_softirq.part.0+0x89/0xc0
> >  
> >  
> >  __local_bh_enable_ip+0xa7/0xb0
> >  virtnet_open+0xc8/0x310 [virtio_net]
> >  __dev_open+0xfa/0x1b0
> >  __dev_change_flags+0x1de/0x250
> >  dev_change_flags+0x22/0x60
> >  do_setlink.isra.0+0x2df/0x10b0
> >  ? rtnetlink_rcv_msg+0x34f/0x3f0
> >  ? netlink_rcv_skb+0x54/0x100
> >  ? netlink_unicast+0x23e/0x390
> >  ? netlink_sendmsg+0x21e/0x490
> >  ? sys_sendmsg+0x31b/0x350
> >  ? avc_has_perm_noaudit+0x67/0xf0
> >  ? cred_has_capability.isra.0+0x75/0x110
> >  ? __nla_validate_parse+0x5f/0xee0
> >  ? __pfx___probestub_irq_enable+0x3/0x10
> >  ? __create_object+0x5e/0x90
> >  ? security_capable+0x3b/0x70
> >  rtnl_newlink+0x784/0xaf0
> >  ? avc_has_perm_noaudit+0x67/0xf0
> >  ? cred_has_capability.isra.0+0x75/0x110
> >  ? stack_depot_save_flags+0x24/0x6d0
> >  ? __pfx_rtnl_newlink+0x10/0x10
> >  rtnetlink_rcv_msg+0x34f/0x3f0
> >  ? do_syscall_64+0x6c/0x180
> >  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> >  netlink_rcv_skb+0x54/0x100
> >  netlink_unicast+0x23e/0x390
> >  netlink_sendmsg+0x21e/0x490
> >  sys_sendmsg+0x31b/0x350
> >  ? copy_msghdr_from_user+0x6d/0xa0
> >  ___sys_sendmsg+0x86/0xd0
> >  ? __pte_offset_map+0x17/0x160
> >  ? preempt_count_add+0x69/0xa0
> >  ? __call_rcu_common.constprop.0+0x147/0x610
> >  ? preempt_count_add+0x69/0xa0
> >  ? preempt_count_add+0x69/0xa0
> >  ? _raw_spin_trylock+0x13/0x60
> >  ? trace_hardirqs_on+0x1d/0x80
> >  __sys_sendmsg+0x66/0xc0
> >  do_syscall_64+0x6c/0

[PATCH net-next] virtio_net: drop netdev_tx_reset_queue() from virtnet_enable_queue_pair()

2024-11-30 Thread Koichiro Den
f41defe5b34
RDX:  RSI: 7ffe5336ed30 RDI: 0003
RBP: 7ffe5336eda0 R08: 0010 R09: 0001
R10: 7ffe5336f6f9 R11: 0202 R12: 0003
R13: 67452259 R14: 556ccc28b040 R15: 
 
[...]
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc:  # v6.11+
Signed-off-by: Koichiro Den 
---
Previous attempt:
https://lore.kernel.org/netdev/20241126024200.2371546-1-koichiro@canonical.com/
---
 drivers/net/virtio_net.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 64c87bb48a41..35a36d1289db 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3054,7 +3054,6 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
if (err < 0)
goto err_xdp_reg_mem_model;
 
-   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, qp_index));
virtnet_napi_enable(vi->rq[qp_index].vq, &vi->rq[qp_index].napi);
virtnet_napi_tx_enable(vi, vi->sq[qp_index].vq, &vi->sq[qp_index].napi);
 
-- 
2.43.0




Re: [PATCH net-next] virtio_net: drop netdev_tx_reset_queue() from virtnet_enable_queue_pair()

2024-12-02 Thread Koichiro Den
On Tue, Dec 03, 2024 at 10:25:14AM +0800, Jason Wang wrote:
> On Tue, Dec 3, 2024 at 10:14 AM Jakub Kicinski  wrote:
> >
> > On Mon, 2 Dec 2024 12:22:53 +0800 Jason Wang wrote:
> > > > Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
> > > > Cc:  # v6.11+
> > > > Signed-off-by: Koichiro Den 
> > >
> > > Acked-by: Jason Wang 
> >
> > I see Tx skb flush in:
> >
> > virtnet_freeze() -> remove_vq_common() -> free_unused_bufs() -> 
> > virtnet_sq_free_unused_buf()
> >
> > do we need to reset the BQL state in that case?
> 
> Yes, I think so. And I spot another path which is:
> 
> virtnet_tx_resize() -> virtqueue_resize() -> virtnet_sq_free_unused_buf().
> 
> > Rule of thumb is netdev_tx_reset_queue() should follow any flush
> > (IOW skb freeing not followed by netdev_tx_completed_queue()).
> >
> 
> Right.
> 
> Koichiro, I think this fixes the problem of open/stop but may break
> freeze/restore(). Let's fix that.
> 
> For resizing, it's a path that has been buggy since the introduction of BQL.
> 
> Thanks
> 

It makes sense, I'll send v2 soon. Thanks for the review.
-Koichiro Den



[PATCH net-next v2 4/5] virtio_ring: add 'flushed' as an argument to virtqueue_reset()

2024-12-02 Thread Koichiro Den
When virtqueue_reset() has actually recycled all unused buffers, additional
work may be required in some cases. Relying solely on its return status is
fragile, so introduce a new argument, 'flushed', to explicitly indicate
whether the recycling has actually taken place.

Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 6 --
 drivers/virtio/virtio_ring.c | 6 +-
 include/linux/virtio.h   | 3 ++-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0103d7990e44..d5240a03b7d6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -5695,6 +5695,7 @@ static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queu
struct xsk_buff_pool *pool)
 {
int err, qindex;
+   bool flushed;
 
qindex = rq - vi->rq;
 
@@ -5713,7 +5714,7 @@ static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queu
 
virtnet_rx_pause(vi, rq);
 
-   err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf);
+   err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf, &flushed);
if (err) {
netdev_err(vi->dev, "reset rx fail: rx queue index: %d err: %d\n", qindex, err);
 
@@ -5737,12 +5738,13 @@ static int virtnet_sq_bind_xsk_pool(struct virtnet_info *vi,
struct xsk_buff_pool *pool)
 {
int err, qindex;
+   bool flushed;
 
qindex = sq - vi->sq;
 
virtnet_tx_pause(vi, sq);
 
-   err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf);
+   err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf, &flushed);
if (err) {
netdev_err(vi->dev, "reset tx fail: tx queue index: %d err: %d\n", qindex, err);
pool = NULL;
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 34a068d401ec..b522ef798946 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2828,6 +2828,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
  * virtqueue_reset - detach and recycle all unused buffers
  * @_vq: the struct virtqueue we're talking about.
  * @recycle: callback to recycle unused buffers
+ * @flushed: whether or not unused buffers are all flushed
  *
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
@@ -2839,14 +2840,17 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
  * -EPERM: Operation not permitted
  */
 int virtqueue_reset(struct virtqueue *_vq,
-   void (*recycle)(struct virtqueue *vq, void *buf))
+   void (*recycle)(struct virtqueue *vq, void *buf),
+   bool *flushed)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
int err;
 
+   *flushed = false;
err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
+   *flushed = true;
 
if (vq->packed_ring)
virtqueue_reinit_packed(vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 878feda08af9..e5072d64a364 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -112,7 +112,8 @@ int virtqueue_resize(struct virtqueue *vq, u32 num,
 void (*recycle)(struct virtqueue *vq, void *buf),
 bool *flushed);
 int virtqueue_reset(struct virtqueue *vq,
-   void (*recycle)(struct virtqueue *vq, void *buf));
+   void (*recycle)(struct virtqueue *vq, void *buf),
+   bool *flushed);
 
 struct virtio_admin_cmd {
__le16 opcode;
-- 
2.43.0




[PATCH net-next v2 5/5] virtio_net: add missing netdev_tx_reset_queue to virtnet_sq_bind_xsk_pool()

2024-12-02 Thread Koichiro Den
virtnet_sq_bind_xsk_pool() flushes tx skbs and then resets tx queue, so
DQL counters need to be reset.

Fixes: 21a4e3ce6dc7 ("virtio_net: xsk: bind/unbind xsk for tx")
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d5240a03b7d6..27d58fb47b07 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -5749,6 +5749,8 @@ static int virtnet_sq_bind_xsk_pool(struct virtnet_info *vi,
netdev_err(vi->dev, "reset tx fail: tx queue index: %d err: %d\n", qindex, err);
pool = NULL;
}
+   if (flushed)
+   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, qindex));
 
sq->xsk_pool = pool;
 
-- 
2.43.0




[PATCH net-next v2 3/5] virtio_net: add missing netdev_tx_reset_queue() to virtnet_tx_resize()

2024-12-02 Thread Koichiro Den
virtnet_tx_resize() flushes remaining tx skbs, so DQL counters need to
be reset.

Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc:  # v6.11+
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index df9bfe31aa6d..0103d7990e44 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3399,6 +3399,8 @@ static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq,
err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, &flushed);
if (err)
netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
+   if (flushed)
+   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, qindex));
 
virtnet_tx_resume(vi, sq);
 
-- 
2.43.0




[PATCH net-next v2 0/5] virtio_net: correct netdev_tx_reset_queue() invocation points

2024-12-02 Thread Koichiro Den
 do_syscall_64+0x6c/0x180
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f41defe5b34
Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00
f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:7ffe5336ecc8 EFLAGS: 0202 ORIG_RAX: 002e
RAX: ffda RBX: 0003 RCX: 7f41defe5b34
RDX:  RSI: 7ffe5336ed30 RDI: 0003
RBP: 7ffe5336eda0 R08: 0010 R09: 0001
R10: 7ffe5336f6f9 R11: 0202 R12: 0003
R13: 67452259 R14: 556ccc28b040 R15: 
 
[...]
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---


Koichiro Den (5):
  virtio_net: correct netdev_tx_reset_queue() invocation point
  virtio_ring: add 'flushed' as an argument to virtqueue_resize()
  virtio_net: add missing netdev_tx_reset_queue() to virtnet_tx_resize()
  virtio_ring: add 'flushed' as an argument to virtqueue_reset()
  virtio_net: add missing netdev_tx_reset_queue to
virtnet_sq_bind_xsk_pool()

 drivers/net/virtio_net.c | 18 +-
 drivers/virtio/virtio_ring.c | 13 +++--
 include/linux/virtio.h   |  6 --
 3 files changed, 28 insertions(+), 9 deletions(-)

-- 
2.43.0




[PATCH net-next v2 1/5] virtio_net: correct netdev_tx_reset_queue() invocation point

2024-12-02 Thread Koichiro Den
fda RBX: 0003 RCX: 7f41defe5b34
RDX:  RSI: 7ffe5336ed30 RDI: 0003
RBP: 7ffe5336eda0 R08: 0010 R09: 0001
R10: 7ffe5336f6f9 R11: 0202 R12: 0003
R13: 67452259 R14: 556ccc28b040 R15: 
 
[...]
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc:  # v6.11+
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 64c87bb48a41..48ce8b3881b6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3054,7 +3054,6 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
if (err < 0)
goto err_xdp_reg_mem_model;
 
-   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, qp_index));
virtnet_napi_enable(vi->rq[qp_index].vq, &vi->rq[qp_index].napi);
virtnet_napi_tx_enable(vi, vi->sq[qp_index].vq, &vi->sq[qp_index].napi);
 
@@ -6243,6 +6242,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
struct virtqueue *vq = vi->sq[i].vq;
while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
virtnet_sq_free_unused_buf(vq, buf);
+   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
cond_resched();
}
 
-- 
2.43.0




[PATCH net-next v2 2/5] virtio_ring: add 'flushed' as an argument to virtqueue_resize()

2024-12-02 Thread Koichiro Den
When virtqueue_resize() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new argument 'flushed' to
explicitly indicate whether it has really occurred.

Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 6 --
 drivers/virtio/virtio_ring.c | 7 ++-
 include/linux/virtio.h   | 3 ++-
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 48ce8b3881b6..df9bfe31aa6d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3326,12 +3326,13 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
 struct receive_queue *rq, u32 ring_num)
 {
int err, qindex;
+   bool flushed;
 
qindex = rq - vi->rq;
 
virtnet_rx_pause(vi, rq);
 
-   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf);
+   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf, &flushed);
if (err)
netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err);
 
@@ -3389,12 +3390,13 @@ static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq,
 u32 ring_num)
 {
int qindex, err;
+   bool flushed;
 
qindex = sq - vi->sq;
 
virtnet_tx_pause(vi, sq);
 
-   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
+   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, &flushed);
if (err)
netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
 
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 82a7d2cbc704..34a068d401ec 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2772,6 +2772,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  * @_vq: the struct virtqueue we're talking about.
  * @num: new ring num
  * @recycle: callback to recycle unused buffers
+ * @flushed: whether or not unused buffers are all flushed
  *
 * When it is really necessary to create a new vring, it will set the current vq
  * into the reset state. Then call the passed callback to recycle the buffer
@@ -2792,11 +2793,14 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  *
  */
 int virtqueue_resize(struct virtqueue *_vq, u32 num,
-void (*recycle)(struct virtqueue *vq, void *buf))
+void (*recycle)(struct virtqueue *vq, void *buf),
+bool *flushed)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
int err;
 
+   *flushed = false;
+
if (num > vq->vq.num_max)
return -E2BIG;
 
@@ -2809,6 +2813,7 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
+   *flushed = true;
 
if (vq->packed_ring)
err = virtqueue_resize_packed(_vq, num);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 57cc4b07fd17..878feda08af9 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -109,7 +109,8 @@ dma_addr_t virtqueue_get_avail_addr(const struct virtqueue *vq);
 dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
 
 int virtqueue_resize(struct virtqueue *vq, u32 num,
-void (*recycle)(struct virtqueue *vq, void *buf));
+void (*recycle)(struct virtqueue *vq, void *buf),
+bool *flushed);
 int virtqueue_reset(struct virtqueue *vq,
void (*recycle)(struct virtqueue *vq, void *buf));
 
-- 
2.43.0




Re: [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point

2024-12-05 Thread Koichiro Den
On Thu, Dec 05, 2024 at 05:33:36AM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 04, 2024 at 02:07:18PM +0900, Koichiro Den wrote:
> > When virtnet_close is followed by virtnet_open, some TX completions can
> > possibly remain unconsumed, until they are finally processed during the
> > first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
> > [1].
> 
> 
> So it's a bugfix. Why net-next not net?

I was mistaken (I just read netdev-FAQ again). I'll resend to net, with
adjustments reflecting your feedback.

> 
> > Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
> > before RX napi enable") was not sufficient to eliminate all BQL crash
> > cases for virtio-net.
> > 
> > This issue can be reproduced with the latest net-next master by running:
> > `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
> > TX load from inside the machine.
> > 
> > netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
> > the device is not stopped in any case. For BQL core part, it's just like
> > traffic nearly ceases to exist for some period. For stall detector added
> > to BQL, even if virtnet_close could somehow lead to some TX completions
> > delayed for long, followed by virtnet_open, we can just take it as stall
> > as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
> > based on BQL"). Note also that users can still reset stall_max via sysfs.
> > 
> > So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
> > eliminates the BQL crashes. Note that netdev_tx_reset_queue() is now
> > explicitly required in freeze/restore path, so this patch adds it to
> > free_unused_bufs().
> 
> I don't much like that free_unused_bufs now has this side effect.
> I think would be better to just add a loop in virtnet_restore.
> Or if you want to keep it there, pls rename the function
> to hint it does more.

It makes sense. I would go for the former. Thanks.

> 
> 
> > 
> > [1]:
> > [ cut here ]
> > kernel BUG at lib/dynamic_queue_limits.c:99!
> > Oops: invalid opcode:  [#1] PREEMPT SMP NOPTI
> > CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: GN 6.12.0net-next_main+ #2
> > Tainted: [N]=TEST
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
> > BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > RIP: 0010:dql_completed+0x26b/0x290
> > Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
> > 4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
> > d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
> > RSP: 0018:c92b0d08 EFLAGS: 00010297
> > RAX:  RBX: 888102398c80 RCX: 80190009
> > RDX:  RSI: 006a RDI: 
> > RBP: 888102398c00 R08:  R09: 
> > R10: 00ca R11: 00015681 R12: 0001
> > R13: c92b0d68 R14: 8885e000 R15: 8881107aca40
> > FS:  7f41ded69500() GS:888667dc()
> > knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 556ccc2dc1a0 CR3: 000104fd8003 CR4: 00772ef0
> > PKRU: 5554
> > Call Trace:
> >  
> >  ? die+0x32/0x80
> >  ? do_trap+0xd9/0x100
> >  ? dql_completed+0x26b/0x290
> >  ? dql_completed+0x26b/0x290
> >  ? do_error_trap+0x6d/0xb0
> >  ? dql_completed+0x26b/0x290
> >  ? exc_invalid_op+0x4c/0x60
> >  ? dql_completed+0x26b/0x290
> >  ? asm_exc_invalid_op+0x16/0x20
> >  ? dql_completed+0x26b/0x290
> >  __free_old_xmit+0xff/0x170 [virtio_net]
> >  free_old_xmit+0x54/0xc0 [virtio_net]
> >  virtnet_poll+0xf4/0xe30 [virtio_net]
> >  ? __update_load_avg_cfs_rq+0x264/0x2d0
> >  ? update_curr+0x35/0x260
> >  ? reweight_entity+0x1be/0x260
> >  __napi_poll.constprop.0+0x28/0x1c0
> >  net_rx_action+0x329/0x420
> >  ? enqueue_hrtimer+0x35/0x90
> >  ? trace_hardirqs_on+0x1d/0x80
> >  ? kvm_sched_clock_read+0xd/0x20
> >  ? sched_clock+0xc/0x30
> >  ? kvm_sched_clock_read+0xd/0x20
> >  ? sched_clock+0xc/0x30
> >  ? sched_clock_cpu+0xd/0x1a0
> >  handle_softirqs+0x138/0x3e0
> >  do_softirq.part.0+0x89/0xc0
> >  
> >  
> >  __local_bh_enable_ip+0xa7/0xb0
> >  virtnet_open+0xc8/0x310 [virtio_net]
> >  __dev_open+0xfa/0x1b0
> >  __dev_change_flags+0x1de/0x250
> >  dev_change_flags+0x22/0x60
> >  do_setlink.isra.0+0x2df/0x10b0

Re: [PATCH net-next v3 3/7] virtio_net: introduce virtnet_sq_free_unused_buf_done()

2024-12-05 Thread Koichiro Den
On Thu, Dec 05, 2024 at 05:40:33AM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 04, 2024 at 02:07:20PM +0900, Koichiro Den wrote:
> > This will be used in the following commits, to ensure DQL reset occurs
> > iff. all unused buffers are actually recycled.
> > 
> > Cc:  # v6.11+
> > Signed-off-by: Koichiro Den 
> 
> to avoid adding an unused function, squash with a patch that uses it.

I originally separated this out because some were supposed to land in the
stable tree while others were not, and this was the common prerequisite.
However, this can be squashed with [5/7] regardless of that, and should be
done so, as you pointed out.

I'll do so and send v4 later, thanks for the review.

> 
> 
> > ---
> >  drivers/net/virtio_net.c | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 1b7a85e75e14..b3cbbd8052e4 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -503,6 +503,7 @@ struct virtio_net_common_hdr {
> >  static struct virtio_net_common_hdr xsk_hdr;
> >  
> >  static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> > +static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq);
> >  static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> >struct net_device *dev,
> >unsigned int *xdp_xmit,
> > @@ -6233,6 +6234,14 @@ static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
> > }
> >  }
> >  
> > +static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq)
> > +{
> > +   struct virtnet_info *vi = vq->vdev->priv;
> > +   int i = vq2txq(vq);
> > +
> > +   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
> > +}
> > +
> >  static void free_unused_bufs(struct virtnet_info *vi)
> >  {
> > void *buf;
> > -- 
> > 2.43.0
> 



[PATCH net-next v3 5/7] virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize

2024-12-03 Thread Koichiro Den
virtnet_tx_resize() flushes remaining tx skbs, requiring DQL counters to
be reset when flushing has actually occurred. Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_reset() to
handle this.

Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc:  # v6.11+
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2a90655cfa4f..d0cf29fd8255 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3395,7 +3395,8 @@ static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq,
 
virtnet_tx_pause(vi, sq);
 
-   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, NULL);
+   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf,
+  virtnet_sq_free_unused_buf_done);
if (err)
netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
 
-- 
2.43.0




[PATCH net-next v3 7/7] virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx

2024-12-03 Thread Koichiro Den
virtnet_sq_bind_xsk_pool() flushes tx skbs and then resets tx queue, so
DQL counters need to be reset when flushing has actually occurred. Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_reset()
to handle this.

Fixes: 21a4e3ce6dc7 ("virtio_net: xsk: bind/unbind xsk for tx")
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5eaa7a2884d5..177705a56812 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -5740,7 +5740,8 @@ static int virtnet_sq_bind_xsk_pool(struct virtnet_info *vi,
 
virtnet_tx_pause(vi, sq);
 
-   err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf, NULL);
+   err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf,
+ virtnet_sq_free_unused_buf_done);
if (err) {
netdev_err(vi->dev, "reset tx fail: tx queue index: %d err: %d\n", qindex, err);
pool = NULL;
-- 
2.43.0




Re: [PATCH net-next v2 4/5] virtio_ring: add 'flushed' as an argument to virtqueue_reset()

2024-12-03 Thread Koichiro Den
On Wed, Dec 04, 2024 at 10:49:02AM +0800, Jason Wang wrote:
> On Tue, Dec 3, 2024 at 3:31 PM Koichiro Den  wrote:
> >
> > When virtqueue_reset() has actually recycled all unused buffers,
> > additional work may be required in some cases. Relying solely on its
> > return status is fragile, so introduce a new argument 'flushed' to
> > explicitly indicate whether it has really occurred.
> >
> > Signed-off-by: Koichiro Den 
> > ---
> >  drivers/net/virtio_net.c | 6 --
> >  drivers/virtio/virtio_ring.c | 6 +-
> >  include/linux/virtio.h   | 3 ++-
> >  3 files changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 0103d7990e44..d5240a03b7d6 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -5695,6 +5695,7 @@ static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queu
> > struct xsk_buff_pool *pool)
> >  {
> > int err, qindex;
> > +   bool flushed;
> >
> > qindex = rq - vi->rq;
> >
> > @@ -5713,7 +5714,7 @@ static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queu
> >
> > virtnet_rx_pause(vi, rq);
> >
> > -   err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf);
> > +   err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf, &flushed);
> > if (err) {
> > netdev_err(vi->dev, "reset rx fail: rx queue index: %d err: %d\n", qindex, err);
> >
> > @@ -5737,12 +5738,13 @@ static int virtnet_sq_bind_xsk_pool(struct virtnet_info *vi,
> > struct xsk_buff_pool *pool)
> >  {
> > int err, qindex;
> > +   bool flushed;
> >
> > qindex = sq - vi->sq;
> >
> > virtnet_tx_pause(vi, sq);
> >
> > -   err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf);
> > +   err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf, &flushed);
> > if (err) {
> > netdev_err(vi->dev, "reset tx fail: tx queue index: %d err: %d\n", qindex, err);
> > pool = NULL;
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 34a068d401ec..b522ef798946 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -2828,6 +2828,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> >   * virtqueue_reset - detach and recycle all unused buffers
> >   * @_vq: the struct virtqueue we're talking about.
> >   * @recycle: callback to recycle unused buffers
> > + * @flushed: whether or not unused buffers are all flushed
> >   *
> >   * Caller must ensure we don't call this with other virtqueue operations
> >   * at the same time (except where noted).
> > @@ -2839,14 +2840,17 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> >   * -EPERM: Operation not permitted
> >   */
> >  int virtqueue_reset(struct virtqueue *_vq,
> > -   void (*recycle)(struct virtqueue *vq, void *buf))
> > +   void (*recycle)(struct virtqueue *vq, void *buf),
> > +   bool *flushed)
> >  {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> > int err;
> >
> > +   *flushed = false;
> > err = virtqueue_disable_and_recycle(_vq, recycle);
> > if (err)
> > return err;
> > +   *flushed = true;
> >
> 
> This makes me think if it would be easier if we just find a way to
> reset the tx queue inside virtqueue_disable_and_recycle().
> 
> For example, introducing a recycle_done callback?

It sounds reasonable and much cleaner. I'll prepare and send v3 shortly.
Thanks for the review.

-Koichiro Den

> 
> Thanks
> 



[PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point

2024-12-03 Thread Koichiro Den
fda RBX: 0003 RCX: 7f41defe5b34
RDX:  RSI: 7ffe5336ed30 RDI: 0003
RBP: 7ffe5336eda0 R08: 0010 R09: 0001
R10: 7ffe5336f6f9 R11: 0202 R12: 0003
R13: 67452259 R14: 556ccc28b040 R15: 
 
[...]
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc:  # v6.11+
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 64c87bb48a41..48ce8b3881b6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3054,7 +3054,6 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
if (err < 0)
goto err_xdp_reg_mem_model;
 
-   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, qp_index));
virtnet_napi_enable(vi->rq[qp_index].vq, &vi->rq[qp_index].napi);
virtnet_napi_tx_enable(vi, vi->sq[qp_index].vq, &vi->sq[qp_index].napi);
 
@@ -6243,6 +6242,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
struct virtqueue *vq = vi->sq[i].vq;
while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
virtnet_sq_free_unused_buf(vq, buf);
+   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
cond_resched();
}
 
-- 
2.43.0




[PATCH net-next v3 0/7] virtio_net: correct netdev_tx_reset_queue() invocation points

2024-12-03 Thread Koichiro Den
ndmsg+0x86/0xd0
 ? __pte_offset_map+0x17/0x160
 ? preempt_count_add+0x69/0xa0
 ? __call_rcu_common.constprop.0+0x147/0x610
 ? preempt_count_add+0x69/0xa0
 ? preempt_count_add+0x69/0xa0
 ? _raw_spin_trylock+0x13/0x60
 ? trace_hardirqs_on+0x1d/0x80
 __sys_sendmsg+0x66/0xc0
 do_syscall_64+0x6c/0x180
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f41defe5b34
Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00
f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:7ffe5336ecc8 EFLAGS: 0202 ORIG_RAX: 002e
RAX: ffda RBX: 0003 RCX: 7f41defe5b34
RDX:  RSI: 7ffe5336ed30 RDI: 0003
RBP: 7ffe5336eda0 R08: 0010 R09: 0001
R10: 7ffe5336f6f9 R11: 0202 R12: 0003
R13: 67452259 R14: 556ccc28b040 R15: 
 
[...]
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---


Koichiro Den (7):
  virtio_net: correct netdev_tx_reset_queue() invocation point
  virtio_ring: add a func argument 'recycle_done' to virtqueue_resize()
  virtio_net: replace vq2rxq with vq2txq where appropriate
  virtio_net: introduce virtnet_sq_free_unused_buf_done()
  virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize
  virtio_ring: add a func argument 'recycle_done' to virtqueue_reset()
  virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx

 drivers/net/virtio_net.c | 23 +--
 drivers/virtio/virtio_ring.c | 12 ++--
 include/linux/virtio.h   |  6 --
 3 files changed, 31 insertions(+), 10 deletions(-)

-- 
2.43.0




[PATCH net-next v3 2/7] virtio_net: replace vq2rxq with vq2txq where appropriate

2024-12-03 Thread Koichiro Den
While not harmful, using vq2rxq where it's always sq appears odd.
Replace it with the more appropriate vq2txq for clarity and correctness.

Fixes: 89f86675cb03 ("virtio_net: xsk: tx: support xmit xsk buffer")
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 48ce8b3881b6..1b7a85e75e14 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -6213,7 +6213,7 @@ static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
 {
struct virtnet_info *vi = vq->vdev->priv;
struct send_queue *sq;
-   int i = vq2rxq(vq);
+   int i = vq2txq(vq);
 
sq = &vi->sq[i];
 
-- 
2.43.0




[PATCH net-next v3 4/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_resize()

2024-12-03 Thread Koichiro Den
When virtqueue_resize() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new function argument
'recycle_done', which is invoked when the recycle really occurs.

Cc:  # v6.11+
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 4 ++--
 drivers/virtio/virtio_ring.c | 6 +-
 include/linux/virtio.h   | 3 ++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b3cbbd8052e4..2a90655cfa4f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3332,7 +3332,7 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
 
virtnet_rx_pause(vi, rq);
 
-   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf);
+   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf, NULL);
if (err)
netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err);
 
@@ -3395,7 +3395,7 @@ static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq,
 
virtnet_tx_pause(vi, sq);
 
-   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
+   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, NULL);
if (err)
netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
 
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 82a7d2cbc704..6af8cf6a619e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2772,6 +2772,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  * @_vq: the struct virtqueue we're talking about.
  * @num: new ring num
  * @recycle: callback to recycle unused buffers
+ * @recycle_done: callback to be invoked when recycle for all unused buffers done
  *
 * When it is really necessary to create a new vring, it will set the current vq
  * into the reset state. Then call the passed callback to recycle the buffer
@@ -2792,7 +2793,8 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  *
  */
 int virtqueue_resize(struct virtqueue *_vq, u32 num,
-void (*recycle)(struct virtqueue *vq, void *buf))
+void (*recycle)(struct virtqueue *vq, void *buf),
+void (*recycle_done)(struct virtqueue *vq))
 {
struct vring_virtqueue *vq = to_vvq(_vq);
int err;
@@ -2809,6 +2811,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
+   if (recycle_done)
+   recycle_done(_vq);
 
if (vq->packed_ring)
err = virtqueue_resize_packed(_vq, num);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 57cc4b07fd17..0aa7df4ed5ca 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -109,7 +109,8 @@ dma_addr_t virtqueue_get_avail_addr(const struct virtqueue *vq);
 dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
 
 int virtqueue_resize(struct virtqueue *vq, u32 num,
-void (*recycle)(struct virtqueue *vq, void *buf));
+void (*recycle)(struct virtqueue *vq, void *buf),
+void (*recycle_done)(struct virtqueue *vq));
 int virtqueue_reset(struct virtqueue *vq,
void (*recycle)(struct virtqueue *vq, void *buf));
 
-- 
2.43.0




[PATCH net-next v3 6/7] virtio_ring: add a func argument 'recycle_done' to virtqueue_reset()

2024-12-03 Thread Koichiro Den
When virtqueue_reset() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new function argument
'recycle_done', which is invoked when it really occurs.

Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 4 ++--
 drivers/virtio/virtio_ring.c | 6 +-
 include/linux/virtio.h   | 3 ++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d0cf29fd8255..5eaa7a2884d5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -5711,7 +5711,7 @@ static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queu
 
virtnet_rx_pause(vi, rq);
 
-   err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf);
+   err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf, NULL);
if (err) {
netdev_err(vi->dev, "reset rx fail: rx queue index: %d err: %d\n", qindex, err);
 
@@ -5740,7 +5740,7 @@ static int virtnet_sq_bind_xsk_pool(struct virtnet_info *vi,
 
virtnet_tx_pause(vi, sq);
 
-   err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf);
+   err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf, NULL);
if (err) {
netdev_err(vi->dev, "reset tx fail: tx queue index: %d err: %d\n", qindex, err);
pool = NULL;
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6af8cf6a619e..fdd2d2b07b5a 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2827,6 +2827,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
  * virtqueue_reset - detach and recycle all unused buffers
  * @_vq: the struct virtqueue we're talking about.
  * @recycle: callback to recycle unused buffers
+ * @recycle_done: callback to be invoked when recycle for all unused buffers done
  *
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
@@ -2838,7 +2839,8 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
  * -EPERM: Operation not permitted
  */
 int virtqueue_reset(struct virtqueue *_vq,
-   void (*recycle)(struct virtqueue *vq, void *buf))
+   void (*recycle)(struct virtqueue *vq, void *buf),
+   void (*recycle_done)(struct virtqueue *vq))
 {
struct vring_virtqueue *vq = to_vvq(_vq);
int err;
@@ -2846,6 +2848,8 @@ int virtqueue_reset(struct virtqueue *_vq,
err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
+   if (recycle_done)
+   recycle_done(_vq);
 
if (vq->packed_ring)
virtqueue_reinit_packed(vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 0aa7df4ed5ca..dd88682e27e3 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -112,7 +112,8 @@ int virtqueue_resize(struct virtqueue *vq, u32 num,
 void (*recycle)(struct virtqueue *vq, void *buf),
 void (*recycle_done)(struct virtqueue *vq));
 int virtqueue_reset(struct virtqueue *vq,
-   void (*recycle)(struct virtqueue *vq, void *buf));
+   void (*recycle)(struct virtqueue *vq, void *buf),
+   void (*recycle_done)(struct virtqueue *vq));
 
 struct virtio_admin_cmd {
__le16 opcode;
-- 
2.43.0




[PATCH net-next v3 3/7] virtio_net: introduce virtnet_sq_free_unused_buf_done()

2024-12-03 Thread Koichiro Den
This will be used in the following commits, to ensure DQL reset occurs
iff. all unused buffers are actually recycled.

Cc:  # v6.11+
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1b7a85e75e14..b3cbbd8052e4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -503,6 +503,7 @@ struct virtio_net_common_hdr {
 static struct virtio_net_common_hdr xsk_hdr;
 
 static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
+static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq);
 static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
   struct net_device *dev,
   unsigned int *xdp_xmit,
@@ -6233,6 +6234,14 @@ static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
}
 }
 
+static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq)
+{
+   struct virtnet_info *vi = vq->vdev->priv;
+   int i = vq2txq(vq);
+
+   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
+}
+
 static void free_unused_bufs(struct virtnet_info *vi)
 {
void *buf;
-- 
2.43.0




Re: [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point

2024-12-05 Thread Koichiro Den
On Thu, Dec 05, 2024 at 09:43:38PM +0900, Koichiro Den wrote:
> On Thu, Dec 05, 2024 at 05:33:36AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Dec 04, 2024 at 02:07:18PM +0900, Koichiro Den wrote:
> > > When virtnet_close is followed by virtnet_open, some TX completions can
> > > possibly remain unconsumed, until they are finally processed during the
> > > first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
> > > [1].
> > 
> > 
> > So it's a bugfix. Why net-next not net?
> 
> I was mistaken (I just read netdev-FAQ again). I'll resend to net, with
> adjustments reflecting your feedback.
> 
> > 
> > > Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
> > > before RX napi enable") was not sufficient to eliminate all BQL crash
> > > cases for virtio-net.
> > > 
> > > This issue can be reproduced with the latest net-next master by running:
> > > `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
> > > TX load from inside the machine.
> > > 
> > > netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
> > > the device is not stopped in any case. For BQL core part, it's just like
> > > traffic nearly ceases to exist for some period. For stall detector added
> > > to BQL, even if virtnet_close could somehow lead to some TX completions
> > > delayed for long, followed by virtnet_open, we can just take it as stall
> > > as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
> > > based on BQL"). Note also that users can still reset stall_max via sysfs.
> > > 
> > > So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
> > > eliminates the BQL crashes. Note that netdev_tx_reset_queue() is now
> > > explicitly required in freeze/restore path, so this patch adds it to
> > > free_unused_bufs().
> > 
> > I don't much like that free_unused_bufs now has this side effect.
> > I think would be better to just add a loop in virtnet_restore.
> > Or if you want to keep it there, pls rename the function
> > to hint it does more.
> 
> It makes sense. I would go for the former. Thanks.

Hmm, as Jacob pointed out in v1
(https://lore.kernel.org/all/20241202181445.0da50...@kernel.org/),
it looks better to follow the rule of thumb. Taking both suggestions
from Jacob and you, adding a loop to remove_vq_common(), just after
free_unused_bufs(), seems more fitting now, like this:

 static void remove_vq_common(struct virtnet_info *vi)
 {
+   int i;
+
virtio_reset_device(vi->vdev);

/* Free unused buffers in both send and recv, if any. */
free_unused_bufs(vi);

+   /* Tx unused buffers flushed, so reset BQL counter */
+   for (i = 0; i < vi->max_queue_pairs; i++)
+   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
+
free_receive_bufs(vi);

What do you think?

Thanks,

-Koichiro Den

> 
> > 
> > 
> > > 
> > > [1]:
> > > [ cut here ]
> > > kernel BUG at lib/dynamic_queue_limits.c:99!
> > > Oops: invalid opcode:  [#1] PREEMPT SMP NOPTI
> > > CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: GN 6.12.0net-next_main+ #2
> > > Tainted: [N]=TEST
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
> > > BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > > RIP: 0010:dql_completed+0x26b/0x290
> > > Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
> > > 4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
> > > d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
> > > RSP: 0018:c92b0d08 EFLAGS: 00010297
> > > RAX:  RBX: 888102398c80 RCX: 80190009
> > > RDX:  RSI: 006a RDI: 
> > > RBP: 888102398c00 R08:  R09: 
> > > R10: 00ca R11: 00015681 R12: 0001
> > > R13: c92b0d68 R14: 8885e000 R15: 8881107aca40
> > > FS:  7f41ded69500() GS:888667dc()
> > > knlGS:
> > > CS:  0010 DS:  ES:  CR0: 80050033
> > > CR2: 556ccc2dc1a0 CR3: 000104fd8003 CR4: 00772ef0
> > > PKRU: 5554
> > > Call Trace:
> > >  
> > >  ? die+0x32/0x80
> > >  ? do_trap+0xd9/0x100
> > >  ? dql_completed+0x26b/0x290
> 

[PATCH net v4 0/6] virtio_net: correct netdev_tx_reset_queue() invocation points

2024-12-05 Thread Koichiro Den
rtnetlink_rcv_msg+0x34f/0x3f0
 ? do_syscall_64+0x6c/0x180
 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
 ? __pfx_rtnetlink_rcv_msg+0x10/0x10
 netlink_rcv_skb+0x54/0x100
 netlink_unicast+0x23e/0x390
 netlink_sendmsg+0x21e/0x490
 sys_sendmsg+0x31b/0x350
 ? copy_msghdr_from_user+0x6d/0xa0
 ___sys_sendmsg+0x86/0xd0
 ? __pte_offset_map+0x17/0x160
 ? preempt_count_add+0x69/0xa0
 ? __call_rcu_common.constprop.0+0x147/0x610
 ? preempt_count_add+0x69/0xa0
 ? preempt_count_add+0x69/0xa0
 ? _raw_spin_trylock+0x13/0x60
 ? trace_hardirqs_on+0x1d/0x80
 __sys_sendmsg+0x66/0xc0
 do_syscall_64+0x6c/0x180
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f41defe5b34
Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00
f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:7ffe5336ecc8 EFLAGS: 0202 ORIG_RAX: 002e
RAX: ffda RBX: 0003 RCX: 7f41defe5b34
RDX:  RSI: 7ffe5336ed30 RDI: 0003
RBP: 7ffe5336eda0 R08: 0010 R09: 0001
R10: 7ffe5336f6f9 R11: 0202 R12: 0003
R13: 67452259 R14: 556ccc28b040 R15: 
 
[...]
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Koichiro Den (6):
  virtio_net: correct netdev_tx_reset_queue() invocation point
  virtio_net: replace vq2rxq with vq2txq where appropriate
  virtio_ring: add a func argument 'recycle_done' to virtqueue_resize()
  virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize
  virtio_ring: add a func argument 'recycle_done' to virtqueue_reset()
  virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx

 drivers/net/virtio_net.c | 31 +--
 drivers/virtio/virtio_ring.c | 12 ++--
 include/linux/virtio.h   |  6 --
 3 files changed, 39 insertions(+), 10 deletions(-)

-- 
2.43.0




[PATCH net v4 1/6] virtio_net: correct netdev_tx_reset_queue() invocation point

2024-12-05 Thread Koichiro Den
1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00
f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:7ffe5336ecc8 EFLAGS: 0202 ORIG_RAX: 002e
RAX: ffda RBX: 0003 RCX: 7f41defe5b34
RDX:  RSI: 7ffe5336ed30 RDI: 0003
RBP: 7ffe5336eda0 R08: 0010 R09: 0001
R10: 7ffe5336f6f9 R11: 0202 R12: 0003
R13: 67452259 R14: 556ccc28b040 R15: 
 
[...]
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc:  # v6.11+
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 64c87bb48a41..6e0925f7f182 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3054,7 +3054,6 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
if (err < 0)
goto err_xdp_reg_mem_model;
 
-   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, qp_index));
virtnet_napi_enable(vi->rq[qp_index].vq, &vi->rq[qp_index].napi);
virtnet_napi_tx_enable(vi, vi->sq[qp_index].vq, &vi->sq[qp_index].napi);
 
@@ -6966,11 +6965,20 @@ static int virtnet_probe(struct virtio_device *vdev)
 
 static void remove_vq_common(struct virtnet_info *vi)
 {
+   int i;
+
virtio_reset_device(vi->vdev);
 
/* Free unused buffers in both send and recv, if any. */
free_unused_bufs(vi);
 
+   /*
+* Rule of thumb is netdev_tx_reset_queue() should follow any
+* skb freeing not followed by netdev_tx_completed_queue()
+*/
+   for (i = 0; i < vi->max_queue_pairs; i++)
+   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
+
free_receive_bufs(vi);
 
free_receive_page_frags(vi);
-- 
2.43.0




[PATCH net v4 3/6] virtio_ring: add a func argument 'recycle_done' to virtqueue_resize()

2024-12-05 Thread Koichiro Den
When virtqueue_resize() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new function argument
'recycle_done', which is invoked when recycling has actually occurred.

Cc:  # v6.11+
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 4 ++--
 drivers/virtio/virtio_ring.c | 6 +-
 include/linux/virtio.h   | 3 ++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index fc89c5e1a207..e10bc9e6b072 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3331,7 +3331,7 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
 
virtnet_rx_pause(vi, rq);
 
-   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf);
+   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf, NULL);
if (err)
netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err);
 
@@ -3394,7 +3394,7 @@ static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq,
 
virtnet_tx_pause(vi, sq);
 
-   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
+   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, NULL);
if (err)
netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
 
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 82a7d2cbc704..6af8cf6a619e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2772,6 +2772,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  * @_vq: the struct virtqueue we're talking about.
  * @num: new ring num
  * @recycle: callback to recycle unused buffers
+ * @recycle_done: callback to be invoked when recycle for all unused buffers done
  *
 * When it is really necessary to create a new vring, it will set the current vq
  * into the reset state. Then call the passed callback to recycle the buffer
@@ -2792,7 +2793,8 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  *
  */
 int virtqueue_resize(struct virtqueue *_vq, u32 num,
-void (*recycle)(struct virtqueue *vq, void *buf))
+void (*recycle)(struct virtqueue *vq, void *buf),
+void (*recycle_done)(struct virtqueue *vq))
 {
struct vring_virtqueue *vq = to_vvq(_vq);
int err;
@@ -2809,6 +2811,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
+   if (recycle_done)
+   recycle_done(_vq);
 
if (vq->packed_ring)
err = virtqueue_resize_packed(_vq, num);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 57cc4b07fd17..0aa7df4ed5ca 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -109,7 +109,8 @@ dma_addr_t virtqueue_get_avail_addr(const struct virtqueue *vq);
 dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
 
 int virtqueue_resize(struct virtqueue *vq, u32 num,
-void (*recycle)(struct virtqueue *vq, void *buf));
+void (*recycle)(struct virtqueue *vq, void *buf),
+void (*recycle_done)(struct virtqueue *vq));
 int virtqueue_reset(struct virtqueue *vq,
void (*recycle)(struct virtqueue *vq, void *buf));
 
-- 
2.43.0




[PATCH net v4 5/6] virtio_ring: add a func argument 'recycle_done' to virtqueue_reset()

2024-12-05 Thread Koichiro Den
When virtqueue_reset() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new function argument
'recycle_done', which is invoked when recycling has actually occurred.

Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 4 ++--
 drivers/virtio/virtio_ring.c | 6 +-
 include/linux/virtio.h   | 3 ++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3a0341cc6085..5cf4b2b20431 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -5711,7 +5711,7 @@ static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queu
 
virtnet_rx_pause(vi, rq);
 
-   err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf);
+   err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf, NULL);
if (err) {
netdev_err(vi->dev, "reset rx fail: rx queue index: %d err: %d\n", qindex, err);
 
@@ -5740,7 +5740,7 @@ static int virtnet_sq_bind_xsk_pool(struct virtnet_info *vi,
 
virtnet_tx_pause(vi, sq);
 
-   err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf);
+   err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf, NULL);
if (err) {
netdev_err(vi->dev, "reset tx fail: tx queue index: %d err: %d\n", qindex, err);
pool = NULL;
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6af8cf6a619e..fdd2d2b07b5a 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2827,6 +2827,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
  * virtqueue_reset - detach and recycle all unused buffers
  * @_vq: the struct virtqueue we're talking about.
  * @recycle: callback to recycle unused buffers
+ * @recycle_done: callback to be invoked when recycle for all unused buffers done
  *
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
@@ -2838,7 +2839,8 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
  * -EPERM: Operation not permitted
  */
 int virtqueue_reset(struct virtqueue *_vq,
-   void (*recycle)(struct virtqueue *vq, void *buf))
+   void (*recycle)(struct virtqueue *vq, void *buf),
+   void (*recycle_done)(struct virtqueue *vq))
 {
struct vring_virtqueue *vq = to_vvq(_vq);
int err;
@@ -2846,6 +2848,8 @@ int virtqueue_reset(struct virtqueue *_vq,
err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
+   if (recycle_done)
+   recycle_done(_vq);
 
if (vq->packed_ring)
virtqueue_reinit_packed(vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 0aa7df4ed5ca..dd88682e27e3 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -112,7 +112,8 @@ int virtqueue_resize(struct virtqueue *vq, u32 num,
 void (*recycle)(struct virtqueue *vq, void *buf),
 void (*recycle_done)(struct virtqueue *vq));
 int virtqueue_reset(struct virtqueue *vq,
-   void (*recycle)(struct virtqueue *vq, void *buf));
+   void (*recycle)(struct virtqueue *vq, void *buf),
+   void (*recycle_done)(struct virtqueue *vq));
 
 struct virtio_admin_cmd {
__le16 opcode;
-- 
2.43.0




[PATCH net v4 4/6] virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize

2024-12-05 Thread Koichiro Den
virtnet_tx_resize() flushes remaining tx skbs, requiring DQL counters to
be reset when flushing has actually occurred. Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_resize() to
handle this.

Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc:  # v6.11+
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e10bc9e6b072..3a0341cc6085 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -503,6 +503,7 @@ struct virtio_net_common_hdr {
 static struct virtio_net_common_hdr xsk_hdr;
 
 static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
+static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq);
 static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
   struct net_device *dev,
   unsigned int *xdp_xmit,
@@ -3394,7 +3395,8 @@ static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq,
 
virtnet_tx_pause(vi, sq);
 
-   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, NULL);
+   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf,
+  virtnet_sq_free_unused_buf_done);
if (err)
netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
 
@@ -6233,6 +6235,14 @@ static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
}
 }
 
+static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq)
+{
+   struct virtnet_info *vi = vq->vdev->priv;
+   int i = vq2txq(vq);
+
+   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
+}
+
 static void free_unused_bufs(struct virtnet_info *vi)
 {
void *buf;
-- 
2.43.0




[PATCH net v4 6/6] virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx

2024-12-05 Thread Koichiro Den
virtnet_sq_bind_xsk_pool() flushes tx skbs and then resets the tx queue,
so DQL counters need to be reset when flushing has actually occurred. Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_reset()
to handle this.

Fixes: 21a4e3ce6dc7 ("virtio_net: xsk: bind/unbind xsk for tx")
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5cf4b2b20431..7646ddd9bef7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -5740,7 +5740,8 @@ static int virtnet_sq_bind_xsk_pool(struct virtnet_info *vi,
 
virtnet_tx_pause(vi, sq);
 
-   err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf, NULL);
+   err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf,
+ virtnet_sq_free_unused_buf_done);
if (err) {
netdev_err(vi->dev, "reset tx fail: tx queue index: %d err: %d\n", qindex, err);
pool = NULL;
-- 
2.43.0




[PATCH net v4 2/6] virtio_net: replace vq2rxq with vq2txq where appropriate

2024-12-05 Thread Koichiro Den
While not harmful, using vq2rxq where the queue is always a send queue
appears odd. Replace it with the more appropriate vq2txq for clarity and
correctness.

Fixes: 89f86675cb03 ("virtio_net: xsk: tx: support xmit xsk buffer")
Signed-off-by: Koichiro Den 
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6e0925f7f182..fc89c5e1a207 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -6213,7 +6213,7 @@ static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
 {
struct virtnet_info *vi = vq->vdev->priv;
struct send_queue *sq;
-   int i = vq2rxq(vq);
+   int i = vq2txq(vq);
 
sq = &vi->sq[i];
 
-- 
2.43.0




Re: [PATCH net-next v3 1/7] virtio_net: correct netdev_tx_reset_queue() invocation point

2024-12-05 Thread Koichiro Den
On Thu, Dec 05, 2024 at 10:17:59AM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 05, 2024 at 10:16:35PM +0900, Koichiro Den wrote:
> > On Thu, Dec 05, 2024 at 09:43:38PM +0900, Koichiro Den wrote:
> > > On Thu, Dec 05, 2024 at 05:33:36AM -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Dec 04, 2024 at 02:07:18PM +0900, Koichiro Den wrote:
> > > > > When virtnet_close is followed by virtnet_open, some TX completions 
> > > > > can
> > > > > possibly remain unconsumed, until they are finally processed during 
> > > > > the
> > > > > first NAPI poll after the netdev_tx_reset_queue(), resulting in a 
> > > > > crash
> > > > > [1].
> > > > 
> > > > 
> > > > So it's a bugfix. Why net-next not net?
> > > 
> > > I was mistaken (I just read netdev-FAQ again). I'll resend to net, with
> > > adjustments reflecting your feedback.
> > > 
> > > > 
> > > > > Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
> > > > > before RX napi enable") was not sufficient to eliminate all BQL crash
> > > > > cases for virtio-net.
> > > > > 
> > > > > This issue can be reproduced with the latest net-next master by 
> > > > > running:
> > > > > `while :; do ip l set DEV down; ip l set DEV up; done` under heavy 
> > > > > network
> > > > > TX load from inside the machine.
> > > > > 
> > > > > netdev_tx_reset_queue() can actually be dropped from virtnet_open 
> > > > > path;
> > > > > the device is not stopped in any case. For BQL core part, it's just 
> > > > > like
> > > > > traffic nearly ceases to exist for some period. For stall detector 
> > > > > added
> > > > > to BQL, even if virtnet_close could somehow lead to some TX 
> > > > > completions
> > > > > delayed for long, followed by virtnet_open, we can just take it as 
> > > > > stall
> > > > > as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
> > > > > based on BQL"). Note also that users can still reset stall_max via 
> > > > > sysfs.
> > > > > 
> > > > > So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). 
> > > > > This
> > > > > eliminates the BQL crashes. Note that netdev_tx_reset_queue() is now
> > > > > explicitly required in freeze/restore path, so this patch adds it to
> > > > > free_unused_bufs().
> > > > 
> > > > I don't much like that free_unused_bufs now has this side effect.
> > > > I think would be better to just add a loop in virtnet_restore.
> > > > Or if you want to keep it there, pls rename the function
> > > > to hint it does more.
> > > 
> > > It makes sense. I would go for the former. Thanks.
> > 
> > Hmm, as Jacob pointed out in v1
> > (https://lore.kernel.org/all/20241202181445.0da50...@kernel.org/),
> > it looks better to follow the rule of thumb.
> 
> OK then. I'm fine with keeping your code as is, just a squash,
> and add a comment
> 
>   /*
>* Rule of thumb is netdev_tx_reset_queue() should follow any
>* skb freeing not followed by netdev_tx_completed_queue()
>*/

Ok, thanks for the review!

> 
> > Taking both suggestions
> > from Jacob and you, adding a loop to remove_vq_common(), just after
> > free_unused_bufs(), seems more fitting now, like this:
> > 
> >  static void remove_vq_common(struct virtnet_info *vi)
> >  {
> > +   int i;
> > +
> > virtio_reset_device(vi->vdev);
> > 
> > /* Free unused buffers in both send and recv, if any. */
> > free_unused_bufs(vi);
> > 
> > +   /* Tx unused buffers flushed, so reset BQL counter */
> > +   for (i = 0; i < vi->max_queue_pairs; i++)
> > +   netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
> > +
> > free_receive_bufs(vi);
> > 
> > What do you think?
> > 
> > Thanks,
> > 
> > -Koichiro Den
> > 
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > [1]:
> > > > > [ cut here ]
> > > > > kernel BUG at lib/dynamic_queue_limi

Re: [PATCH] selftests: gpio: gpio-sim: Fix missing chip disablements

2025-01-23 Thread Koichiro Den
On Wed, Jan 22, 2025 at 10:26:27AM GMT, Bartosz Golaszewski wrote:
> On Wed, Jan 22, 2025 at 5:33 AM Koichiro Den  wrote:
> >
> > Since upstream commit 8bd76b3d3f3a ("gpio: sim: lock up configfs that an
> > instantiated device depends on"), rmdir for active virtual devices has
> > been prohibited.
> >
> > Update gpio-sim selftest to align with the change.
> >
> > Reported-by: kernel test robot 
> > Closes: https://lore.kernel.org/oe-lkp/202501221006.a1ca5dfa-...@intel.com
> > Signed-off-by: Koichiro Den 
> > ---
> >  tools/testing/selftests/gpio/gpio-sim.sh | 31 +++-
> >  1 file changed, 25 insertions(+), 6 deletions(-)
> >
> > diff --git a/tools/testing/selftests/gpio/gpio-sim.sh b/tools/testing/selftests/gpio/gpio-sim.sh
> > index 6fb66a687f17..bbc29ed9c60a 100755
> > --- a/tools/testing/selftests/gpio/gpio-sim.sh
> > +++ b/tools/testing/selftests/gpio/gpio-sim.sh
> > @@ -46,12 +46,6 @@ remove_chip() {
> > rmdir $CONFIGFS_DIR/$CHIP || fail "Unable to remove the chip"
> >  }
> >
> > -configfs_cleanup() {
> > -   for CHIP in `ls $CONFIGFS_DIR/`; do
> > -   remove_chip $CHIP
> > -   done
> > -}
> > -
> >  create_chip() {
> > local CHIP=$1
> >
> > @@ -105,6 +99,13 @@ disable_chip() {
> > echo 0 > $CONFIGFS_DIR/$CHIP/live || fail "Unable to disable the chip"
> >  }
> >
> > +configfs_cleanup() {
> > +   for CHIP in `ls $CONFIGFS_DIR/`; do
> > +   disable_chip $CHIP
> > +   remove_chip $CHIP
> > +   done
> > +}
> > +
> >  configfs_chip_name() {
> > local CHIP=$1
> > local BANK=$2
> > @@ -181,6 +182,7 @@ create_chip chip
> >  create_bank chip bank
> >  enable_chip chip
> > test -n `cat $CONFIGFS_DIR/chip/bank/chip_name` || fail "chip_name doesn't work"
> > +disable_chip chip
> >  remove_chip chip
> >
> 
> Hi! Thanks for addressing it.
> 
> Is there any place in this file where we'd call remove_chip() without
> calling disable_chip() first? Maybe we can fold disable_chip() into
> remove_chip() and make the patch much smaller?

My apologies for being late.

Yes, there are five places where I intentionally omitted disable_chip()
calls before remove_chip() because the chip wasn't enabled in those cases.
I scattered disable_chip() calls only where truly necessary. I also think
explicit enable_chip()/disable_chip() pairing looks cleaner and more readable.

That being said, I'm fine with your suggestion.

-Koichiro Den

> 
> Bart



[PATCH] selftests: gpio: gpio-sim: Fix missing chip disablements

2025-01-21 Thread Koichiro Den
Since upstream commit 8bd76b3d3f3a ("gpio: sim: lock up configfs that an
instantiated device depends on"), rmdir for active virtual devices has
been prohibited.

Update gpio-sim selftest to align with the change.

Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-lkp/202501221006.a1ca5dfa-...@intel.com
Signed-off-by: Koichiro Den 
---
 tools/testing/selftests/gpio/gpio-sim.sh | 31 +++-
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/gpio/gpio-sim.sh b/tools/testing/selftests/gpio/gpio-sim.sh
index 6fb66a687f17..bbc29ed9c60a 100755
--- a/tools/testing/selftests/gpio/gpio-sim.sh
+++ b/tools/testing/selftests/gpio/gpio-sim.sh
@@ -46,12 +46,6 @@ remove_chip() {
rmdir $CONFIGFS_DIR/$CHIP || fail "Unable to remove the chip"
 }
 
-configfs_cleanup() {
-   for CHIP in `ls $CONFIGFS_DIR/`; do
-   remove_chip $CHIP
-   done
-}
-
 create_chip() {
local CHIP=$1
 
@@ -105,6 +99,13 @@ disable_chip() {
echo 0 > $CONFIGFS_DIR/$CHIP/live || fail "Unable to disable the chip"
 }
 
+configfs_cleanup() {
+   for CHIP in `ls $CONFIGFS_DIR/`; do
+   disable_chip $CHIP
+   remove_chip $CHIP
+   done
+}
+
 configfs_chip_name() {
local CHIP=$1
local BANK=$2
@@ -181,6 +182,7 @@ create_chip chip
 create_bank chip bank
 enable_chip chip
test -n `cat $CONFIGFS_DIR/chip/bank/chip_name` || fail "chip_name doesn't work"
+disable_chip chip
 remove_chip chip
 
 echo "1.2. chip_name returns 'none' if the chip is still pending"
@@ -195,6 +197,7 @@ create_chip chip
 create_bank chip bank
 enable_chip chip
 test -n `cat $CONFIGFS_DIR/chip/dev_name` || fail "dev_name doesn't work"
+disable_chip chip
 remove_chip chip
 
 echo "2. Creating and configuring simulated chips"
@@ -204,6 +207,7 @@ create_chip chip
 create_bank chip bank
 enable_chip chip
test "`get_chip_num_lines chip bank`" = "1" || fail "default number of lines is not 1"
+disable_chip chip
 remove_chip chip
 
 echo "2.2. Number of lines can be specified"
@@ -212,6 +216,7 @@ create_bank chip bank
 set_num_lines chip bank 16
 enable_chip chip
test "`get_chip_num_lines chip bank`" = "16" || fail "number of lines is not 16"
+disable_chip chip
 remove_chip chip
 
 echo "2.3. Label can be set"
@@ -220,6 +225,7 @@ create_bank chip bank
 set_label chip bank foobar
 enable_chip chip
 test "`get_chip_label chip bank`" = "foobar" || fail "label is incorrect"
+disable_chip chip
 remove_chip chip
 
 echo "2.4. Label can be left empty"
@@ -227,6 +233,7 @@ create_chip chip
 create_bank chip bank
 enable_chip chip
 test -z "`cat $CONFIGFS_DIR/chip/bank/label`" || fail "label is not empty"
+disable_chip chip
 remove_chip chip
 
 echo "2.5. Line names can be configured"
@@ -238,6 +245,7 @@ set_line_name chip bank 2 bar
 enable_chip chip
 test "`get_line_name chip bank 0`" = "foo" || fail "line name is incorrect"
 test "`get_line_name chip bank 2`" = "bar" || fail "line name is incorrect"
+disable_chip chip
 remove_chip chip
 
echo "2.6. Line config can remain unused if offset is greater than number of lines"
@@ -248,6 +256,7 @@ set_line_name chip bank 5 foobar
 enable_chip chip
 test "`get_line_name chip bank 0`" = "" || fail "line name is incorrect"
 test "`get_line_name chip bank 1`" = "" || fail "line name is incorrect"
+disable_chip chip
 remove_chip chip
 
 echo "2.7. Line configfs directory names are sanitized"
@@ -267,6 +276,7 @@ for CHIP in $CHIPS; do
enable_chip $CHIP
 done
 for CHIP in $CHIPS; do
+  disable_chip $CHIP
remove_chip $CHIP
 done
 
@@ -278,6 +288,7 @@ echo foobar > $CONFIGFS_DIR/chip/bank/label 2> /dev/null && \
fail "Setting label of a live chip should fail"
 echo 8 > $CONFIGFS_DIR/chip/bank/num_lines 2> /dev/null && \
fail "Setting number of lines of a live chip should fail"
+disable_chip chip
 remove_chip chip
 
 echo "2.10. Can't create line items when chip is live"
@@ -285,6 +296,7 @@ create_chip chip
 create_bank chip bank
 enable_chip chip
mkdir $CONFIGFS_DIR/chip/bank/line0 2> /dev/null && fail "Creating line item should fail"
+disable_chip chip
 remove_chip chip
 
 echo "2.11. Probe errors are propagated to user-space"
@@ -316,6 +328,7 @@ mkdir -p $CONFIGFS_DIR/chip/bank/line4/hog
 enable_chip chip
$BASE_DIR/gpio-mockup-cdev -s 1 /dev/`configfs_chip_name chip bank` 4 2> /dev/null && \
fail "Setting the value of a hogged line shouldn't succeed"
+disable_chip ch