On Thu, Jan 8, 2026 at 12:01 AM Francesco Valla <[email protected]> wrote:
>
> On Wed, Jan 07, 2026 at 07:55:26PM +0100, Francesco Valla wrote:
> > Hi Matias,
> >
> > On Wed, Jan 07, 2026 at 05:14:25PM +0100, Matias Ezequiel Vara Larsen wrote:
> > > On Fri, Dec 26, 2025 at 4:09 PM Francesco Valla <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Sun, Dec 14, 2025 at 04:25:54PM +0100, Francesco Valla wrote:
> > > > > While stress testing this, I noticed that flooding the virtio-can
> > > > > > interface with packets leads to a hang of the interface itself.
> > > > > > I am seeing this when issuing, on the host side:
> > > > >
> > > > >       while true; do cansend can0 123#00; done
> > > > >
> > > > > with:
> > > > >
> > > > >  - QEMU: the tip of the master branch plus [2]
> > > > >  - vhost-device: the tip of the main branch
> > > > >
> > > > > and the following QEMU invocation:
> > > > >
> > > > > qemu-system-x86_64 -serial mon:stdio \
> > > > >     -m 2G -smp 2 \
> > > > >     -kernel $(pwd)/BUILD.bin/arch/x86/boot/bzImage \
> > > > >     -initrd /home/francesco/SRC/LINUX_KERNEL/initramfs.gz \
> > > > >     -append "loglevel=7 console=ttyS0" \
> > > > >     -machine memory-backend=pc.ram \
> > > > > >     -object memory-backend-file,id=pc.ram,size=2G,mem-path=/tmp/pc.ram,share=on \
> > > > >     -chardev socket,id=can0,path=/tmp/sock-can0 \
> > > > >     -device vhost-user-can-pci,chardev=can0
> > > > >
> > > > >
> > > > > > Restarting the interface (i.e., ip link set down and then up) does not
> > > > > fix the situation.
> > > > >
> > > > > > I'll try to do some more testing over the next few days.
> > > >
> > > > After a deep dive, I _think_ the problem actually lies in vhost-device,
> > > > since it does not occur (or at least, it seems so) with an alternative
> > > > implementation that uses the QEMU socketcan support [0] (an implementation
> > > > that builds on top of the work done by Harald and Mikhail):
> > > >
> > > > qemu-system-x86_64 -serial mon:stdio \
> > > >     -m 2G -smp 2 -enable-kvm \
> > > >     -kernel $(pwd)/BUILD.bin/arch/x86/boot/bzImage \
> > > >     -initrd /home/francesco/SRC/LINUX_KERNEL/initramfs.gz \
> > > >     -append "loglevel=7 console=ttyS0" \
> > > >     -object can-bus,id=canbus0 \
> > > >     -object can-host-socketcan,id=canhost0,if=vcan0,canbus=canbus0 \
> > > >     -device virtio-can-pci,canbus=canbus0
> > > >
> > > > Unfortunately, my Rust knowledge is not sufficient to understand the
> > > > vhost-device implementation [1]; the issue seems to be related to the
> > > > host->guest vring becoming empty and not refilling anymore.
> > > >
> > >
> > > Can you try with
> > > https://github.com/MatiasVara/vhost-device/commits/fix-for-923/?
> >
> > I'll stress test it during the night, but this seems to fix it. Before,
> > it was consistently reproducible after mere seconds, while now, over
> > a bunch of runs, I never reproduced it.
> >
> > I also agree with your analysis on the commit.
> >
>
> Quick update, unfortunately not good: the RX issue is solved, but now I
> have a different one: if I send a single message either from the guest
> or the host (e.g.: cansend can0 111#00) I get:
>
> [   16.496923] irq 11: nobody cared (try booting with the "irqpoll" option)
> [   16.511875] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-00002-gded0a4b9da5a #29 PREEMPT(voluntary)
> [   16.511883] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
> [   16.511887] Call Trace:
> [   16.511933]  <IRQ>
> [   16.511938]  dump_stack_lvl+0x4d/0x70
> [   16.511973]  __report_bad_irq+0x30/0xb7
> [   16.511986]  note_interrupt.cold+0x28/0x66
> [   16.511988]  handle_irq_event+0x6d/0x70
> [   16.512004]  handle_fasteoi_irq+0xd5/0x1f0
> [   16.512011]  __common_interrupt+0x3f/0xd0
> [   16.512023]  ? tick_nohz_irq_exit+0x2e/0x60
> [   16.512035]  common_interrupt+0x3b/0x90
> [   16.512057]  asm_common_interrupt+0x26/0x40
> [   16.512073] RIP: 0010:handle_softirqs+0x6d/0x270
> [   16.512081] Code: 02 00 01 00 00 89 5c 24 14 48 89 6c 24 08 c7 44 24 10 0a 00 00 00 89 7c 24 04 31 c0 65 66 89 05 01 ce 3e 02 fb bb ff ff ff ff <49> c7 c2 c0 80 a0 a2 44 89 ed 41 0f bc dd 83 c3 01 74 76 8d 43 ff
> [   16.512082] RSP: 0018:ffffb22480003f98 EFLAGS: 00000246
> [   16.512086] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000838
> [   16.512087] RDX: 0000000000000000 RSI: ffffffffa2a0e940 RDI: 0000000000000000
> [   16.512088] RBP: 00000000fffbac3e R08: 0000000000000001 R09: 0000000000000000
> [   16.512088] R10: ffffa2d57da249d0 R11: ffffb22480003ff8 R12: 0000000000000000
> [   16.512091] R13: 0000000000000082 R14: 0000000000000000 R15: 0000000000000000
> [   16.512097]  irq_exit_rcu+0x89/0xb0
> [   16.512099]  sysvec_apic_timer_interrupt+0x6b/0x80
> [   16.512103]  </IRQ>
> [   16.512104]  <TASK>
> [   16.512104]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [   16.512105] RIP: 0010:pv_native_safe_halt+0xf/0x20
> [   16.512107] Code: 2c 81 00 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 05 cf 18 00 fb f4 <c3> cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
> [   16.512109] RSP: 0018:ffffffffa2a03e80 EFLAGS: 00000212
> [   16.512110] RAX: ffffa2d5da523000 RBX: ffffffffa2a0e940 RCX: 0000000000000838
> [   16.512111] RDX: 4000000000000000 RSI: 0000000000000087 RDI: 00000000000a722c
> [   16.512111] RBP: 0000000000000000 R08: 00000000000a722c R09: ffffa2d57da249d0
> [   16.512112] R10: ffffa2d57da1bac0 R11: 0000000000000001 R12: 0000000000000000
> [   16.512112] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000014770
> [   16.512113]  default_idle+0x9/0x10
> [   16.512117]  default_idle_call+0x2a/0xf0
> [   16.512119]  do_idle+0x1cb/0x230
> [   16.512129]  cpu_startup_entry+0x24/0x30
> [   16.512130]  rest_init+0xbc/0xc0
> [   16.512133]  start_kernel+0x6d7/0x6e0
> [   16.512164]  x86_64_start_reservations+0x24/0x30
> [   16.512172]  x86_64_start_kernel+0xc8/0xd0
> [   16.512173]  common_startup_64+0x13e/0x148
> [   16.512181]  </TASK>
> [   16.512181] handlers:
> [   16.513166] [<00000000b61218c7>] vp_interrupt
> [   16.515096] Disabling IRQ #11
>
> with IRQ#11 being:
>
> # cat /proc/interrupts
>            CPU0       CPU1
>  11:     102218          0  IO-APIC  11-fasteoi   virtio0
>
>
> This cannot be reproduced with the old version of vhost-device. I think
> it is due to the removal of the req_rx_buf variable and associated
> logic: a vq kick is now being performed at every cycle of the event
> loop, even if no processing happened. On the guest side, this results in
> an IRQ that nobody handles.
>

Thanks, Francesco! Right, I made a mistake. process_rx_queue() only
invokes signal_used_queue() when req_rx_buf is true, i.e., when the device
has actually added something to the used ring. I'll fix that.
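
For illustration only, a minimal sketch of that gating. It is not the
actual vhost-device code: the RxQueue type and its fields are
hypothetical stand-ins for the real vring, and only the names
process_rx_queue(), signal_used_queue() and req_rx_buf are taken from
this thread.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the host->guest (RX) virtqueue.
/// None of these fields mirror the real vhost-device types.
struct RxQueue {
    /// Descriptor indices the guest has made available to the device.
    avail: VecDeque<u16>,
    /// CAN frames waiting on the host side to be delivered to the guest.
    pending: VecDeque<Vec<u8>>,
    /// Count of buffers the device has placed on the used ring.
    used: usize,
    /// Count of notifications (interrupts) sent to the guest.
    kicks: usize,
}

impl RxQueue {
    /// Stand-in for the guest notification: here it only counts calls.
    fn signal_used_queue(&mut self) {
        self.kicks += 1;
    }

    /// Move pending frames into available buffers and notify the guest
    /// only if at least one buffer was added to the used ring.
    fn process_rx_queue(&mut self) -> bool {
        let mut req_rx_buf = false;

        while !self.pending.is_empty() && !self.avail.is_empty() {
            // Consume one guest buffer and one pending frame.
            self.avail.pop_front();
            self.pending.pop_front();
            self.used += 1;
            req_rx_buf = true;
        }

        // Without this guard every pass of the event loop would raise an
        // interrupt, even when the used ring did not change.
        if req_rx_buf {
            self.signal_used_queue();
        }
        req_rx_buf
    }
}
```

With this guard, calling process_rx_queue() on a queue with no pending
frames leaves the kick counter unchanged, so the guest is not
interrupted when there is nothing for it to pick up.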

Matias