On Wed, Jan 07, 2026 at 07:55:26PM +0100, Francesco Valla wrote:
> Hi Matias,
>
> On Wed, Jan 07, 2026 at 05:14:25PM +0100, Matias Ezequiel Vara Larsen wrote:
> > On Fri, Dec 26, 2025 at 4:09 PM Francesco Valla <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Sun, Dec 14, 2025 at 04:25:54PM +0100, Francesco Valla wrote:
> > > > While stress testing this, I noticed that flooding the virtio-can
> > > > interface with packets leads to a hang of the interface itself.
> > > > I am seeing this when issuing, on the host side:
> > > >
> > > > while true; do cansend can0 123#00; done
> > > >
> > > > with:
> > > >
> > > > - QEMU: the tip of the master branch plus [2]
> > > > - vhost-device: the tip of the main branch
> > > >
> > > > and the following QEMU invocation:
> > > >
> > > > qemu-system-x86_64 -serial mon:stdio \
> > > > -m 2G -smp 2 \
> > > > -kernel $(pwd)/BUILD.bin/arch/x86/boot/bzImage \
> > > > -initrd /home/francesco/SRC/LINUX_KERNEL/initramfs.gz \
> > > > -append "loglevel=7 console=ttyS0" \
> > > > -machine memory-backend=pc.ram \
> > > > -object memory-backend-file,id=pc.ram,size=2G,mem-path=/tmp/pc.ram,share=on \
> > > > -chardev socket,id=can0,path=/tmp/sock-can0 \
> > > > -device vhost-user-can-pci,chardev=can0
> > > >
> > > >
> > > > Restarting the interface (i.e.: ip link set down and then up) does not
> > > > fix the situation.
> > > >
> > > > I'll try to do some more testing over the next few days.
> > >
> > > After a deep dive, I _think_ the problem actually lies in vhost-device,
> > > since it does not show up (or at least, it seems so) with an alternative
> > > implementation that uses the QEMU socketcan support [0] (which builds on
> > > top of the work done by Harald and Mikhail):
> > >
> > > qemu-system-x86_64 -serial mon:stdio \
> > > -m 2G -smp 2 -enable-kvm \
> > > -kernel $(pwd)/BUILD.bin/arch/x86/boot/bzImage \
> > > -initrd /home/francesco/SRC/LINUX_KERNEL/initramfs.gz \
> > > -append "loglevel=7 console=ttyS0" \
> > > -object can-bus,id=canbus0 \
> > > -object can-host-socketcan,id=canhost0,if=vcan0,canbus=canbus0 \
> > > -device virtio-can-pci,canbus=canbus0
> > >
> > > Unfortunately, my Rust knowledge is not sufficient to understand the
> > > vhost-device implementation [1]; the issue seems to be related to the
> > > host->guest vring becoming empty and never being refilled.
> > >
> >
> > Can you try with
> > https://github.com/MatiasVara/vhost-device/commits/fix-for-923/?
>
> I'll stress test it during the night, but this seems to fix it. Before,
> it was consistently reproducible within mere seconds, while now,
> over a bunch of runs, I never reproduced it.
>
> I also agree with your analysis on the commit.
>
Quick update, unfortunately not good: the RX issue is solved, but now I
have a different one: if I send a single message from either the guest
or the host (e.g.: cansend can0 111#00) I get:

[ 16.496923] irq 11: nobody cared (try booting with the "irqpoll" option)
[ 16.511875] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-00002-gded0a4b9da5a #29 PREEMPT(voluntary)
[ 16.511883] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 16.511887] Call Trace:
[ 16.511933] <IRQ>
[ 16.511938] dump_stack_lvl+0x4d/0x70
[ 16.511973] __report_bad_irq+0x30/0xb7
[ 16.511986] note_interrupt.cold+0x28/0x66
[ 16.511988] handle_irq_event+0x6d/0x70
[ 16.512004] handle_fasteoi_irq+0xd5/0x1f0
[ 16.512011] __common_interrupt+0x3f/0xd0
[ 16.512023] ? tick_nohz_irq_exit+0x2e/0x60
[ 16.512035] common_interrupt+0x3b/0x90
[ 16.512057] asm_common_interrupt+0x26/0x40
[ 16.512073] RIP: 0010:handle_softirqs+0x6d/0x270
[ 16.512081] Code: 02 00 01 00 00 89 5c 24 14 48 89 6c 24 08 c7 44 24 10 0a 00 00 00 89 7c 24 04 31 c0 65 66 89 05 01 ce 3e 02 fb bb ff ff ff ff <49> c7 c2 c0 80 a0 a2 44 89 ed 41 0f bc dd 83 c3 01 74 76 8d 43 ff
[ 16.512082] RSP: 0018:ffffb22480003f98 EFLAGS: 00000246
[ 16.512086] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000838
[ 16.512087] RDX: 0000000000000000 RSI: ffffffffa2a0e940 RDI: 0000000000000000
[ 16.512088] RBP: 00000000fffbac3e R08: 0000000000000001 R09: 0000000000000000
[ 16.512088] R10: ffffa2d57da249d0 R11: ffffb22480003ff8 R12: 0000000000000000
[ 16.512091] R13: 0000000000000082 R14: 0000000000000000 R15: 0000000000000000
[ 16.512097] irq_exit_rcu+0x89/0xb0
[ 16.512099] sysvec_apic_timer_interrupt+0x6b/0x80
[ 16.512103] </IRQ>
[ 16.512104] <TASK>
[ 16.512104] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 16.512105] RIP: 0010:pv_native_safe_halt+0xf/0x20
[ 16.512107] Code: 2c 81 00 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 05 cf 18 00 fb f4 <c3> cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
[ 16.512109] RSP: 0018:ffffffffa2a03e80 EFLAGS: 00000212
[ 16.512110] RAX: ffffa2d5da523000 RBX: ffffffffa2a0e940 RCX: 0000000000000838
[ 16.512111] RDX: 4000000000000000 RSI: 0000000000000087 RDI: 00000000000a722c
[ 16.512111] RBP: 0000000000000000 R08: 00000000000a722c R09: ffffa2d57da249d0
[ 16.512112] R10: ffffa2d57da1bac0 R11: 0000000000000001 R12: 0000000000000000
[ 16.512112] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000014770
[ 16.512113] default_idle+0x9/0x10
[ 16.512117] default_idle_call+0x2a/0xf0
[ 16.512119] do_idle+0x1cb/0x230
[ 16.512129] cpu_startup_entry+0x24/0x30
[ 16.512130] rest_init+0xbc/0xc0
[ 16.512133] start_kernel+0x6d7/0x6e0
[ 16.512164] x86_64_start_reservations+0x24/0x30
[ 16.512172] x86_64_start_kernel+0xc8/0xd0
[ 16.512173] common_startup_64+0x13e/0x148
[ 16.512181] </TASK>
[ 16.512181] handlers:
[ 16.513166] [<00000000b61218c7>] vp_interrupt
[ 16.515096] Disabling IRQ #11

with IRQ #11 being:

# cat /proc/interrupts
           CPU0       CPU1
 11:     102218          0   IO-APIC  11-fasteoi   virtio0

This cannot be reproduced with the old version of vhost-device. I think
it is due to the removal of the req_rx_buf variable and associated
logic: a vq kick is now being performed at every cycle of the event
loop, even if no processing happened. On the guest side, this results in
an IRQ that no handler claims.
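
To illustrate what I mean, here is a minimal sketch of the kind of guard
I would expect around the notification. This is not the actual
vhost-device code: MockVring, process_rx and event_loop_cycle are names
made up for the example, and I am assuming that the guest should only be
notified when at least one buffer was filled during the cycle.

```rust
// Illustrative only: MockVring and these function names are hypothetical
// and are not the vhost-device or rust-vmm API.

/// Stand-in for the host->guest (RX) virtqueue.
struct MockVring {
    /// Guest-provided buffers still available to the device.
    avail: Vec<u16>,
    /// Number of used-buffer notifications (interrupts) sent to the guest.
    signals: usize,
}

impl MockVring {
    /// Pretend to raise the interrupt towards the guest.
    fn signal_used_queue(&mut self) {
        self.signals += 1;
    }
}

/// Move pending CAN frames into available guest buffers.
/// Returns how many buffers were actually filled.
fn process_rx(vring: &mut MockVring, pending_frames: &mut Vec<u32>) -> usize {
    let mut filled = 0;
    while !pending_frames.is_empty() {
        // Stop as soon as the guest has no more buffers for us.
        if vring.avail.pop().is_none() {
            break;
        }
        pending_frames.pop();
        filled += 1;
    }
    filled
}

/// One cycle of the event loop: notify the guest only if work was done.
fn event_loop_cycle(vring: &mut MockVring, pending_frames: &mut Vec<u32>) {
    if process_rx(vring, pending_frames) > 0 {
        vring.signal_used_queue();
    }
}
```

With a guard like this, a cycle with nothing to deliver leaves the
interrupt line alone, so vp_interrupt should never be invoked without a
used buffer to consume.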

Francesco