On Thu, Jan 8, 2026 at 12:01 AM Francesco Valla <[email protected]> wrote:
>
> On Wed, Jan 07, 2026 at 07:55:26PM +0100, Francesco Valla wrote:
> > Hi Matias,
> >
> > On Wed, Jan 07, 2026 at 05:14:25PM +0100, Matias Ezequiel Vara Larsen wrote:
> > > On Fri, Dec 26, 2025 at 4:09 PM Francesco Valla <[email protected]>
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Sun, Dec 14, 2025 at 04:25:54PM +0100, Francesco Valla wrote:
> > > > > While stress testing this, I noticed that flooding the virtio-can
> > > > > interface with packets leads to a hang of the interface itself.
> > > > > I am seeing this when issuing, on the host side:
> > > > >
> > > > >     while true; do cansend can0 123#00; done
> > > > >
> > > > > with:
> > > > >
> > > > > - QEMU: the tip of the master branch plus [2]
> > > > > - vhost-device: the tip of the main branch
> > > > >
> > > > > and the following QEMU invocation:
> > > > >
> > > > >     qemu-system-x86_64 -serial mon:stdio \
> > > > >         -m 2G -smp 2 \
> > > > >         -kernel $(pwd)/BUILD.bin/arch/x86/boot/bzImage \
> > > > >         -initrd /home/francesco/SRC/LINUX_KERNEL/initramfs.gz \
> > > > >         -append "loglevel=7 console=ttyS0" \
> > > > >         -machine memory-backend=pc.ram \
> > > > >         -object memory-backend-file,id=pc.ram,size=2G,mem-path=/tmp/pc.ram,share=on \
> > > > >         -chardev socket,id=can0,path=/tmp/sock-can0 \
> > > > >         -device vhost-user-can-pci,chardev=can0
> > > > >
> > > > >
> > > > > Restarting the interface (i.e.: ip link set down and then up) does not
> > > > > fix the situation.
> > > > >
> > > > > I'll try to do some more testing during the next days.
> > > >
> > > > After a deep dive, I _think_ the problem actually lies in vhost-device,
> > > > since it is not there (or at least, it seems so) using an alternative
> > > > implementation that uses the qemu socketcan support [0] (implementation
> > > > which builds on top of the work done by Harald and Mikhail):
> > > >
> > > >     qemu-system-x86_64 -serial mon:stdio \
> > > >         -m 2G -smp 2 -enable-kvm \
> > > >         -kernel $(pwd)/BUILD.bin/arch/x86/boot/bzImage \
> > > >         -initrd /home/francesco/SRC/LINUX_KERNEL/initramfs.gz \
> > > >         -append "loglevel=7 console=ttyS0" \
> > > >         -object can-bus,id=canbus0 -object can-host-socketcan,id=canhost0,if=vcan0,canbus=canbus0 \
> > > >         -device virtio-can-pci,canbus=canbus0
> > > >
> > > > Unfortunately, my Rust knowledge is not sufficient to understand the
> > > > vhost-device implementation [1]; the issue seems to be related to the
> > > > host->guest vring becoming empty and not refilling anymore.
> > > >
> > >
> > > Can you try with
> > > https://github.com/MatiasVara/vhost-device/commits/fix-for-923/?
> >
> > I'll stress test it during the night, but this seems to fix it. Before
> > it was reproducible in a consistent manner after mere seconds, while
> > now, in a bunch of runs, I never reproduced it.
> >
> > I also agree with your analysis on the commit.
> >
>
> Quick update, unfortunately not good: the RX issue is solved, but now I
> have a different one: if I send a single message either from the guest
> or the host (e.g.: cansend can0 111#00) I get:
>
> [ 16.496923] irq 11: nobody cared (try booting with the "irqpoll" option)
> [ 16.511875] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-00002-gded0a4b9da5a #29 PREEMPT(voluntary)
> [ 16.511883] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
> [ 16.511887] Call Trace:
> [ 16.511933] <IRQ>
> [ 16.511938] dump_stack_lvl+0x4d/0x70
> [ 16.511973] __report_bad_irq+0x30/0xb7
> [ 16.511986] note_interrupt.cold+0x28/0x66
> [ 16.511988] handle_irq_event+0x6d/0x70
> [ 16.512004] handle_fasteoi_irq+0xd5/0x1f0
> [ 16.512011] __common_interrupt+0x3f/0xd0
> [ 16.512023] ? tick_nohz_irq_exit+0x2e/0x60
> [ 16.512035] common_interrupt+0x3b/0x90
> [ 16.512057] asm_common_interrupt+0x26/0x40
> [ 16.512073] RIP: 0010:handle_softirqs+0x6d/0x270
> [ 16.512081] Code: 02 00 01 00 00 89 5c 24 14 48 89 6c 24 08 c7 44 24 10 0a 00 00 00 89 7c 24 04 31 c0 65 66 89 05 01 ce 3e 02 fb bb ff ff ff ff <49> c7 c2 c0 80 a0 a2 44 89 ed 41 0f bc dd 83 c3 01 74 76 8d 43 ff
> [ 16.512082] RSP: 0018:ffffb22480003f98 EFLAGS: 00000246
> [ 16.512086] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000838
> [ 16.512087] RDX: 0000000000000000 RSI: ffffffffa2a0e940 RDI: 0000000000000000
> [ 16.512088] RBP: 00000000fffbac3e R08: 0000000000000001 R09: 0000000000000000
> [ 16.512088] R10: ffffa2d57da249d0 R11: ffffb22480003ff8 R12: 0000000000000000
> [ 16.512091] R13: 0000000000000082 R14: 0000000000000000 R15: 0000000000000000
> [ 16.512097] irq_exit_rcu+0x89/0xb0
> [ 16.512099] sysvec_apic_timer_interrupt+0x6b/0x80
> [ 16.512103] </IRQ>
> [ 16.512104] <TASK>
> [ 16.512104] asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [ 16.512105] RIP: 0010:pv_native_safe_halt+0xf/0x20
> [ 16.512107] Code: 2c 81 00 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 05 cf 18 00 fb f4 <c3> cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
> [ 16.512109] RSP: 0018:ffffffffa2a03e80 EFLAGS: 00000212
> [ 16.512110] RAX: ffffa2d5da523000 RBX: ffffffffa2a0e940 RCX: 0000000000000838
> [ 16.512111] RDX: 4000000000000000 RSI: 0000000000000087 RDI: 00000000000a722c
> [ 16.512111] RBP: 0000000000000000 R08: 00000000000a722c R09: ffffa2d57da249d0
> [ 16.512112] R10: ffffa2d57da1bac0 R11: 0000000000000001 R12: 0000000000000000
> [ 16.512112] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000014770
> [ 16.512113] default_idle+0x9/0x10
> [ 16.512117] default_idle_call+0x2a/0xf0
> [ 16.512119] do_idle+0x1cb/0x230
> [ 16.512129] cpu_startup_entry+0x24/0x30
> [ 16.512130] rest_init+0xbc/0xc0
> [ 16.512133] start_kernel+0x6d7/0x6e0
> [ 16.512164] x86_64_start_reservations+0x24/0x30
> [ 16.512172] x86_64_start_kernel+0xc8/0xd0
> [ 16.512173] common_startup_64+0x13e/0x148
> [ 16.512181] </TASK>
> [ 16.512181] handlers:
> [ 16.513166] [<00000000b61218c7>] vp_interrupt
> [ 16.515096] Disabling IRQ #11
>
> with IRQ#11 being:
>
> # cat /proc/interrupts
>            CPU0       CPU1
>  11:     102218          0   IO-APIC  11-fasteoi   virtio0
>
>
> This cannot be reproduced with the old version of vhost-device. I think
> it is due to the removal of the req_rx_buf variable and associated
> logic: a vq kick is now being performed at every cycle of the event
> loop, even if no processing happened. On the guest side, this results in
> an IRQ not cared for.
>

Thanks Francesco! Right, I made a mistake. The original
process_rx_queue() only invokes signal_used_queue() when req_rx_buf is
true, i.e., when the device has actually added something to the used
ring. I'll fix that.

Matias
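
To make the point above concrete, here is a minimal, self-contained sketch of the "only signal when something was added to the used ring" logic. It is not the vhost-device code: only the names process_rx_queue, signal_used_queue and req_rx_buf come from this thread, while MockRxQueue, its fields and the frame representation are hypothetical stand-ins used purely for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the host->guest virtqueue (NOT the vhost-device API).
struct MockRxQueue {
    /// Descriptor heads the guest has made available for incoming frames.
    avail: VecDeque<u16>,
    /// (descriptor head, frame id) pairs the device has placed on the used ring.
    used: Vec<(u16, u32)>,
    /// Number of notifications (interrupts) sent to the guest.
    signals: u32,
}

impl MockRxQueue {
    fn new(buffers: u16) -> Self {
        MockRxQueue {
            avail: (0..buffers).collect(),
            used: Vec::new(),
            signals: 0,
        }
    }

    /// Models the guest notification; here it only counts calls.
    fn signal_used_queue(&mut self) {
        self.signals += 1;
    }
}

/// Moves pending frames into available buffers and notifies the guest
/// only if at least one buffer was actually added to the used ring.
fn process_rx_queue(queue: &mut MockRxQueue, pending: &mut VecDeque<u32>) -> bool {
    let mut req_rx_buf = false;

    while !pending.is_empty() && !queue.avail.is_empty() {
        // Both pops succeed because of the loop condition above.
        if let (Some(frame), Some(head)) = (pending.pop_front(), queue.avail.pop_front()) {
            queue.used.push((head, frame));
            req_rx_buf = true;
        }
    }

    // An idle cycle must not raise an interrupt: the guest handler would
    // find nothing to do, which is what "irq 11: nobody cared" reports.
    if req_rx_buf {
        queue.signal_used_queue();
    }
    req_rx_buf
}
```

In this sketch, calling process_rx_queue() with an empty pending queue leaves the signal counter untouched, while a call that delivers one or more frames increments it exactly once.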
