Hi,

(I first reported this via other channels and was asked to repost here.)
One of our Proxmox VE users reported that hitting CTRL+C on a running `ovs-tcpdump` on a host with VM workloads has a low (<1/50) chance of causing a complete OVS network freeze, i.e., the host loses network connectivity on all OVS-controlled interfaces. Soft lockups are logged, and the host has to be hard-reset. I believe I can reproduce the same issue in an Arch Linux VM (specs [1]) with the following steps:

- Set up an OVS bridge with an active-backup bond0 and assign an IP [2].
- Install iperf2 and run a script [3] that spawns an iperf server and then repeatedly performs the following:
  1. Start ovs-tcpdump on bond0, which spawns an "inner" tcpdump.
  2. Concurrently, start an ordinary tcpdump on mibond0 (the interface created by ovs-tcpdump).
  3. Send SIGINT to the inner tcpdump from step 1. The tcpdump from step 2 exits with "tcpdump: pcap_loop: The interface disappeared".
- Run a long-running parallel iperf2 against bond0 from outside [4]. I run the iperf from the hypervisor, achieving a cumulative bandwidth of 55-60 Gbit/s.
- After 5-10 minutes, the host usually becomes unreachable via the bond (iperf reports zero bandwidth), and a soft lockup is logged on the host [5].

Note that the user who originally reported this only starts one ovs-tcpdump process, so there is probably some other, unidentified factor that makes the issue more likely to trigger on their host.

The symptoms sound similar to the ones described in the kernel patch "net: openvswitch: fix race on port output" [6], but as far as I can tell, that patch is already contained in 6.12.
Thanks,
Friedrich

[1]
- Hypervisor is Proxmox VE 8.3 (QEMU/KVM)
- VM has 4 cores, 8G RAM, 3x virtio-net NICs (one for management, two for bond0)
- VM is running Arch Linux with kernel 6.12.8.arch1-1 (but I can also reproduce with a build of 6.13-rc6) and Open vSwitch 3.4.1 (custom-built package):

$ cat /proc/version
Linux version 6.12.8-arch1-1 (linux@archlinux) (gcc (GCC) 14.2.1 20240910, GNU ld (GNU Binutils) 2.43.0) #1 SMP PREEMPT_DYNAMIC Thu, 02 Jan 2025 22:52:26 +0000
$ ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 3.4.1

[2]
# inside the VM
ovs-vsctl add-br br0
ip l set eth1 up
ip l set eth2 up
ovs-vsctl add-bond br0 bond0 eth1 eth2
ip l set br0 up
ip addr add 10.2.1.104/16 dev br0

[3]
# inside the VM
iperf -s &
while true; do
    PYTHONPATH=/usr/share/openvswitch/python/ ovs-tcpdump -env -i bond0 tcp port 12345 &
    sleep 2
    pid=$(pidof tcpdump)
    echo pid: $pid
    tcpdump -envi mibond0 tcp port 12345 &
    sleep 5
    echo kill
    kill -INT $pid
    sleep 3
done

[4]
# from outside
iperf -c 10.2.1.104 -i1 -t 600 -P128

[5]
Jan 08 09:01:57 arch-ovs ovs-vswitchd[446]: ovs|00074|bridge|INFO|bridge br0: added interface mibond0 on port 13
Jan 08 09:01:57 arch-ovs ovs-vswitchd[446]: 2025-01-08T09:01:57Z|00074|bridge|INFO|bridge br0: added interface mibond0 on port 13
Jan 08 09:01:57 arch-ovs kernel: mibond0: entered promiscuous mode
Jan 08 09:01:57 arch-ovs kernel: audit: type=1700 audit(1736326917.773:304): dev=mibond0 prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
Jan 08 09:01:57 arch-ovs kernel: audit: type=1300 audit(1736326917.773:304): arch=c000003e syscall=46 success=yes exit=52 a0=f a1=7ffcbc8bb550 a2=0 a3=55ab1849dd00 items=0 ppid=1 pid=446 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ovs-vswitchd" exe="/usr/bin/ovs-vswitchd" key=(null)
Jan 08 09:01:57 arch-ovs kernel: audit: type=1327 audit(1736326917.773:304): proctitle=2F7573722F7362696E2F6F76732D7673776974636864002D2D70696466696C653D2F72756E2F6F70656E767377697463682F6F76732D76737769746368642E706964
Jan 08 09:02:04 arch-ovs systemd-networkd[479]: mibond0: Link DOWN
Jan 08 09:02:04 arch-ovs kernel: mibond0 (unregistering): left promiscuous mode
Jan 08 09:02:04 arch-ovs kernel: audit: type=1700 audit(1736326924.733:305): dev=mibond0 prom=0 old_prom=256 auid=1000 uid=0 gid=0 ses=3
Jan 08 09:02:04 arch-ovs kernel: mibond0 selects TX queue 0, but real number of TX queues is 0
Jan 08 09:02:04 arch-ovs audit: ANOM_PROMISCUOUS dev=mibond0 prom=0 old_prom=256 auid=1000 uid=0 gid=0 ses=3
Jan 08 09:02:04 arch-ovs systemd-networkd[479]: mibond0: Lost carrier
Jan 08 09:02:30 arch-ovs kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [swapper/1:0]
Jan 08 09:02:30 arch-ovs kernel: CPU#1 Utilization every 4s during lockup:
Jan 08 09:02:30 arch-ovs kernel: #1: 0% system, 100% softirq, 0% hardirq, 0% idle
Jan 08 09:02:30 arch-ovs kernel: #2: 0% system, 101% softirq, 0% hardirq, 0% idle
Jan 08 09:02:30 arch-ovs kernel: #3: 0% system, 100% softirq, 0% hardirq, 0% idle
Jan 08 09:02:30 arch-ovs kernel: #4: 0% system, 100% softirq, 1% hardirq, 0% idle
Jan 08 09:02:30 arch-ovs kernel: #5: 0% system, 101% softirq, 0% hardirq, 0% idle
Jan 08 09:02:30 arch-ovs kernel: Modules linked in: dummy cfg80211 rfkill isofs nfnetlink_cttimeout openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 psample vfat fat sha512_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd cryptd psmouse pcspkr i2c_piix4 joydev i2c_smbus mousedev mac_hid loop dm_mod nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci qemu_fw_cfg ip_tables x_tables btrfs hid_generic blake2b_generic libcrc32c usbhid crc32c_generic xor raid6_pq sr_mod cdrom bochs serio_raw ata_generic drm_vram_helper atkbd pata_acpi drm_ttm_helper libps2 crc32c_intel vivaldi_fmap intel_agp ttm sha256_ssse3 ata_piix virtio_net virtio_balloon net_failover failover virtio_scsi intel_gtt i8042 floppy serio
Jan 08 09:02:30 arch-ovs kernel: CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.12.8-arch1-1 #1 099de49ddaebb26408f097c48b36e50b2c8e21c9
Jan 08 09:02:30 arch-ovs kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Jan 08 09:02:30 arch-ovs kernel: RIP: 0010:netdev_pick_tx+0x267/0x2b0
Jan 08 09:02:30 arch-ovs kernel: Code: c2 48 c1 e8 20 44 01 f0 44 0f b7 f0 e9 c7 fe ff ff e8 ad 7d 5d ff 44 0f b6 04 24 e9 08 fe ff ff 45 89 c8 e9 6e fe ff ff 29 c2 <39> c2 72 89 eb f8 48 85 db 41 bf ff ff ff ff 49 0f 44 dc e9 dd fd
Jan 08 09:02:30 arch-ovs kernel: RSP: 0018:ffffa32280120648 EFLAGS: 00000246
Jan 08 09:02:30 arch-ovs kernel: RAX: 0000000000000000 RBX: ffff96d343cd2000 RCX: 00000000000005a8
Jan 08 09:02:30 arch-ovs kernel: RDX: 0000000000000000 RSI: ffff96d342208900 RDI: ffff96d340364c80
Jan 08 09:02:30 arch-ovs kernel: RBP: ffff96d342208900 R08: 0000000000000000 R09: 0000000000000000
Jan 08 09:02:30 arch-ovs kernel: R10: ffff96d342208900 R11: ffffa322801208b0 R12: ffff96d343cd2000
Jan 08 09:02:30 arch-ovs kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ffffffff
Jan 08 09:02:30 arch-ovs kernel: FS:  0000000000000000(0000) GS:ffff96d477c80000(0000) knlGS:0000000000000000
Jan 08 09:02:30 arch-ovs kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 08 09:02:30 arch-ovs kernel: CR2: 000071216853a120 CR3: 0000000131ce4000 CR4: 00000000000006f0
Jan 08 09:02:30 arch-ovs kernel: Call Trace:
Jan 08 09:02:30 arch-ovs kernel: <IRQ>
Jan 08 09:02:30 arch-ovs kernel: ? watchdog_timer_fn.cold+0x19c/0x219
Jan 08 09:02:30 arch-ovs kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
Jan 08 09:02:30 arch-ovs kernel: ? __hrtimer_run_queues+0x132/0x2a0
Jan 08 09:02:30 arch-ovs kernel: ? hrtimer_interrupt+0xfa/0x210
Jan 08 09:02:30 arch-ovs kernel: ? __sysvec_apic_timer_interrupt+0x55/0x100
Jan 08 09:02:30 arch-ovs kernel: ? sysvec_apic_timer_interrupt+0x38/0x90
Jan 08 09:02:30 arch-ovs kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Jan 08 09:02:30 arch-ovs kernel: ? netdev_pick_tx+0x267/0x2b0
Jan 08 09:02:30 arch-ovs kernel: ? netdev_pick_tx+0x253/0x2b0
Jan 08 09:02:30 arch-ovs kernel: netdev_core_pick_tx+0xa1/0xb0
Jan 08 09:02:30 arch-ovs kernel: __dev_queue_xmit+0x19d/0xe70
Jan 08 09:02:30 arch-ovs kernel: ? kmem_cache_alloc_noprof+0x111/0x2f0
Jan 08 09:02:30 arch-ovs kernel: do_execute_actions+0xce/0x1b70 [openvswitch d139b1adcdbcdfb64274f88696adfd125a3e2f3c]
Jan 08 09:02:30 arch-ovs kernel: ? flow_lookup.isra.0+0x58/0x100 [openvswitch d139b1adcdbcdfb64274f88696adfd125a3e2f3c]
Jan 08 09:02:30 arch-ovs kernel: ovs_execute_actions+0x4c/0x130 [openvswitch d139b1adcdbcdfb64274f88696adfd125a3e2f3c]
Jan 08 09:02:30 arch-ovs kernel: ovs_dp_process_packet+0xa6/0x220 [openvswitch d139b1adcdbcdfb64274f88696adfd125a3e2f3c]
Jan 08 09:02:30 arch-ovs kernel: ? __pfx_netdev_frame_hook+0x10/0x10 [openvswitch d139b1adcdbcdfb64274f88696adfd125a3e2f3c]
Jan 08 09:02:30 arch-ovs kernel: ? __pfx_netdev_frame_hook+0x10/0x10 [openvswitch d139b1adcdbcdfb64274f88696adfd125a3e2f3c]
Jan 08 09:02:30 arch-ovs kernel: ovs_vport_receive+0x84/0xe0 [openvswitch d139b1adcdbcdfb64274f88696adfd125a3e2f3c]
Jan 08 09:02:30 arch-ovs kernel: netdev_frame_hook+0xd9/0x1a0 [openvswitch d139b1adcdbcdfb64274f88696adfd125a3e2f3c]
Jan 08 09:02:30 arch-ovs kernel: __netif_receive_skb_core.constprop.0+0x1fa/0x10b0
Jan 08 09:02:30 arch-ovs kernel: __netif_receive_skb_list_core+0x15d/0x300
Jan 08 09:02:30 arch-ovs kernel: netif_receive_skb_list_internal+0x1d4/0x310
Jan 08 09:02:30 arch-ovs kernel: napi_complete_done+0x72/0x220
Jan 08 09:02:30 arch-ovs kernel: virtnet_poll+0x6da/0xe62 [virtio_net ba458d10bdb47849f4b70ed392bbaae27b08be62]
Jan 08 09:02:30 arch-ovs kernel: ? free_unref_page_commit+0x169/0x2e0
Jan 08 09:02:30 arch-ovs kernel: ? enqueue_hrtimer+0x35/0x90
Jan 08 09:02:30 arch-ovs kernel: __napi_poll+0x2b/0x160
Jan 08 09:02:30 arch-ovs kernel: net_rx_action+0x349/0x3e0
Jan 08 09:02:30 arch-ovs kernel: handle_softirqs+0xe4/0x2a0
Jan 08 09:02:30 arch-ovs kernel: __irq_exit_rcu+0x97/0xb0
Jan 08 09:02:30 arch-ovs kernel: common_interrupt+0x85/0xa0
Jan 08 09:02:30 arch-ovs kernel: </IRQ>
Jan 08 09:02:30 arch-ovs kernel: <TASK>
Jan 08 09:02:30 arch-ovs kernel: asm_common_interrupt+0x26/0x40
Jan 08 09:02:30 arch-ovs kernel: RIP: 0010:finish_task_switch.isra.0+0x9f/0x2e0
Jan 08 09:02:30 arch-ovs kernel: Code: 34 00 00 00 00 0f 1f 44 00 00 4c 8b bb d8 0c 00 00 4d 85 ff 0f 85 b7 00 00 00 66 90 48 89 df e8 27 e8 db 00 fb 0f 1f 44 00 00 <66> 90 4d 85 ed 74 18 4d 3b ae 40 0a 00 00 0f 84 50 01 00 00 f0 41
Jan 08 09:02:30 arch-ovs kernel: RSP: 0018:ffffa322800d3e18 EFLAGS: 00000282
Jan 08 09:02:30 arch-ovs kernel: RAX: 0000000000000001 RBX: ffff96d477cb6c80 RCX: 0000000000000002
Jan 08 09:02:30 arch-ovs kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff96d477cb6c80
Jan 08 09:02:30 arch-ovs kernel: RBP: ffffa322800d3e48 R08: 0000000000000001 R09: 0000000000000000
Jan 08 09:02:30 arch-ovs kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff96d350b04c80
Jan 08 09:02:30 arch-ovs kernel: R13: 0000000000000000 R14: ffff96d340364c80 R15: 0000000000000000
Jan 08 09:02:30 arch-ovs kernel: ? finish_task_switch.isra.0+0x99/0x2e0
Jan 08 09:02:30 arch-ovs kernel: __schedule+0x3b8/0x12b0
Jan 08 09:02:30 arch-ovs kernel: ? pv_native_safe_halt+0xf/0x20
Jan 08 09:02:30 arch-ovs kernel: schedule_idle+0x23/0x40
Jan 08 09:02:30 arch-ovs kernel: cpu_startup_entry+0x29/0x30
Jan 08 09:02:30 arch-ovs kernel: start_secondary+0x11e/0x140
Jan 08 09:02:30 arch-ovs kernel: common_startup_64+0x13e/0x141
Jan 08 09:02:30 arch-ovs kernel: </TASK>

[6] https://lore.kernel.org/lkml/ZC0pBXBAgh7c76CA@kernel-bug-kernel-bug/

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss