As I said, this is hard to test explicitly. The issue lies between instantiating and destroying nwfilters, so to stress that area of the code I created a test:
1. Run a guest (Bionic in my case, but it doesn't matter).

2. Prepare hotpluggable interfaces with deep nwfilter rules. clean-traffic references some subrules and is available by default, so use:

   # cat net-1.xml
   <interface type='network'>
     <mac address='de:ad:be:ef:00:01'/>
     <source network='default' bridge='virbr0'/>
     <model type='virtio'/>
     <filterref filter='clean-traffic'/>
   </interface>

   Replicate it for eight more MAC addresses with:

   # for i in $(seq 2 9); do sed -e "s/01/0$i/g" net-1.xml > net-$i.xml; done

3. Attach all of them and destroy the guest:

   $ for i in $(seq 1 9); do virsh attach-device b-test net-$i.xml; done; echo "date"; virsh destroy b-test;

Note: I also tried concurrent attach/detach loops. Independent of the bug here, that was too much load. It only led to timeouts on cleanup, so it did not help to test this.

This locks it up in:

# cat /proc/12531/wchan
futex_wait_queue_me

The backtrace looks just like the one described in the bug:

Thread 10 (Thread 0x7fee6a9ef700 (LWP 12540)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fee798b5e42 in __GI___pthread_mutex_lock (mutex=0x7fee00000a30) at ../nptl/pthread_mutex_lock.c:115
#2 0x00007fee79f93c65 in virMutexLock (m=<optimized out>) at ../../../src/util/virthread.c:89
#3 0x00007fee5331e62f in virNWFilterLockIface (ifname=ifname@entry=0x7fee440068a0 "vnet1") at ../../../src/nwfilter/nwfilter_learnipaddr.c:180
#4 0x00007fee53312bf5 in _virNWFilterTeardownFilter (ifname=0x7fee440068a0 "vnet1") at ../../../src/nwfilter/nwfilter_gentech_driver.c:1086
#5 0x00007fee5331384c in virNWFilterTeardownFilter (net=0x7fee44000b50) at ../../../src/nwfilter/nwfilter_gentech_driver.c:1104
#6 0x00007fee79fdbf0c in virDomainConfVMNWFilterTeardown (vm=vm@entry=0x7fee300009f0) at ../../../src/conf/domain_nwfilter.c:64
#7 0x00007fee52979a86 in qemuProcessStop (driver=driver@entry=0x7fee483a53c0, vm=0x7fee300009f0, reason=reason@entry=VIR_DOMAIN_SHUTOFF_DESTROYED, flags=0) at ../../../src/qemu/qemu_process.c:5290
#8 0x00007fee529c4577 in qemuDomainDestroyFlags (dom=<optimized out>, flags=0) at ../../../src/qemu/qemu_driver.c:2228
#9 0x00007fee7a02e7cf in virDomainDestroy (domain=domain@entry=0x7fee28000c50) at ../../../src/libvirt-domain.c:479

Thread 18 (Thread 0x7fee1cff9700 (LWP 14653)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fee798b5dbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7fee7a455740 <ruleLock>) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fee79f93c65 in virMutexLock (m=m@entry=0x7fee7a455740 <ruleLock>) at ../../../src/util/virthread.c:89
#3 0x00007fee79f5991d in virFirewallApply (firewall=firewall@entry=0x7fee00000ac0) at ../../../src/util/virfirewall.c:933
#4 0x00007fee5331e4a5 in ebtablesApplyDropAllRules (ifname=0x7fee44000de8 "vnet1") at ../../../src/nwfilter/nwfilter_ebiptables_driver.c:3107
#5 0x00007fee5331eb2a in learnIPAddressThread (arg=0x7fee44000de0) at ../../../src/nwfilter/nwfilter_learnipaddr.c:638
#6 0x00007fee79f93b12 in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:206
#7 0x00007fee798b36ba in start_thread (arg=0x7fee1cff9700) at pthread_create.c:333
#8 0x00007fee795e941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

The above triggered the hang in 3 out of 3 runs, with both a single device and multiple devices. Switching to 1.3.1-1ubuntu10.21 from -proposed resolved the issue.

That said, take all this with a grain of salt. The backport fixes the reported error, which was a deadlock between virNWFilterTeardownFilter and learnIPAddressThread. But if I run the same test with more devices, the deadlock still seems to exist. Across multiple runs it can occasionally occur even with one device, though no longer every time as it did before. So there might be a deeper fault that still needs to be resolved. On the other hand, my test is rather artificial.
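To re-run the reproducer steps above repeatedly, they could be collected into one script. This is a minimal sketch. Like the test above, it assumes a running domain and an active default network on virbr0. The script name nwfilter-stress.sh and the functions gen_ifaces and stress_once are hypothetical names of my own and are not part of libvirt or virsh.

```shell
#!/bin/sh
# Hypothetical helper collecting the reproducer steps; the function names and
# the command-line handling are illustrative assumptions, not from the bug.

# Step 2: write net-1.xml, then derive net-2.xml .. net-9.xml from it by
# replacing "01" (which only occurs in the MAC address) with 02 .. 09.
gen_ifaces() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/net-1.xml" <<'EOF'
<interface type='network'>
  <mac address='de:ad:be:ef:00:01'/>
  <source network='default' bridge='virbr0'/>
  <model type='virtio'/>
  <filterref filter='clean-traffic'/>
</interface>
EOF
    for i in $(seq 2 9); do
        sed -e "s/01/0$i/g" "$dir/net-1.xml" > "$dir/net-$i.xml" || return 1
    done
}

# Step 3: hotplug all nine interfaces into the running domain $1 (failures are
# not fatal, as in the original loop), print a timestamp, then destroy it.
stress_once() {
    dom=$1
    dir=$2
    for i in $(seq 1 9); do
        virsh attach-device "$dom" "$dir/net-$i.xml"
    done
    date
    virsh destroy "$dom"
}

# Only touch libvirt when a domain name is given, e.g. ./nwfilter-stress.sh b-test
if [ $# -ge 1 ]; then
    workdir=$(mktemp -d) || exit 1
    gen_ifaces "$workdir" && stress_once "$1" "$workdir"
fi
```

Usage: ./nwfilter-stress.sh b-test. If you run it without an argument, the script only defines the functions. That lets you check the XML generation without touching libvirt.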
TL;DR:
- For the requested backport itself: no regression found, and the reported issue is fixed.
- In a more complex related test, a similar issue still triggers.

With that:
1. Setting verification-done.
2. @The Reporter: you might want to consider tracking down more changes if your real case needs more than the reported fix. If you do so, please open a new bug so the fixes stay separate (feel free to refer to this one).

** Tags removed: verification-needed verification-needed-xenial
** Tags added: verification-done verification-done-xenial

--
https://bugs.launchpad.net/bugs/1753604
Title: libvirt-bin nwfilter deadlock