As I said, this is hard to test explicitly.
The issue is a race between instantiating and destroying nwfilters.
So, to stress that area of the code, I created a test:

1. run a guest (bionic in my case, but it doesn't matter)

2. prepare hot-pluggable interfaces with deeply nested nwfilter rules
   The clean-traffic filter references several sub-filters and is available
   by default, so use
   # cat net-1.xml
    <interface type='network'>
      <mac address='de:ad:be:ef:00:01'/>
      <source network='default' bridge='virbr0'/>
      <model type='virtio'/>
      <filterref filter='clean-traffic'/>
    </interface>
   Replicate this definition for the other eight interfaces with
   # for i in $(seq 2 9); do sed -e "s/01/0$i/g" net-1.xml > net-$i.xml; done
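
   The sed call works because "01" only occurs in the MAC address. A minimal
   sketch of the same step that does not depend on that, and that also writes
   net-1.xml itself, could look like this. The function name gen_ifaces and its
   arguments are made up for illustration; only the XML is taken from the
   definition above.

```shell
# gen_ifaces DIR COUNT: write net-1.xml .. net-COUNT.xml into DIR.
# Hypothetical helper; only the XML content comes from the test above.
gen_ifaces() {
    dir=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # The last MAC octet follows the interface index (01, 02, ...).
        mac=$(printf 'de:ad:be:ef:00:%02x' "$i")
        cat > "$dir/net-$i.xml" <<EOF
<interface type='network'>
  <mac address='$mac'/>
  <source network='default' bridge='virbr0'/>
  <model type='virtio'/>
  <filterref filter='clean-traffic'/>
</interface>
EOF
        i=$((i + 1))
    done
}
```

   For the nine interfaces used here that would be: gen_ifaces . 9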

3. attach all of them and destroy the guest
  $ for i in $(seq 1 9); do virsh attach-device b-test net-$i.xml; done; echo "date"; virsh destroy b-test;

Note: I also tried concurrent attach/detach loops, but (independent of the
bug here) that was too much: it only led to timeouts on cleanup, so it did
not help to test this.
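
Since a hit shows up as a destroy that never returns, step 3 can be wrapped
in a timeout so the test does not block the shell. This is only a sketch: the
function name and the VIRSH override are made up for illustration, and it
assumes that the virsh client blocks for as long as libvirtd is deadlocked.

```shell
# Hypothetical wrapper around step 3. VIRSH can be overridden for a dry run.
VIRSH=${VIRSH:-virsh}

# attach_and_destroy DOMAIN COUNT LIMIT
# Attach net-1.xml .. net-COUNT.xml to DOMAIN, then destroy DOMAIN, giving
# the destroy at most LIMIT seconds. With GNU timeout the return code is
# 124 if the destroy had to be killed.
attach_and_destroy() {
    dom=$1
    count=$2
    limit=$3
    i=1
    while [ "$i" -le "$count" ]; do
        $VIRSH attach-device "$dom" "net-$i.xml"
        i=$((i + 1))
    done
    date
    timeout "$limit" $VIRSH destroy "$dom"
}
```

For the test above that would be: attach_and_destroy b-test 9 60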
   
This locks it up in:
# cat /proc/12531/wchan 
futex_wait_queue_me
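
/proc/PID/wchan only reports the main thread. The per-thread values live
under /proc/PID/task/TID/wchan, so a small helper (the name thread_wchans is
made up here) can list them all and show which threads sit in a futex wait:

```shell
# thread_wchans PID: print "TID WCHAN" for every thread of PID.
# The wchan column is empty if the file cannot be read.
thread_wchans() {
    for t in /proc/"$1"/task/*; do
        [ -d "$t" ] || continue
        printf '%s %s\n' "${t##*/}" "$(cat "$t/wchan" 2>/dev/null)"
    done
}
```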


The backtrace looks just like the one described in the bug:
Thread 10 (Thread 0x7fee6a9ef700 (LWP 12540)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fee798b5e42 in __GI___pthread_mutex_lock (mutex=0x7fee00000a30) at 
../nptl/pthread_mutex_lock.c:115
#2  0x00007fee79f93c65 in virMutexLock (m=<optimized out>) at 
../../../src/util/virthread.c:89
#3  0x00007fee5331e62f in virNWFilterLockIface 
(ifname=ifname@entry=0x7fee440068a0 "vnet1")
    at ../../../src/nwfilter/nwfilter_learnipaddr.c:180
#4  0x00007fee53312bf5 in _virNWFilterTeardownFilter (ifname=0x7fee440068a0 
"vnet1")
    at ../../../src/nwfilter/nwfilter_gentech_driver.c:1086
#5  0x00007fee5331384c in virNWFilterTeardownFilter (net=0x7fee44000b50)
    at ../../../src/nwfilter/nwfilter_gentech_driver.c:1104
#6  0x00007fee79fdbf0c in virDomainConfVMNWFilterTeardown 
(vm=vm@entry=0x7fee300009f0)
    at ../../../src/conf/domain_nwfilter.c:64
#7  0x00007fee52979a86 in qemuProcessStop (driver=driver@entry=0x7fee483a53c0, 
vm=0x7fee300009f0, 
    reason=reason@entry=VIR_DOMAIN_SHUTOFF_DESTROYED, flags=0) at 
../../../src/qemu/qemu_process.c:5290
#8  0x00007fee529c4577 in qemuDomainDestroyFlags (dom=<optimized out>, flags=0)
    at ../../../src/qemu/qemu_driver.c:2228
#9  0x00007fee7a02e7cf in virDomainDestroy (domain=domain@entry=0x7fee28000c50) 
at ../../../src/libvirt-domain.c:479


Thread 18 (Thread 0x7fee1cff9700 (LWP 14653)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fee798b5dbd in __GI___pthread_mutex_lock 
(mutex=mutex@entry=0x7fee7a455740 <ruleLock>)
    at ../nptl/pthread_mutex_lock.c:80
#2  0x00007fee79f93c65 in virMutexLock (m=m@entry=0x7fee7a455740 <ruleLock>) at 
../../../src/util/virthread.c:89
#3  0x00007fee79f5991d in virFirewallApply 
(firewall=firewall@entry=0x7fee00000ac0)
    at ../../../src/util/virfirewall.c:933
#4  0x00007fee5331e4a5 in ebtablesApplyDropAllRules (ifname=0x7fee44000de8 
"vnet1")
    at ../../../src/nwfilter/nwfilter_ebiptables_driver.c:3107
#5  0x00007fee5331eb2a in learnIPAddressThread (arg=0x7fee44000de0)
    at ../../../src/nwfilter/nwfilter_learnipaddr.c:638
#6  0x00007fee79f93b12 in virThreadHelper (data=<optimized out>) at 
../../../src/util/virthread.c:206
#7  0x00007fee798b36ba in start_thread (arg=0x7fee1cff9700) at 
pthread_create.c:333
#8  0x00007fee795e941d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

This deadlock triggered in 3/3 runs, with a single device as well as with
multiple devices.
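
To collect such backtraces non-interactively, gdb can be run in batch mode
with pagination switched off. A minimal sketch, assuming gdb and the libvirt
debug symbols are installed; the function name dump_bt and the GDB override
are made up for illustration:

```shell
# GDB can be overridden; the default assumes gdb is in PATH.
GDB=${GDB:-gdb}

# dump_bt PID: print backtraces of all threads of PID without paging.
dump_bt() {
    $GDB -p "$1" -batch -ex 'set pagination off' -ex 'thread apply all bt'
}
```

For example: dump_bt "$(pidof libvirtd)" > libvirtd-bt.txt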

Switching to 1.3.1-1ubuntu10.21 in proposed has resolved the issue.

Now take all of this with a grain of salt.
The reported error, a deadlock between virNWFilterTeardownFilter and
learnIPAddressThread, is gone with the backport.

But if I run the same test with more devices, a deadlock still seems to exist.
Across multiple runs it can occasionally occur even with one device, but no
longer on every run as it did before.
So there might be some deeper fault that still needs to be resolved. OTOH, my
test is rather artificial.
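
Because the remaining hang is intermittent, a hit rate over several runs says
more than a single attempt. A sketch of such a counter follows; the name
count_hangs is made up, and it assumes GNU timeout, which exits with 124 when
it had to kill the command:

```shell
# count_hangs RUNS LIMIT CMD...: run CMD RUNS times, each under a timeout
# of LIMIT seconds, and print "hangs/runs".
count_hangs() {
    runs=$1
    limit=$2
    shift 2
    hangs=0
    n=0
    while [ "$n" -lt "$runs" ]; do
        rc=0
        timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
        # Exit status 124 means the command was still running at LIMIT.
        if [ "$rc" -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
        n=$((n + 1))
    done
    echo "$hangs/$runs"
}
```

CMD has to be an executable (for example a script wrapping steps 1 to 3, which
also has to restart libvirtd after a hang), because timeout cannot run shell
functions.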

TL;DR:
- for the requested backport itself: no regression found, and the reported
  issue is fixed
- in a more complex, related test a similar issue still triggers

That said:
1. setting verification-done
2. @The Reporter: if your real case turns out to need more than the reported
fix, you might want to track down further changes. If you do, please open a
new bug (feel free to refer to this one, but let's use a new bug to keep the
fixes separate).

** Tags removed: verification-needed verification-needed-xenial
** Tags added: verification-done verification-done-xenial

https://bugs.launchpad.net/bugs/1753604

Title:
  libvirt-bin nwfilter deadlock
