Hello,

On Fri, Jul 28, 2017 at 9:47 AM, Rolf Neugebauer
<rolf.neugeba...@docker.com> wrote:
> Creating the new namespace is stalling for around 200 seconds and
> there 20 odd messages on the console, like:
>
> [ 67.372603] unregister_netdevice: waiting for lo to become free.
> Usage count = 1
>

Sounds like another netdev refcnt leak.

> Adding a 'sleep 1' before deleting the original network namespace
> "solves" the issue, but that doesn't sound like a good fix. Not using
> unmount also does not help (understandable).

Interesting. If sleeping for 1 sec helps, why did you see the stall
for 200 sec? The "leak" should go away eventually without 'sleep 1',
right?

>
> While the creation of the new namespace is stalled, I used 'sysrq' a
> few times to dump the work queues. There is an example below. Also,
> the hung task detection kicks in after 120 seconds (also below)

Yeah, the net_mutex is held by cleanup_net().

>
> I can readily reproduce this on 4.9.39, 4.11.12 and another user
> repro-ed it on 4.12.3. It seems to happen every time. At least one
> user reported issues with NFS mounts as well, but we were not able to
> reproduce it. It's not clear to me if this is directly related to
> 'mount.cifs' or if that just happens to reliably repro it.

OK, so commit d747a7a51b00984127a88113c does not help this case
either.

>
> It would be great if someone more familiar with the code could take a
> look. I'm happy to provide additional info (perf traces etc) or test
> patches if needed.
>

The last time I debugged this kind of netdev refcnt leak problem, I
added a few trace_printk() calls to dev_hold() and dev_put(); you can
try that too.

I will see if I can use your reproducer here.

Thanks.
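
As a rough illustration of the trace_printk() approach mentioned above:
once dev_hold() and dev_put() each emit one line per call, the output
can be tallied offline to see which device ends up with a positive net
count and which call sites contributed. The sketch below is only a
post-processing helper, not kernel code. It assumes a hypothetical
message format of "dev_hold dev=<name> caller=<symbol>" (and the same
for dev_put); that format and the function names tally_refs() and
report_leaks() are made up for this example and would have to match
whatever the added trace_printk() calls actually print.

```python
import re
from collections import Counter

# ASSUMPTION: hypothetical message format chosen for this sketch, e.g.
#   "... dev_hold dev=lo caller=some_function"
# Adjust the pattern to whatever the added trace_printk() calls emit.
EVENT_RE = re.compile(r"\b(dev_hold|dev_put)\s+dev=(\S+)\s+caller=(\S+)")


def tally_refs(lines):
    """Return (net count per device, net count per (device, caller))."""
    per_dev = Counter()
    per_caller = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue  # ignore unrelated trace lines
        op, dev, caller = m.groups()
        delta = 1 if op == "dev_hold" else -1
        per_dev[dev] += delta
        per_caller[(dev, caller)] += delta
    return per_dev, per_caller


def report_leaks(lines):
    """Map each device with a positive net count to its unbalanced callers."""
    per_dev, per_caller = tally_refs(lines)
    leaks = {}
    for dev, net in per_dev.items():
        if net > 0:
            leaks[dev] = sorted(
                (caller, n)
                for (d, caller), n in per_caller.items()
                if d == dev and n != 0
            )
    return leaks
```

Feeding it the lines read from the trace buffer, for example
report_leaks(open(path)) with path pointing at a saved copy of the
trace, gives a dict keyed by device name. A hold and its matching put
often come from different call sites, so the per-caller numbers are a
starting point for reading the code rather than a verdict.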