Hello,

On Fri, Jul 28, 2017 at 9:47 AM, Rolf Neugebauer
<rolf.neugeba...@docker.com> wrote:
> Creating the new namespace is stalling for around 200 seconds and
> there are 20-odd messages on the console, like:
>
> [   67.372603] unregister_netdevice: waiting for lo to become free. Usage count = 1
>

Sounds like another netdev refcnt leak.

> Adding a 'sleep 1' before deleting the original network namespace
> "solves" the issue, but that doesn't sound like a good fix. Not using
> unmount also does not help (understandable).


Interesting. If sleeping for 1 second helps, why did you see the stall
for 200 seconds? The "leak" should go away eventually even without the
'sleep 1', right?

>
> While the creation of the new namespace is stalled, I used 'sysrq' a
> few times to dump the work queues. There is an example below. Also,
> the hung task detection kicks in after 120 seconds (also below)

Yeah, the net_mutex is held by cleanup_net().

>
> I can readily reproduce this on 4.9.39, 4.11.12 and another user
> repro-ed it on 4.12.3. It seems to happen every time. At least one
> user reported issues with NFS mounts as well, but we were not able to
> reproduce it. It's not clear to me if this is directly related to
> 'mount.cifs' or if that just happens to reliably repro it.

OK, so commit d747a7a51b00984127a88113c does not help this case
either.

>
> It would be great if someone more familiar with the code could take a
> look. I'm happy to provide additional info (perf traces etc) or test
> patches if needed.
>

The last time I debugged this kind of netdev refcnt leak, I added a few
trace_printk() calls to dev_hold() and dev_put(), so you could try the
same. I will see if I can use your reproducer here.
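
Once such trace_printk() calls are in place, a small script can tally
the trace output per caller. This is only a minimal sketch. It assumes
each added call prints a message of the form
"dev_hold: dev=<name> caller=<symbol>" (or "dev_put: ..."); that format,
the sample lines, and the summarize() helper are all made up for
illustration and do not come from any existing patch.

```python
import re
from collections import Counter

# Hypothetical message format: whatever the trace_printk() calls added to
# dev_hold()/dev_put() are written to emit. Adjust the regex to match.
EVENT_RE = re.compile(r"\b(dev_hold|dev_put): dev=(\S+) caller=(\S+)")


def summarize(trace_lines, dev="lo"):
    """Count holds and puts per caller for one device.

    Returns (holds, puts, net) where holds/puts are Counters keyed by
    caller and net is total holds minus total puts seen in the trace.
    """
    holds, puts = Counter(), Counter()
    for line in trace_lines:
        m = EVENT_RE.search(line)
        if m is None or m.group(2) != dev:
            continue  # not one of our events, or a different device
        op, _dev, caller = m.groups()
        (holds if op == "dev_hold" else puts)[caller] += 1
    net = sum(holds.values()) - sum(puts.values())
    return holds, puts, net


# Made-up sample lines, only to show the expected shape of the input.
SAMPLE = [
    "mount.cifs-812 [000] 67.10: dev_hold: dev=lo caller=fn_a",
    "mount.cifs-812 [000] 67.11: dev_hold: dev=lo caller=fn_b",
    "kworker-33 [001] 67.20: dev_put: dev=lo caller=fn_c",
    "kworker-33 [001] 67.21: dev_hold: dev=eth0 caller=fn_a",
]

if __name__ == "__main__":
    holds, puts, net = summarize(SAMPLE)
    print("holds:", dict(holds))
    print("puts: ", dict(puts))
    print("net:  ", net)
```

A positive net count for lo after the namespace is gone would suggest a
hold without a matching put, and the per-caller counts narrow down where
to look. Holds taken before tracing started are not counted, so the
absolute number is only a hint.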

Thanks.
