Hello,

The kernelci folks pointed out that a Samsung Exynos based board was failing to boot when trying to mount the rootfs via NFS, due to a networking issue [0].
I looked at the issue and it turned out to be a race between ip_auto_config() and register_netdev() when using the ip=dhcp param in the kernel command line.

The problem is that ip_auto_config() calls wait_for_devices() [1], which returns as soon as it finds a network device registered. ic_open_devs() [2] is then called to bring the network devs up and wait for their carrier signals. But ic_open_devs() grabs the rtnl_mutex lock [3] when doing this, which is the same lock that register_netdev() [4] grabs before registering a network device.

So once a network dev is found and wait_for_devices() returns, ic_open_devs() is called and no new network dev can be registered in the meantime. Since ic_open_devs() waits up to CONF_CARRIER_TIMEOUT (120 secs) with this lock held, if the network dev that's supposed to get its IP over DHCP isn't the first to be registered, the boot test job may time out and be considered a failure.

A workaround is to use ip=:::::eth0:dhcp instead of ip=dhcp, so wait_for_devices() waits for this specific device. Another workaround is to increase the timeout for the job to be much bigger than CONF_CARRIER_TIMEOUT, so ip_auto_config() can retry and the network devices can be registered between tries.

But I wonder if someone can suggest a proper way to fix this. Grabbing a mutex that prevents network devs from being registered for 120 secs doesn't sound correct.

Thanks a lot for your help and please let me know if I misunderstood something.

[0]: https://storage.kernelci.org/mainline/v4.9/arm-exynos_defconfig/lab-collabora/boot-exynos5422-odroidxu3_rootfs:nfs.html
[1]: http://lxr.free-electrons.com/source/net/ipv4/ipconfig.c#L1368
[2]: http://lxr.free-electrons.com/source/net/ipv4/ipconfig.c#L202
[3]: http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L68
[4]: http://lxr.free-electrons.com/source/net/core/dev.c#L7326

Best regards,
--
Javier Martinez Canillas
Open Source Group
Samsung Research America
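
P.S. In case it helps to reason about the contention, here is a rough user-space sketch in Python. It is not kernel code and only models the behaviour described above. All names in it are made up for illustration: a threading.Lock stands in for rtnl_mutex, the device names "first_dev" and "dhcp_dev" are hypothetical, and CARRIER_TIMEOUT is scaled down from the 120 secs of CONF_CARRIER_TIMEOUT to half a second.

```python
import threading
import time

# Stand-in for rtnl_mutex. Everything here is illustrative, not a kernel API.
rtnl_lock = threading.Lock()
registered = []          # devices registered so far
CARRIER_TIMEOUT = 0.5    # scaled-down stand-in for CONF_CARRIER_TIMEOUT (120 s)


def register_netdev(name):
    """Model of register_netdev(): needs the lock to add a device."""
    start = time.monotonic()
    with rtnl_lock:
        registered.append(name)
    return time.monotonic() - start   # time spent registering, mostly blocked


def open_devs_and_wait_for_carrier(lock_taken):
    """Model of ic_open_devs(): holds the lock for the whole carrier wait."""
    with rtnl_lock:
        seen = list(registered)       # only the devices registered so far
        lock_taken.set()
        time.sleep(CARRIER_TIMEOUT)   # carrier never comes up on the first device
    return seen


def simulate():
    registered.clear()
    register_netdev("first_dev")      # wait_for_devices() would return at this point
    lock_taken = threading.Event()
    result = {}

    def ipconfig():
        result["seen"] = open_devs_and_wait_for_carrier(lock_taken)

    worker = threading.Thread(target=ipconfig)
    worker.start()
    lock_taken.wait()                 # make sure the lock is already held
    # The device that should get its IP over DHCP shows up too late.
    result["blocked_for"] = register_netdev("dhcp_dev")
    worker.join()
    return result


if __name__ == "__main__":
    r = simulate()
    print("devices seen while waiting for carrier:", r["seen"])
    print("second registration blocked for %.2f s" % r["blocked_for"])
```

In this model the second registration stays blocked for roughly the whole carrier wait, and the waiting side never sees the second device during that attempt, which matches what the boot logs suggest.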