On Mon, Mar 25, 2019 at 11:55:54AM -0700, Dmitry Torokhov wrote: > On Mon, Mar 25, 2019 at 06:10:31PM +0200, Andy Shevchenko wrote: > > On Mon, Mar 25, 2019 at 11:20:01PM +0800, wanghai (M) wrote: > > > thanks , Can it be fixed like this? > > > > I dunno. I think no, it can't. > > I agree, it can't. > > > > > As far as I can see the issue happened due to freeing entire network device > > at > > the point of putting reference count to the device (struct device is > > embedded > > into struct net_device). > > > > When it happens the access to _any_ field of struct net_device will crash > > the > > system. > > > > Basically it means that put_device() should be carefully placed > > case-by-case, > > because on real hardware the actual device is parent and usually no-one does > > access to the child without need. On the contrary the tunX devices are > > artificial and are controlled by the network stack. > > > > So, it means we need to do something like > > > > ret = register_netdev(...); > > if (ret) { > > put_device(&ndev->dev); > > ... > > } > > > > But as I mentioned, it would be tricky to not break something else. > > I'd say that the entity that called alloc_netdev() should be the one > that calls put_device() (but the way of free_netdev()), not net/core > code. Do we have a driver that is messed up and does not do proper > cleanup?
OK, looking at this some more, I think we need to set dev->reg_state = NETREG_REGISTERED earlier, right after successful call to device_add() as at this point the device is alive as far as device core is concerned. The queue kobjects have to be managed separately, for that I'd pull the code out of netdev_register_kobject() and move it into register_netdevice() and ensure that we clean up there properly. Thanks. -- Dmitry