On Fri, 2017-01-06 at 13:01 -0500, David Miller wrote:
> From: Eric Dumazet <eric.duma...@gmail.com>
> Date: Fri, 06 Jan 2017 09:32:56 -0800
>
> > This makes no sense to me.
> >
> > RTNL is absolutely not needed to get device stats.
> >
> > We try to not add RTNL, especially when not required.
> >
> > Sure, RTNETLINK dumps currently hold RTNL, but we had various attempts
> > in the past to get rid of this behavior.
> >
> > If a device driver expects RTNL being locked, it is clearly a bug that
> > needs a fix anyway.
>
> This is extremely problematic when the driver has to synchronize some
> piece of state between the get stats method and open/close. It is
> exactly the case we are trying to solve in tg3, and lots of drivers
> end up hitting the same exact issue.
>
> If open/close can happen asynchronously to get stats, it is very hard
> to make dynamically allocated data structures or DMA buffers usable
> from the stats call.

Yes, I had some issues lately with mlx4.

netdevices are protected by RCU, so adding proper RCU logic for the stats
is doable.

>
> Drivers in this situation will just add a mutex specifically for this
> situation if we don't consistently apply RTNL locking here.

Well, there are cases where RTNL is quite contended, but monitoring tools
like to get /proc/net/dev or various sysfs attributes
(netstat_show() can be called very very often
for /sys/class/net/*/statistics/*) in a reasonable amount of time.

I fear that such a change will add drifts when devices are constantly
added/removed.
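
For illustration, the driver-private lock David mentions could look roughly
like the userspace sketch below. It is not tg3 or mlx4 code: a pthread mutex
stands in for whatever lock a driver would pick, and every name in it
(fake_dev, fake_open, fake_close, fake_rx, fake_get_stats) is hypothetical.
It only shows the shape of the idea: the stats reader never touches the
dynamically allocated ring counters unless it holds the same lock that
open/close take, so RTNL is not needed on the read path.

```c
/* Userspace sketch only; all names are hypothetical, not kernel API. */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

struct fake_ring_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

struct fake_dev {
	pthread_mutex_t stats_lock;   /* serializes open/close vs. get_stats */
	struct fake_ring_stats *ring; /* allocated in open, freed in close */
	struct fake_ring_stats saved; /* totals kept while the device is down */
};

static void fake_dev_init(struct fake_dev *dev)
{
	pthread_mutex_init(&dev->stats_lock, NULL);
	dev->ring = NULL;
	dev->saved.rx_packets = 0;
	dev->saved.tx_packets = 0;
}

static int fake_open(struct fake_dev *dev)
{
	struct fake_ring_stats *ring = calloc(1, sizeof(*ring));

	if (!ring)
		return -1;

	pthread_mutex_lock(&dev->stats_lock);
	if (dev->ring) {
		/* Already open: keep the existing counters. */
		pthread_mutex_unlock(&dev->stats_lock);
		free(ring);
		return 0;
	}
	dev->ring = ring; /* publish only under the lock */
	pthread_mutex_unlock(&dev->stats_lock);
	return 0;
}

static void fake_close(struct fake_dev *dev)
{
	struct fake_ring_stats *ring;

	pthread_mutex_lock(&dev->stats_lock);
	ring = dev->ring;
	dev->ring = NULL;
	if (ring) {
		/* Fold live counters into the saved totals before freeing. */
		dev->saved.rx_packets += ring->rx_packets;
		dev->saved.tx_packets += ring->tx_packets;
	}
	pthread_mutex_unlock(&dev->stats_lock);
	free(ring); /* free(NULL) is a no-op */
}

/* Stand-in for the receive path bumping a counter. */
static void fake_rx(struct fake_dev *dev)
{
	pthread_mutex_lock(&dev->stats_lock);
	if (dev->ring)
		dev->ring->rx_packets++;
	pthread_mutex_unlock(&dev->stats_lock);
}

static void fake_get_stats(struct fake_dev *dev, struct fake_ring_stats *out)
{
	pthread_mutex_lock(&dev->stats_lock);
	*out = dev->saved;
	if (dev->ring) {
		out->rx_packets += dev->ring->rx_packets;
		out->tx_packets += dev->ring->tx_packets;
	}
	pthread_mutex_unlock(&dev->stats_lock);
}
```

The trade-off is the one raised above: each driver grows its own lock, and
a real hot path would not take a mutex per packet. An RCU-based variant
would instead let readers dereference the ring pointer without taking any
lock and defer the free until readers are done; that is not shown here.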