On Thu, 2021-01-07 at 13:58 +0100, Eric Dumazet wrote:
> On Thu, Jan 7, 2021 at 12:33 PM Vladimir Oltean <olte...@gmail.com> wrote:
> > On Thu, Jan 07, 2021 at 12:18:28PM +0100, Eric Dumazet wrote:
> > > What a mess really.
> >
> > Thanks, that's at least _some_ feedback :)
>
> Yeah, I was on PTO for the last two weeks.
>
> > > You chose to keep the assumption that ndo_get_stats() would not fail,
> > > since we were providing the needed storage from callers.
> > >
> > > If ndo_get_stats() are now allowed to sleep, and presumably allocate
> > > memory, we need to make sure
> > > we report potential errors back to the user.
> > >
> > > I think your patch series is mostly good, but I would prefer not
> > > hiding errors and always report them to user space.
> > > And no, netdev_err() is not appropriate, we do not want tools to look
> > > at syslog to guess something went wrong.
> >
> > Well, there are only 22 dev_get_stats callers in the kernel, so I assume
> > that after the conversion to return void, I can do another conversion to
> > return int, and then I can convert the ndo_get_stats64 method to return
> > int too. I will keep the plain ndo_get_stats still void (no reason not
> > to).
> >
> > > Last point about drivers having to go to slow path, talking to
> > > firmware : Make sure that malicious/innocent users
> > > reading /proc/net/dev from many threads in parallel wont brick
> > > these devices.
> > >
> > > Maybe they implicitly _relied_ on the fact that firmware was gently
> > > read every second and results were cached from a work queue or
> > > something.
> >
> > How? I don't understand how I can make sure of that.
>
> Your patches do not attempt to change these drivers, but I guess your
> cover letter might send to driver maintainers incentive to get rid of their
> logic, that is all.
>
> We might simply warn maintainers and ask them to test their future changes
> with tests using 1000 concurrent theads reading /proc/net/dev
>
> > There is an effort initiated by Jakub to standardize the ethtool
> > statistics. My objection was that you can't expect that to happen unless
> > dev_get_stats is sleepable just like ethtool -S is. So I think the same
> > reasoning should apply to ethtool -S too, really.
>
> I think we all agree on the principles, once we make sure to not
> add more pressure on RTNL. It seems you addressed our feedback, all
> is fine.
>

Eric, about two years ago you were totally against sleeping in
ndo_get_stats, what happened? :)
https://lore.kernel.org/netdev/4cc44e85-cb5e-502c-30f3-c6ea564fe...@gmail.com/

My approach to solving this was much simpler and didn't require a new
mutex or the RTNL lock. All I did was shrink the RCU critical section so
that it no longer includes the call into the driver, by simply holding
the netdev via dev_hold().
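
To make the ordering concrete, here is a rough userspace model of the
idea, not the patch from the link above. Every model_* name is a
hypothetical stand-in: model_rcu_read_lock()/model_rcu_read_unlock()
model rcu_read_lock()/rcu_read_unlock(), and model_dev_hold()/
model_dev_put() model dev_hold()/dev_put(). It is single-threaded and
only shows that the reference is taken inside the read-side section,
while the (possibly sleeping) driver call happens after leaving it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins only; none of these are kernel APIs. */
struct model_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

struct model_netdev {
	int refcnt;            /* models the netdev reference count */
	struct model_stats hw; /* models counters kept by the device */
};

/* Greater than zero while inside the modeled RCU read-side section. */
static int model_rcu_depth;

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

static void model_dev_hold(struct model_netdev *dev)
{
	dev->refcnt++;
}

static void model_dev_put(struct model_netdev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/*
 * Models a driver stats callback that may sleep. Sleeping is not
 * allowed inside an RCU read-side section, so the model refuses to
 * run there and reports failure instead.
 */
static bool model_driver_get_stats(struct model_netdev *dev,
				   struct model_stats *out)
{
	if (model_rcu_depth > 0)
		return false;
	*out = dev->hw;
	return true;
}

/*
 * Pin the device while still inside the read-side section, leave the
 * section, and only then call into the driver.
 */
static bool model_read_stats(struct model_netdev *dev,
			     struct model_stats *out)
{
	bool ok;

	model_rcu_read_lock();
	/* A real caller would look the device up here. */
	model_dev_hold(dev);
	model_rcu_read_unlock();

	ok = model_driver_get_stats(dev, out); /* may "sleep" safely */

	model_dev_put(dev);
	return ok;
}
```

The held reference is what keeps the device from being freed once the
read-side section has ended, so the driver call no longer needs RCU
protection. How the real callers find the netdev, and whether anything
else they touch still needs RCU, is not covered by this model.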