On Thu, Jan 07, 2021 at 07:59:37PM -0800, Saeed Mahameed wrote:
> On Thu, 2021-01-07 at 13:58 +0100, Eric Dumazet wrote:
> > On Thu, Jan 7, 2021 at 12:33 PM Vladimir Oltean <olte...@gmail.com> wrote:
> > > On Thu, Jan 07, 2021 at 12:18:28PM +0100, Eric Dumazet wrote:
> > > > What a mess really.
> > >
> > > Thanks, that's at least _some_ feedback :)
> >
> > Yeah, I was on PTO for the last two weeks.
> >
> > > > You chose to keep the assumption that ndo_get_stats() would not
> > > > fail, since we were providing the needed storage from callers.
> > > >
> > > > If ndo_get_stats() are now allowed to sleep, and presumably
> > > > allocate memory, we need to make sure we report potential errors
> > > > back to the user.
> > > >
> > > > I think your patch series is mostly good, but I would prefer not
> > > > hiding errors and always report them to user space. And no,
> > > > netdev_err() is not appropriate, we do not want tools to look at
> > > > syslog to guess something went wrong.
> > >
> > > Well, there are only 22 dev_get_stats callers in the kernel, so I
> > > assume that after the conversion to return void, I can do another
> > > conversion to return int, and then I can convert the ndo_get_stats64
> > > method to return int too. I will keep the plain ndo_get_stats still
> > > void (no reason not to).
> > >
> > > > Last point about drivers having to go to the slow path, talking to
> > > > firmware: make sure that malicious/innocent users reading
> > > > /proc/net/dev from many threads in parallel won't brick these
> > > > devices.
> > > >
> > > > Maybe they implicitly _relied_ on the fact that firmware was
> > > > gently read every second and results were cached from a work
> > > > queue or something.
> > >
> > > How? I don't understand how I can make sure of that.
> >
> > Your patches do not attempt to change these drivers, but I guess your
> > cover letter might give driver maintainers an incentive to get rid of
> > their logic, that is all.
> >
> > We might simply warn maintainers and ask them to test their future
> > changes with tests using 1000 concurrent threads reading /proc/net/dev.
> >
> > > There is an effort initiated by Jakub to standardize the ethtool
> > > statistics. My objection was that you can't expect that to happen
> > > unless dev_get_stats is sleepable just like ethtool -S is. So I
> > > think the same reasoning should apply to ethtool -S too, really.
> >
> > I think we all agree on the principles, once we make sure to not add
> > more pressure on RTNL. It seems you addressed our feedback, all is
> > fine.
>
> Eric, about two years ago you were totally against sleeping in
> ndo_get_stats, what happened? :)
> https://lore.kernel.org/netdev/4cc44e85-cb5e-502c-30f3-c6ea564fe...@gmail.com/
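Before answering that: to make the error-reporting part of the
discussion above a bit more concrete, this is the kind of caller I have
in mind once dev_get_stats() returns int. It is a purely hypothetical
sketch (made-up function name, and it assumes the int conversion has
already been done), not something from the actual patches:

/* Hypothetical sketch: assumes dev_get_stats() has been converted to
 * return int, so a driver/firmware failure can be propagated to user
 * space instead of being logged and silently ignored.
 */
static int example_fill_link_stats(struct net_device *dev,
				   struct rtnl_link_stats64 *storage)
{
	int err;

	err = dev_get_stats(dev, storage);	/* may sleep, may fail */
	if (err)
		return err;	/* bubble the error up to netlink/procfs */

	return 0;
}

Now, to answer the question about sleeping in ndo_get_stats: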
I believe that what is different this time is that DSA switches are
typically connected over a slow and bottlenecked bus (so periodic
driver-level readouts would only make things worse for phc2sys and
other such latency-sensitive programs), plus they are offloading
interfaces for forwarding (so software-based counters could never be
accurate). Support those, and supporting firmware-based high-speed
devices will come as a nice side effect. FWIW, that discussion took
place here:
https://patchwork.ozlabs.org/project/netdev/patch/20201125193740.36825-3-george.mccollis...@gmail.com/

> My approach to solve this was much simpler and didn't require a new
> mutex nor the RTNL lock: all I did was reduce the RCU critical
> section so that it does not include the call to the driver, by simply
> holding the netdev via dev_hold().

I feel this is a call for the bonding maintainers to make. If they're
willing to replace rtnl_dereference with bond_dereference throughout
the whole driver, and reduce other people's amount of work when other
NDOs start losing the rtnl_mutex too, then I can't see what's wrong
with my approach (despite it not being "as simple"). If they think
that update-side protection of the slaves array is just fine the way
it is, then I suppose that RCU protection + dev_hold() is indeed all
that I can do.
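For readers who don't have the other thread handy, this is roughly the
pattern I understand Saeed to be describing (a sketch from memory with
a made-up private structure and field name, not his actual patch):

/* Sketch of the dev_hold() approach as I understand it (struct
 * example_priv and its slave_dev field are hypothetical): keep the RCU
 * read-side critical section only around the pointer lookup, pin the
 * netdev with dev_hold(), and only then call into the driver, which is
 * free to sleep with the proposed series.
 */
static void example_get_slave_stats(struct example_priv *priv,
				    struct rtnl_link_stats64 *storage)
{
	struct net_device *slave_dev;

	rcu_read_lock();
	slave_dev = rcu_dereference(priv->slave_dev);
	if (slave_dev)
		dev_hold(slave_dev);
	rcu_read_unlock();

	if (!slave_dev)
		return;

	dev_get_stats(slave_dev, storage);	/* sleepable with the series */
	dev_put(slave_dev);
}

Whether that is enough for bonding comes back to the question above of
how the update side of the slaves array is protected.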