On Friday 09/05 at 10:25 -0700, Breno Leitao wrote: > commit efa95b01da18 ("netpoll: fix use after free") incorrectly > ignored the refcount and prematurely set dev->npinfo to NULL during > netpoll cleanup, leading to improper behavior and memory leaks. > > Scenario causing lack of proper cleanup: > > 1) A netpoll is associated with a NIC (e.g., eth0) and netdev->npinfo is > allocated, and refcnt = 1 > - Keep in mind that npinfo is shared among all netpoll instances. In > this case, there is just one. > > 2) Another netpoll is also associated with the same NIC and > npinfo->refcnt += 1. > - Now dev->npinfo->refcnt = 2; > - There is just one npinfo associated to the netdev. > > 3) When the first netpolls goes to clean up: > - The first cleanup succeeds and clears np->dev->npinfo, ignoring > refcnt. > - It basically calls `RCU_INIT_POINTER(np->dev->npinfo, NULL);` > - Set dev->npinfo = NULL, without proper cleanup > - No ->ndo_netpoll_cleanup() is either called > > 4) Now the second target tries to clean up > - The second cleanup fails because np->dev->npinfo is already NULL. > * In this case, ops->ndo_netpoll_cleanup() was never called, and > the skb pool is not cleaned as well (for the second netpoll > instance) > - This leaks npinfo and skbpool skbs, which is clearly reported by > kmemleak. > > Revert commit efa95b01da18 ("netpoll: fix use after free") and adds > clarifying comments emphasizing that npinfo cleanup should only happen > once the refcount reaches zero, ensuring stable and correct netpoll > behavior.
This makes sense to me. Just curious, did you try the original OOPS reproducer? https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.1404857349.git.de...@googlers.com/ I wonder if there might be a demon lurking in bonding+netpoll that this was papering over? Not a reason not to fix the leaks IMO, I'm just curious, I don't want to spend time on it if you already did :) The discussion on v1 isn't enlightening either: https://lore.kernel.org/lkml/0f692012238337f2c40893319830ae042523ce18.1404172155.git.de...@googlers.com/ Thanks, Calvin > Cc: sta...@vger.kernel.org > Cc: j...@jvosburgh.net > Fixes: efa95b01da18 ("netpoll: fix use after free") > Signed-off-by: Breno Leitao <lei...@debian.org> > --- > net/core/netpoll.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/net/core/netpoll.c b/net/core/netpoll.c > index 5f65b62346d4e..19676cd379640 100644 > --- a/net/core/netpoll.c > +++ b/net/core/netpoll.c > @@ -815,6 +815,10 @@ static void __netpoll_cleanup(struct netpoll *np) > if (!npinfo) > return; > > + /* At this point, there is a single npinfo instance per netdevice, and > + * its refcnt tracks how many netpoll structures are linked to it. We > + * only perform npinfo cleanup when the refcnt decrements to zero. > + */ > if (refcount_dec_and_test(&npinfo->refcnt)) { > const struct net_device_ops *ops; > > @@ -824,8 +828,7 @@ static void __netpoll_cleanup(struct netpoll *np) > > RCU_INIT_POINTER(np->dev->npinfo, NULL); > call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info); > - } else > - RCU_INIT_POINTER(np->dev->npinfo, NULL); > + } > > skb_pool_flush(np); > } > > -- > 2.47.3 >