On Wed, Jul 19, 2017 at 09:49:37AM -0600, David Ahern wrote: > On 7/19/17 1:02 AM, Jiri Pirko wrote: > > From: Ido Schimmel <ido...@mellanox.com> > > > > Listeners of the FIB notification chain are expected to be able to take > > and release a reference on notified IPv6 routes. This is needed in the > > case of drivers capable of offloading these routes to a capable device. > > > > Since notifications are sent in an atomic context, these drivers need to > > take a reference on the route, prepare a work item to offload the route > > and release the reference at the end of the work. > > > > Currently, rt6i_ref is used to indicate in how many FIB nodes a route > > appears. Different code paths rely on rt6i_ref being 0 to indicate the > > route is no longer used by the FIB. > > > > For example, whenever a route is deleted or replaced, fib6_purge_rt() is > > run to make sure the route is no longer present in intermediate nodes. A > > BUG_ON() at the end of the function is executed in case the reference > > count isn't 1, as it's only supposed to appear in the non-intermediate > > node from which it's going to be deleted. > > > > Instead of changing the semantics of rt6i_ref, a new reference count is > > added, so that external users could also take a reference on routes > > without modifying rt6i_ref. > > > > To make sure external users don't release routes used by the FIB, the > > reference count is set to 1 upon creation of a route and decremented by > > the FIB upon rt6_release(). > > > > The reference count is atomic, as it's not protected by any locks and > > placed in the 40 bytes hole after the existing rt6i_ref. > > I'd rather not add another reference counter. Debugging reference leaks > is a huge PITA now; adding another counter just makes it worse. > > Why can't the BUG_ON in fib6_purge_rt be removed since there are other > reference holders now?
I did exactly that in the beginning, but it didn't sit right with me for the exact reason you mentioned - it can be a PITA to debug. If we use rt6i_ref for something other than FIB references, then it breaks existing code that relies on rt6i_ref being 0 to indicate it's no longer used by the FIB. A non-zero value can now mean "not used by the FIB, but waiting for some module to drop the reference in its workqueue". The BUG_ON() mentioned in the commit message is just one example. Another check was added by you in commit 8048ced9b. So I think we both want the same thing, but I'm not sure how your approach is safer. Thanks