Hi Hannes, On Wed, Nov 16, 2016 at 08:43:25PM +0100, Hannes Frederic Sowa wrote: > On 16.11.2016 19:51, Ido Schimmel wrote: > > On Wed, Nov 16, 2016 at 06:35:45PM +0100, Hannes Frederic Sowa wrote: > >> On 16.11.2016 16:18, Ido Schimmel wrote: > >>> On Wed, Nov 16, 2016 at 03:51:01PM +0100, Hannes Frederic Sowa wrote: > >>>> On 16.11.2016 15:09, Jiri Pirko wrote: > >>>>> From: Ido Schimmel <ido...@mellanox.com> > >>>>> > >>>>> Commit b90eb7549499 ("fib: introduce FIB notification infrastructure") > >>>>> introduced a new notification chain to notify listeners (f.e., switchdev > >>>>> drivers) about addition and deletion of routes. > >>>>> > >>>>> However, upon registration to the chain the FIB tables can already be > >>>>> populated, which means potential listeners will have an incomplete view > >>>>> of the tables. > >>>>> > >>>>> Solve that by adding an API to request a FIB dump. The dump itself it > >>>>> done using RCU in order not to starve consumers that need RTNL to make > >>>>> progress. > >>>>> > >>>>> Signed-off-by: Ido Schimmel <ido...@mellanox.com> > >>>>> Signed-off-by: Jiri Pirko <j...@mellanox.com> > >>>> > >>>> Have you looked at potential inconsistencies resulting of RCU walking > >>>> the table and having concurrent inserts? > >>> > >>> Yes. I did try to think about situations in which this approach will > >>> fail, but I could only find problems with concurrent removals, which I > >>> addressed in 5/8. In case of concurrent insertions, even if you missed > >>> the node, you would still get the ENTRY_ADD event to your listener. > >> > >> Theoretically a node could still be installed while the deletion event > >> fired before registering the notifier. E.g. a synchronize_net before > >> dumping could help here? > > > > If the deletion event fired for some fib alias, then by 5/8 we are > > guaranteed that it was already unlinked from the fib alias list in the > > leaf in which it was contained. So, while it's possible we didn't > > register our listener in time for the deletion event, we won't traverse > > this fib alias while dumping the trie anyway. Did I understand you > > correctly? > > > > Theoretically we can have the same problem for insertion: > > You receive a delete event from the notifier that is queued up first but > the dump will still see the entry in the fib due to being managed by RCU > (the notifier running on another CPU). > > The problem is that the fib_remove_alias->hlist_del_rcu->WRITE_ONCE is > still not strongly ordered against the local fib dump trie walk.
OK, so I believe my analysis in 5/8 was wrong. Despite CPU A invoking fib_remove_alias() in t0 for fa1, it's possible for CPU B doing the fib dump to see fa1 in t1, which will lead to fa1 being permanently present in the listener's table. Given the above, I think Dave's suggestion is the only applicable solution. Do you agree? Any other suggestions? Thanks