From: Wei Wang <wei...@google.com> Date: Wed, 16 Oct 2019 12:03:15 -0700
> Jesse and Ido reported the following race condition: > <CPU A, t0> - Received packet A is forwarded and cached dst entry is > taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set() > > <t1> - Given Jesse has busy routers ("ingesting full BGP routing tables > from multiple ISPs"), route is added / deleted and rt_cache_flush() is > called > > <CPU B, t2> - Received packet B tries to use the same cached dst entry > from t0, but rt_cache_valid() is no longer true and it is replaced in > rt_cache_route() by the newer one. This calls dst_dev_put() on the > original dst entry which assigns the blackhole netdev to 'dst->dev' > > <CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due > to 'dst->dev' being the blackhole netdev > > There are 2 issues in the v4 routing code: > 1. A per-netns counter is used to do the validation of the route. That > means whenever a route is changed in the netns, users of all routes in > the netns needs to redo lookup. v6 has an implementation of only > updating fn_sernum for routes that are affected. > 2. When rt_cache_valid() returns false, rt_cache_route() is called to > throw away the current cache, and create a new one. This seems > unnecessary because as long as this route does not change, the route > cache does not need to be recreated. > > To fully solve the above 2 issues, it probably needs quite some code > changes and requires careful testing, and does not suite for net branch. > > So this patch only tries to add the deleted cached rt into the uncached > list, so user could still be able to use it to receive packets until > it's done. > > Fixes: 95c47f9cf5e0 ("ipv4: call dst_dev_put() properly") > Signed-off-by: Wei Wang <wei...@google.com> > Reported-by: Ido Schimmel <ido...@idosch.org> > Reported-by: Jesse Hathaway <je...@mbuki-mvuki.org> > Tested-by: Jesse Hathaway <je...@mbuki-mvuki.org> > Acked-by: Martin KaFai Lau <ka...@fb.com> Applied and queued up for -stable.