Re: Race condition in route lookup

2019-10-16 Thread Wei Wang
On Tue, Oct 15, 2019 at 11:39 PM Martin Lau wrote: > > On Tue, Oct 15, 2019 at 09:44:11AM -0700, Wei Wang wrote: > > On Tue, Oct 15, 2019 at 7:29 AM Jesse Hathaway > > wrote: > > > > > > On Fri, Oct 11, 2019 at 12:54 PM Wei Wang wrote: > > > > Hmm... Yes... I would think a per-CPU input cache s

Re: Race condition in route lookup

2019-10-15 Thread Martin Lau
On Tue, Oct 15, 2019 at 09:44:11AM -0700, Wei Wang wrote: > On Tue, Oct 15, 2019 at 7:29 AM Jesse Hathaway wrote: > > > > On Fri, Oct 11, 2019 at 12:54 PM Wei Wang wrote: > > > Hmm... Yes... I would think a per-CPU input cache should work for the > > > case above. > > > Another idea is: instead o

Re: Race condition in route lookup

2019-10-15 Thread Martin Lau
On Tue, Oct 15, 2019 at 09:42:49AM -0700, Wei Wang wrote: > On Tue, Oct 15, 2019 at 7:45 AM David Ahern wrote: > > > > On 10/14/19 1:26 PM, Martin Lau wrote: > > > > > > AFAICT, even for the route that are affected by > > > fib6_update_sernum_upto_root(), > > > I don't see the RTF_PCPU route is r

Re: Race condition in route lookup

2019-10-15 Thread Wei Wang
On Tue, Oct 15, 2019 at 7:29 AM Jesse Hathaway wrote: > > On Fri, Oct 11, 2019 at 12:54 PM Wei Wang wrote: > > Hmm... Yes... I would think a per-CPU input cache should work for the > > case above. > > Another idea is: instead of calling dst_dev_put() in rt_cache_route() > > to switch out the dev,

Re: Race condition in route lookup

2019-10-15 Thread Wei Wang
On Tue, Oct 15, 2019 at 7:45 AM David Ahern wrote: > > On 10/14/19 1:26 PM, Martin Lau wrote: > > > > AFAICT, even for the route that are affected by > > fib6_update_sernum_upto_root(), > > I don't see the RTF_PCPU route is re-created. v6 sk does > > dst_check() => re-lookup the fib6 => > > foun

Re: Race condition in route lookup

2019-10-15 Thread David Ahern
On 10/14/19 1:26 PM, Martin Lau wrote: > > AFAICT, even for the route that are affected by > fib6_update_sernum_upto_root(), > I don't see the RTF_PCPU route is re-created. v6 sk does > dst_check() => re-lookup the fib6 => > found the same RTF_PCPU (but does not re-create it) => > update the sk

Re: Race condition in route lookup

2019-10-15 Thread Jesse Hathaway
On Fri, Oct 11, 2019 at 12:54 PM Wei Wang wrote: > Hmm... Yes... I would think a per-CPU input cache should work for the > case above. > Another idea is: instead of calling dst_dev_put() in rt_cache_route() > to switch out the dev, we call, rt_add_uncached_list() to add this > obsolete dst cache t

Re: Race condition in route lookup

2019-10-14 Thread Martin Lau
On Sun, Oct 13, 2019 at 05:23:01PM -0700, Wei Wang wrote: > On Fri, Oct 11, 2019 at 11:56 PM Martin Lau wrote: > > > > On Fri, Oct 11, 2019 at 10:54:13AM -0700, Wei Wang wrote: > > > On Fri, Oct 11, 2019 at 8:42 AM Ido Schimmel wrote: > > > > > > > > On Fri, Oct 11, 2019 at 09:36:51AM -0500, Jess

Re: Race condition in route lookup

2019-10-13 Thread Wei Wang
On Fri, Oct 11, 2019 at 11:56 PM Martin Lau wrote: > > On Fri, Oct 11, 2019 at 10:54:13AM -0700, Wei Wang wrote: > > On Fri, Oct 11, 2019 at 8:42 AM Ido Schimmel wrote: > > > > > > On Fri, Oct 11, 2019 at 09:36:51AM -0500, Jesse Hathaway wrote: > > > > On Thu, Oct 10, 2019 at 3:31 AM Ido Schimmel

Re: Race condition in route lookup

2019-10-11 Thread Martin Lau
On Fri, Oct 11, 2019 at 10:54:13AM -0700, Wei Wang wrote: > On Fri, Oct 11, 2019 at 8:42 AM Ido Schimmel wrote: > > > > On Fri, Oct 11, 2019 at 09:36:51AM -0500, Jesse Hathaway wrote: > > > On Thu, Oct 10, 2019 at 3:31 AM Ido Schimmel wrote: > > > > I think it's working as expected. Here is my th

Re: Race condition in route lookup

2019-10-11 Thread David Ahern
On 10/11/19 12:52 PM, Ido Schimmel wrote: > On Fri, Oct 11, 2019 at 11:47:12AM -0700, Wei Wang wrote: >> On Fri, Oct 11, 2019 at 11:25 AM Ido Schimmel wrote: >>> >>> On Fri, Oct 11, 2019 at 09:17:42PM +0300, Ido Schimmel wrote: On Fri, Oct 11, 2019 at 10:54:13AM -0700, Wei Wang wrote: > O

Re: Race condition in route lookup

2019-10-11 Thread Jesse Hathaway
On Fri, Oct 11, 2019 at 1:52 PM Ido Schimmel wrote:
> I think this is fine.
>
> Jesse, can you please test Wei's patch?

I tested with a patched kernel using the same scripts and it does seem to resolve the issue, thanks!

Re: Race condition in route lookup

2019-10-11 Thread Ido Schimmel
On Fri, Oct 11, 2019 at 11:47:12AM -0700, Wei Wang wrote: > On Fri, Oct 11, 2019 at 11:25 AM Ido Schimmel wrote: > > > > On Fri, Oct 11, 2019 at 09:17:42PM +0300, Ido Schimmel wrote: > > > On Fri, Oct 11, 2019 at 10:54:13AM -0700, Wei Wang wrote: > > > > On Fri, Oct 11, 2019 at 8:42 AM Ido Schimme

Re: Race condition in route lookup

2019-10-11 Thread Wei Wang
On Fri, Oct 11, 2019 at 11:25 AM Ido Schimmel wrote: > > On Fri, Oct 11, 2019 at 09:17:42PM +0300, Ido Schimmel wrote: > > On Fri, Oct 11, 2019 at 10:54:13AM -0700, Wei Wang wrote: > > > On Fri, Oct 11, 2019 at 8:42 AM Ido Schimmel wrote: > > > > > > > > On Fri, Oct 11, 2019 at 09:36:51AM -0500,

Re: Race condition in route lookup

2019-10-11 Thread Ido Schimmel
On Fri, Oct 11, 2019 at 09:17:42PM +0300, Ido Schimmel wrote: > On Fri, Oct 11, 2019 at 10:54:13AM -0700, Wei Wang wrote: > > On Fri, Oct 11, 2019 at 8:42 AM Ido Schimmel wrote: > > > > > > On Fri, Oct 11, 2019 at 09:36:51AM -0500, Jesse Hathaway wrote: > > > > On Thu, Oct 10, 2019 at 3:31 AM Ido

Re: Race condition in route lookup

2019-10-11 Thread Ido Schimmel
On Fri, Oct 11, 2019 at 10:54:13AM -0700, Wei Wang wrote: > On Fri, Oct 11, 2019 at 8:42 AM Ido Schimmel wrote: > > > > On Fri, Oct 11, 2019 at 09:36:51AM -0500, Jesse Hathaway wrote: > > > On Thu, Oct 10, 2019 at 3:31 AM Ido Schimmel wrote: > > > > I think it's working as expected. Here is my th

Re: Race condition in route lookup

2019-10-11 Thread Wei Wang
On Fri, Oct 11, 2019 at 8:42 AM Ido Schimmel wrote: > > On Fri, Oct 11, 2019 at 09:36:51AM -0500, Jesse Hathaway wrote: > > On Thu, Oct 10, 2019 at 3:31 AM Ido Schimmel wrote: > > > I think it's working as expected. Here is my theory: > > > > > > If CPU0 is executing both the route get request an

Re: Race condition in route lookup

2019-10-11 Thread Jesse Hathaway
On Fri, Oct 11, 2019 at 10:42 AM Ido Schimmel wrote:
> Do you remember when you started seeing this behavior? I think it
> started in 4.13 with commit ffe95ecf3a2e ("Merge branch
> 'net-remove-dst-garbage-collector-logic'").

Unfortunately, our data on when the problem started is a bit fuzzy, but

Re: Race condition in route lookup

2019-10-11 Thread Ido Schimmel
On Fri, Oct 11, 2019 at 09:36:51AM -0500, Jesse Hathaway wrote: > On Thu, Oct 10, 2019 at 3:31 AM Ido Schimmel wrote: > > I think it's working as expected. Here is my theory: > > > > If CPU0 is executing both the route get request and forwarding packets > > through the directly connected interface

Re: Race condition in route lookup

2019-10-11 Thread Jesse Hathaway
On Thu, Oct 10, 2019 at 3:31 AM Ido Schimmel wrote: > I think it's working as expected. Here is my theory: > > If CPU0 is executing both the route get request and forwarding packets > through the directly connected interface, then the following can happen: > > - In process context, per-CPU dst en

Re: Race condition in route lookup

2019-10-10 Thread Ido Schimmel
On Thu, Oct 10, 2019 at 11:31:04AM +0300, Ido Schimmel wrote: > On Wed, Oct 09, 2019 at 11:00:07AM -0500, Jesse Hathaway wrote: > > We have been experiencing a route lookup race condition on our internet > > facing > > Linux routers. I have been able to reproduce the issue, but would love more > >

Re: Race condition in route lookup

2019-10-10 Thread Ido Schimmel
On Wed, Oct 09, 2019 at 11:00:07AM -0500, Jesse Hathaway wrote: > We have been experiencing a route lookup race condition on our internet facing > Linux routers. I have been able to reproduce the issue, but would love more > help in isolating the cause. > > Looking up a route found in the main tab

Race condition in route lookup

2019-10-09 Thread Jesse Hathaway
We have been experiencing a route lookup race condition on our internet-facing Linux routers. I have been able to reproduce the issue, but would love more help in isolating the cause.

Looking up a route found in the main table returns `*` rather than the directly connected interface about once for