On Tue, Jun 23, 2020 at 12:46:34AM +0200, Oliver Herms wrote:
> On 22.06.20 23:44, Michal Kubecek wrote:
> > On Mon, Jun 22, 2020 at 10:53:55PM +0200, Oliver Herms wrote:
> >> When fib6_run_gc is called with parameter force=true, the spinlock in
> >> net/ipv6/ip6_fib.c:2310 can lock all CPUs in softirq when
> >> net.ipv6.route.max_size is exceeded (I have seen this multiple times).
> >> One softirq/CPU gets the lock; all others spin to get it. It takes
> >> substantial time until all are done. Effectively it is a DoS vector.
> >>
> >> As the spinlock only enforces that at most one GC runs at a time, it
> >> should IMHO be safe to use force=false here, resulting in
> >> spin_trylock_bh instead of spin_lock_bh and thus avoiding the lock
> >> contention.
> >>
> >> Finding the spinlock locked means some GC is already going on, so it
> >> is safe to just skip another execution of the GC.
> >>
> >> Signed-off-by: Oliver Herms <oliver.peter.he...@gmail.com>
> >
> > I wonder if it wouldn't suffice to revert commit 14956643550f ("ipv6:
> > slight optimization in ip6_dst_gc") as the reasoning in its commit
> > message seems wrong: we do not always skip fib6_run_gc() when
> > entries <= rt_max_size, we do so only if the time since the last
> > garbage collector run is shorter than rt_min_interval.
> >
> > Then you would prevent the "thundering herd" effect when only gc_thresh
> > is exceeded but not max_size, as commit 2ac3ac8f86f2 ("ipv6: prevent
> > fib6_run_gc() contention") intended, but would still preserve enforced
> > garbage collection when max_size is exceeded.
> >
> > Michal
>
> Hi Michal,
>
> I did some testing with packets causing 17k IPv6 route cache entries per
> second. With "entries > rt_max_size" all CPUs of the system get stuck
> waiting for the spinlock. With "false", CPU load stays at <<10% on every
> single CPU core (tested on an Atom C2750). This makes sense, as
> "entries > rt_max_size" would not prevent multiple CPUs from trying to
> get the lock.
>
> So reverting 14956643550f is not enough.
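
For reference, the locking being discussed lives in fib6_run_gc(); a
simplified sketch of it follows. This is from memory of net/ipv6/ip6_fib.c
around that time, with the actual table walk elided, and the fib6_gc_lock
and ip6_fib_timer names are as I recall them, so they may not match your
tree exactly:

/* Simplified sketch of fib6_run_gc(), net/ipv6/ip6_fib.c. The route
 * aging work itself is elided; only the locking relevant to the patch
 * is shown.
 */
void fib6_run_gc(unsigned long expires, struct net *net, bool force)
{
	if (force) {
		/* force == true: every caller blocks here, so when many
		 * CPUs enter the GC path at once they all spin on the
		 * lock in softirq context.
		 */
		spin_lock_bh(&net->ipv6.fib6_gc_lock);
	} else if (!spin_trylock_bh(&net->ipv6.fib6_gc_lock)) {
		/* force == false: another CPU is already running the GC,
		 * so reschedule the GC timer and bail out instead of
		 * spinning.
		 */
		mod_timer(&net->ipv6.ip6_fib_timer, jiffies + HZ);
		return;
	}

	/* ... walk the FIB and age out cached routes ... */

	spin_unlock_bh(&net->ipv6.fib6_gc_lock);
}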
The problem I see with this kind of test is that it simulates a scenario
where you are routinely using more entries than the cache is sized for.
In other words, if this happened in a real-life setup, the actual problem
would be that gc_thresh and max_size are set too low.

Also, your patch would blur the difference between gc_thresh (the level
we want to get down to) and max_size (the hard limit) by making the hard
limit "softer".

Michal
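
For context, the caller's decision logic is roughly the following. This is
a simplified sketch of ip6_dst_gc() in net/ipv6/route.c as I remember it
from that period; the sysctl field names and the elided ip6_rt_gc_expire
handling may not match your tree exactly:

/* Simplified sketch of ip6_dst_gc(), net/ipv6/route.c. dst_alloc()
 * only calls this once the number of entries exceeds gc_thresh; a
 * nonzero return value makes dst_alloc() refuse the new entry.
 */
static int ip6_dst_gc(struct dst_ops *ops)
{
	struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
	int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval;
	int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size;
	unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc;
	int entries;

	entries = dst_entries_get_fast(ops);

	/* Between gc_thresh and max_size the GC is rate-limited by
	 * rt_min_interval; above max_size (the hard limit) we always
	 * fall through and run it.
	 */
	if (time_after(rt_last_gc + rt_min_interval, jiffies) &&
	    entries <= rt_max_size)
		goto out;

	/* Commit 14956643550f changed the last argument from
	 * "entries > rt_max_size" to "true"; the patch under discussion
	 * changes it to "false".
	 */
	net->ipv6.ip6_rt_gc_expire++;
	fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, true);
	entries = dst_entries_get_slow(ops);
out:
	/* ... ip6_rt_gc_expire decay (gc_elasticity) elided ... */
	return entries > rt_max_size;
}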