On Wed, 1 May 2019, David Ahern wrote:

> On 5/1/19 10:03 AM, Alan Maguire wrote:
> > I'm seeing the following repeated error
> > 
> > [  130.821362] neighbour: arp_cache: neighbor table overflow!
> > 
> > when using MPLSoverGRE or MPLSoverUDP tunnels on bits synced
> > with bpf-next as of this morning. The test script below reliably
> > reproduces the problem, while working fine on a 4.14 (I haven't
> > bisected yet). It can be run with no arguments, or specifying
> > gre or udp for the specific encap type.
> > 
> > It seems that every MPLS-encapsulated outbound packet is attempting
> > to add  a neighbor entry, and as a result we hit the 
> > net.ipv4.neigh.default.gc_thresh3 limit quickly.
> > 
> > When this failure occurs, the arp table doesn't show any of
> > these additional entries. Existing arp table entries are
> > disappearing too, so perhaps they are being recycled when the
> > table becomes full?
> > 
> 
> There are 2 bugs:
> 1. neigh_xmit fails to find a neighbor entry on every single Tx. This
> was introduced by:
> 
> cd9ff4de010 ("ipv4: Make neigh lookup keys for loopback/point-to-point
> devices be INADDR_ANY")
> 
> Basically, the primary_key is reset to 0 for tun's but the neigh_xmit
> lookup was not corrected.
> 
> That caused a new neigh entry to be created on each packet Tx, but
> before inserting the new one to the table the create function looks to
> see if an entry already exists. The arp constructor had reset the key to
> 0 in the new neighbor entry so the second lookup finds a match and the
> new one is dropped.
> 
> That exposed a second bug.
> 
> 2. neigh_alloc bumps the gc_entries counter when a new one is allocated,
> but ___neigh_create is not dropping the counter in the error path.
> 
> Ian reported a similar problem, but we were not able to isolate the cause.
>

Fantastic, thanks so much for the quick fixes! I verified them at my
end, ensuring that with the patches applied to the latest net tree,
the previously-failing test succeeds.
 
> Thanks for the script - very helpful in resolving the bugs. I made some
> changes to it and I plan to submit it to selftests as a starter for mpls
> tests.
> 

Sounds great! It's mostly cobbled together from Willem's
bpf test_tc_tunnel.sh script, so like that could probably be
generalized to cover more tunnel types too.

Thanks again!

Alan

> Bug fix patches coming.
> 

Reply via email to