On Wed, May 15, 2019 at 2:51 PM Martin Lau <ka...@fb.com> wrote:
>
> On Tue, May 14, 2019 at 05:46:10PM -0700, Wei Wang wrote:
> > From: Wei Wang <wei...@google.com>
> >
> > When inserting route cache into the exception table, the key is
> > generated with both src_addr and dest_addr with src addr routing.
> > However, current logic always assumes the src_addr used to generate the
> > key is a /128 host address. This is not true in the following scenarios:
> > 1. When the route is a gateway route or does not have next hop.
> >    (rt6_is_gw_or_nonexthop() == false)
> > 2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL.
> > This means, when looking for a route cache in the exception table, we
> > have to do the lookup twice: first time with the passed in /128 host
> > address, second time with the src_addr stored in fib6_info.
> >
> > This solves the pmtu discovery issue reported by Mikael Magnusson where
> > a route cache with a lower mtu info is created for a gateway route with
> > src addr. However, the lookup code is not able to find this route cache.
> >
> > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> > Reported-by: Mikael Magnusson <mikael.ker...@lists.m7n.se>
> > Bisected-by: David Ahern <dsah...@gmail.com>
> > Signed-off-by: Wei Wang <wei...@google.com>
> > Acked-by: Eric Dumazet <eduma...@google.com>
> > ---
> >  net/ipv6/route.c | 33 ++++++++++++++++++++++++++++-----
> >  1 file changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 23a20d62daac..c36900a07a78 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -1574,23 +1574,36 @@ static struct rt6_info *rt6_find_cached_rt(const
> > struct fib6_result *res,
> >         struct rt6_exception *rt6_ex;
> >         struct rt6_info *ret = NULL;
> >
> > -       bucket = rcu_dereference(res->f6i->rt6i_exception_bucket);
> > -
> >  #ifdef CONFIG_IPV6_SUBTREES
> >         /* fib6i_src.plen != 0 indicates f6i is in subtree
> >          * and exception table is indexed by a hash of
> >          * both fib6_dst and fib6_src.
> > -        * Otherwise, the exception table is indexed by
> > -        * a hash of only fib6_dst.
> > +        * However, the src addr used to create the hash
> > +        * might not be exactly the passed in saddr which
> > +        * is a /128 addr from the flow.
> > +        * So we need to use f6i->fib6_src to redo lookup
> > +        * if the passed in saddr does not find anything.
> > +        * (See the logic in ip6_rt_cache_alloc() on how
> > +        * rt->rt6i_src is updated.)
> >          */
> >         if (res->f6i->fib6_src.plen)
> >                 src_key = saddr;
> > +find_ex:
> >  #endif
> > +       bucket = rcu_dereference(res->f6i->rt6i_exception_bucket);
> >         rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
> >
> >         if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
> >                 ret = rt6_ex->rt6i;
> >
> > +#ifdef CONFIG_IPV6_SUBTREES
> > +       /* Use fib6_src as src_key and redo lookup */
> > +       if (!ret && src_key == saddr) {
> > +               src_key = &res->f6i->fib6_src.addr;
> > +               goto find_ex;
> > +       }
> > +#endif
> > +
> >         return ret;
> >  }
> >
> > @@ -2683,12 +2696,22 @@ u32 ip6_mtu_from_fib6(const struct fib6_result *res,
> >  #ifdef CONFIG_IPV6_SUBTREES
> >         if (f6i->fib6_src.plen)
> >                 src_key = saddr;
> > +find_ex:
> >  #endif
> > -
> >         bucket = rcu_dereference(f6i->rt6i_exception_bucket);
> >         rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
> >         if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
> >                 mtu = dst_metric_raw(&rt6_ex->rt6i->dst, RTAX_MTU);
> > +#ifdef CONFIG_IPV6_SUBTREES
> > +       /* Similar logic as in rt6_find_cached_rt().
> > +        * We need to use f6i->fib6_src to redo lookup in exception
> > +        * table if saddr did not yield any result.
> > +        */
> > +       else if (src_key == saddr) {
> > +               src_key = &f6i->fib6_src.addr;
> > +               goto find_ex;
> > +       }
> > +#endif
> Nit.
> Instead of repeating this retry logic,
> can it be consolidated into __rt6_find_exception_xxx()
> by passing fib6_src.addr as a secondary matching
> saddr?
>

Thanks Martin.
Changing __rt6_find_exception_xxx() might not be easy, because the other
callers of this function do not really need to back off and use
another saddr.
And the validation of the result is a bit different for different callers.
What about adding a new helper for the above 2 cases and just calling that
from both places?
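
To make the helper idea concrete, here is a rough userspace sketch of
the two-pass lookup. It is only a model under stated assumptions, not
kernel code: struct addr6, struct ex_entry, struct route_model,
find_exception() and find_exception_with_fallback() are all hypothetical
names, and a linear scan stands in for the hashed, RCU-protected
exception bucket. It also simplifies the retry condition: it retries
only when the first pass really used the flow's saddr as the key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins for illustration only; not the kernel's types. */
struct addr6 {
	unsigned char b[16];
};

/* One cached exception entry, keyed by destination and source address. */
struct ex_entry {
	struct addr6 daddr;
	struct addr6 src;	/* source key the entry was inserted with */
	bool expired;
	unsigned int mtu;
};

/* Simplified model of a fib6_info together with its exception table. */
struct route_model {
	const struct ex_entry *entries;
	size_t n_entries;
	struct addr6 fib6_src_addr;	/* models f6i->fib6_src.addr */
	int fib6_src_plen;		/* models f6i->fib6_src.plen */
};

static bool addr6_equal(const struct addr6 *a, const struct addr6 *b)
{
	return memcmp(a->b, b->b, sizeof(a->b)) == 0;
}

/* Single lookup: match daddr, and match src only when src_key is given. */
static const struct ex_entry *find_exception(const struct route_model *rt,
					     const struct addr6 *daddr,
					     const struct addr6 *src_key)
{
	size_t i;

	for (i = 0; i < rt->n_entries; i++) {
		const struct ex_entry *ex = &rt->entries[i];

		if (!addr6_equal(&ex->daddr, daddr))
			continue;
		if (src_key && !addr6_equal(&ex->src, src_key))
			continue;
		return ex;
	}
	return NULL;
}

/*
 * Hypothetical shared helper: try the flow's /128 saddr first, then fall
 * back to the route's own source address. Returns NULL when no unexpired
 * entry is found.
 */
static const struct ex_entry *
find_exception_with_fallback(const struct route_model *rt,
			     const struct addr6 *daddr,
			     const struct addr6 *saddr)
{
	/* Outside a subtree the table is keyed by daddr only. */
	const struct addr6 *src_key = rt->fib6_src_plen ? saddr : NULL;
	const struct ex_entry *ex;

	assert(rt && daddr);

	ex = find_exception(rt, daddr, src_key);
	if (ex && !ex->expired)
		return ex;

	/* Retry only if the first pass actually used the flow's saddr. */
	if (src_key) {
		ex = find_exception(rt, daddr, &rt->fib6_src_addr);
		if (ex && !ex->expired)
			return ex;
	}
	return NULL;
}
```

With something shaped like this, rt6_find_cached_rt() would take the
cached route from the returned entry and ip6_mtu_from_fib6() would read
the MTU from it, so the retry logic would live in one place.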
> >
> >         if (likely(!mtu)) {
> >                 struct net_device *dev = nh->fib_nh_dev;
> > --
> > 2.21.0.1020.gf2820cf01a-goog
> >