On Fri, Jun 07, 2019 at 12:58:52AM +0200, Stefano Brivio wrote:
> On Thu, 6 Jun 2019 22:37:11 +0000
> Martin Lau <ka...@fb.com> wrote:
>
> > On Fri, Jun 07, 2019 at 12:17:47AM +0200, Stefano Brivio wrote:
> > > On Thu, 6 Jun 2019 21:44:58 +0000
> > > Martin Lau <ka...@fb.com> wrote:
> > >
> > > > > +	if (!(filter->flags & RTM_F_CLONED)) {
> > > > > +		err = rt6_fill_node(net, arg->skb, rt, NULL, NULL, NULL, 0,
> > > > > +				    RTM_NEWROUTE,
> > > > > +				    NETLINK_CB(arg->cb->skb).portid,
> > > > > +				    arg->cb->nlh->nlmsg_seq, flags);
> > > > > +		if (err)
> > > > > +			return err;
> > > > > +	} else {
> > > > > +		flags |= NLM_F_DUMP_FILTERED;
> > > > > +	}
> > > > > +
> > > > > +	bucket = rcu_dereference(rt->rt6i_exception_bucket);
> > > > > +	if (!bucket)
> > > > > +		return 0;
> > > > > +
> > > > > +	for (i = 0; i < FIB6_EXCEPTION_BUCKET_SIZE; i++) {
> > > > > +		hlist_for_each_entry(rt6_ex, &bucket->chain, hlist) {
> > > > > +			if (rt6_check_expired(rt6_ex->rt6i))
> > > > > +				continue;
> > > > > +
> > > > > +			err = rt6_fill_node(net, arg->skb, rt,
> > > > > +					    &rt6_ex->rt6i->dst,
> > > > > +					    NULL, NULL, 0, RTM_NEWROUTE,
> > > > > +					    NETLINK_CB(arg->cb->skb).portid,
> > > > > +					    arg->cb->nlh->nlmsg_seq, flags);
> > > > Thanks for the patch.
> > > >
> > > > A question on when rt6_fill_node() returns -EMSGSIZE while dumping the
> > > > exception bucket here. Where will the next inet6_dump_fib() start?
> > >
> > > And thanks for reviewing.
> > >
> > > It starts again from the same node, see fib6_dump_node(): w->leaf = rt;
> > > where rt is the fib6_info where we failed dumping, so we won't skip
> > > dumping any node.
> > If the same node will be dumped, does it mean that it will go through this
> > loop and iterate all exceptions again?
>
> Yes (well, all the exceptions for that node).
>
> > > This also means that to avoid sending duplicates in the case where at
> > > least one rt6_fill_node() call goes through and one fails, we would
> > > need to track the last bucket and entry sent, or, alternatively, to
> > > make sure we can fit the whole node before dumping.
> > My another concern is the dump may never finish.
>
> That's not a guarantee in general, even without this, because in theory
> the skb passed might be small enough that we can't even fit a single
> node without exceptions.
It is arguably the caller's responsibility to retry with a larger buffer
if it cannot even get a single route.
If the caller provides a large enough buffer for a single route, the
kernel should guarantee forward progress. I think the minimum is to
remember how many exceptions have to be skipped.

>
> We could add a guard on w->leaf not being the same before and after the
> walk in inet6_dump_fib() and, if it is, terminate the dump. I just
> wonder if we have to do this at all -- I can't find this being done
> anywhere else (at a quick look at least).
>
> By the way, we can also trigger a never-ending dump by touching the
> tree frequently enough during a dump: it would always start again from
> the root, see fib6_dump_table().
Do you mean the "cb->args[5] != w->root->fn_sernum" case? It seems there is
a w->skip to take care of it. Regardless, I don't think we should make it
worse.
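
To illustrate what remembering the number of exceptions to skip could
look like, here is a rough userspace sketch. It is not kernel code and
not a proposed patch: struct dump_state, dump_node() and dump_all() are
hypothetical names, an int array stands in for the exception bucket, a
fixed-capacity array stands in for the skb, and -1 stands in for
-EMSGSIZE. The only point is that a per-node skip counter lets a retry on
the same node resume where the previous pass stopped, so nothing is sent
twice and every pass makes progress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resume state kept across dump passes (not a kernel struct). */
struct dump_state {
	size_t skip;	/* exceptions of the current node already sent */
};

/*
 * Copy the exceptions of one node into buf, which holds at most cap
 * entries. Entries below st->skip were sent by an earlier pass and are
 * not sent again. Returns 0 when the node is complete, or -1 (standing
 * in for -EMSGSIZE) when buf is full; st->skip then records where the
 * next pass must resume.
 */
static int dump_node(const int *exc, size_t n_exc, int *buf, size_t cap,
		     size_t *used, struct dump_state *st)
{
	size_t i;

	assert(cap > 0);	/* a zero-sized buffer can never make progress */

	for (i = 0; i < n_exc; i++) {
		if (i < st->skip)
			continue;	/* already sent in an earlier pass */
		if (*used == cap) {
			st->skip = i;	/* resume from this entry next time */
			return -1;
		}
		buf[(*used)++] = exc[i];
	}

	st->skip = 0;	/* node done, the next node starts from scratch */
	return 0;
}

/*
 * Drive dump_node() with a fresh buffer on every pass, as a caller
 * retrying the dump would. Appends everything received to out and
 * returns the number of passes needed.
 */
static size_t dump_all(const int *exc, size_t n_exc, size_t cap, int *out)
{
	struct dump_state st = { 0 };
	int buf[16];
	size_t total = 0, passes = 0;
	int err;

	assert(cap > 0 && cap <= 16);

	do {
		size_t used = 0, j;

		err = dump_node(exc, n_exc, buf, cap, &used, &st);
		for (j = 0; j < used; j++)
			out[total++] = buf[j];
		passes++;
	} while (err);

	return passes;
}
```

With five exceptions and room for two per pass, dump_all() needs three
passes and each exception shows up exactly once. In the real walker the
counter would presumably have to live somewhere that survives between
inet6_dump_fib() calls, and entries added or removed between passes would
still need some thought.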