On 07.03.2013 16:34, Alexander V. Chernikov wrote:
On 07.03.2013 17:51, Andre Oppermann wrote:
On 07.03.2013 14:38, Ermal Luçi wrote:
Isn't it better to teach the routing code about metrics?
Routing daemons cope better this way and they can handle this.
So the policy for this behaviour can be controlled by the administrator
rather than by code!
With metrics you can add routes with a bigger metric for interfaces and
a lower one from routing daemons.
This may also help somewhat with interfaces that have the same subnet
configured.
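To illustrate the metric idea, here is a minimal sketch in C. It is not
taken from the FreeBSD routing code: struct route_cand and best_route()
are hypothetical names, and the only rule assumed is "lowest metric wins".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical candidate route for one prefix (not struct rtentry). */
struct route_cand {
    const char *source;  /* who installed it, e.g. "connected", "ospfd" */
    uint32_t    metric;  /* assumption: lower value is preferred */
};

/*
 * Return the candidate with the lowest metric, or NULL if n == 0.
 * On a tie the earliest entry in the array is kept.
 */
static const struct route_cand *
best_route(const struct route_cand *c, size_t n)
{
    const struct route_cand *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (best == NULL || c[i].metric < best->metric)
            best = &c[i];
    }
    return best;
}
```

With an interface route at metric 100 and a daemon route at metric 10 for
the same prefix, best_route() picks the daemon's route, and the
administrator can flip that by changing the numbers rather than the code.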
Generally I agree with you that this would be the ideal outcome.
However we're still quite a bit away from reaching that goal.
To make this really work we have to make mpath plus metrics a first
class citizen in the routing code and also update the routing
daemons' kernel interfaces to know about this. I hope we get there
in the not too distant future.
Radix is already over-bloated. Typically in performance-oriented
solutions (hardware/software routers from vendors) there is clear
separation between the RIB (where route protocol attributes, best
candidate routes, and routes with different priorities exist) and the
FIB, which is typically some kind of radix with the minimum needed info,
e.g.: prefix, nexthops, their interfaces, optional L2 data to prepend.
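A rough sketch of what such a minimal FIB entry might look like in C,
for IPv4 only. All names and limits here (struct fib_entry, FIB_MAX_NH,
FIB_L2_MAX) are made up for illustration and do not correspond to any
existing kernel structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIB_MAX_NH  4   /* assumed limit on nexthops per prefix */
#define FIB_L2_MAX  14  /* assumed: room for one Ethernet header */

struct fib_nexthop {
    uint32_t gw;                  /* nexthop address, host byte order */
    uint16_t ifindex;             /* outgoing interface */
    uint8_t  l2_len;              /* bytes of l2_hdr that are valid */
    uint8_t  l2_hdr[FIB_L2_MAX];  /* optional L2 data to prepend */
};

struct fib_entry {
    uint32_t prefix;              /* host byte order */
    uint8_t  plen;                /* prefix length, assumed 0..32 */
    uint8_t  nh_count;
    struct fib_nexthop nh[FIB_MAX_NH];
};

/* Return nonzero if addr falls inside the entry's prefix. */
static int
fib_match(const struct fib_entry *e, uint32_t addr)
{
    uint32_t mask = e->plen == 0 ? 0 : ~(uint32_t)0 << (32 - e->plen);

    return (addr & mask) == (e->prefix & mask);
}

/* Pick one nexthop by flow hash; NULL if the entry has none. */
static const struct fib_nexthop *
fib_pick(const struct fib_entry *e, uint32_t flow_hash)
{
    if (e->nh_count == 0)
        return NULL;
    return &e->nh[flow_hash % e->nh_count];
}
```

Nothing protocol-specific (origin, priority, timers) lives in the entry;
under this split that information would stay in the RIB.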
ACK. Though the bloat in itself is not the main problem, other than
kernel memory consumption. If you think of it in terms of cache line
misses, everything
more than 128 bytes away is potentially a cache miss. The additional
distance due to a large or small structure makes no difference. What
makes an important difference is the internal layout of the structure
and whether the relevant variables are within the same cache line.
This can be a problem in a large structure when some data is at the
beginning and other data at the end on a different cache line. Here
potentially twice the cache miss latency per trie element hurts.
If we can manage to put everything for a trie search into the first
cache line we're quite good already. The additional win for tighter
packing isn't that large anymore.
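The layout point can be checked at compile time. The following is only a
sketch: struct trie_node is a hypothetical node, not the kernel's
radix_node, and CACHE_LINE is set to 64 as an assumption (substitute the
value for the target CPU).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed line size; adjust for the target CPU */

/* Hypothetical trie node with the search fields grouped at the front. */
struct trie_node {
    /* hot: read on every step of a lookup */
    uint32_t          key;
    uint32_t          mask;
    uint16_t          bit;       /* bit position tested at this node */
    struct trie_node *child[2];
    /* cold: statistics and management data */
    uint64_t          pkt_count;
    uint64_t          byte_count;
    char              descr[64];
};

/* Offset of the first byte after `field` inside `type`. */
#define HOT_END(type, field) \
    (offsetof(type, field) + sizeof(((type *)0)->field))

/* Fail the build if the search fields spill past the first line. */
_Static_assert(HOT_END(struct trie_node, child) <= CACHE_LINE,
    "trie search fields do not fit in the first cache line");

/* Number of bytes a lookup step needs from each node. */
static size_t
trie_hot_bytes(void)
{
    return HOT_END(struct trie_node, child);
}
```

The whole structure is larger than one line, but a lookup only touches the
leading bytes, so the cold fields cost memory and not latency.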
Our radix stands somewhere between RIB and FIB (since we have to support
route(8) and upper layer protocols): it serves badly as a RIB (little
functionality) and as a FIB (too much overhead and inefficient, overly
general code).
ACK. There is a big philosophical question on the model. Make it a
RIB so that independent but complementary routing daemons can add
routes concurrently and the kernel knows which have higher priority
or are equal cost for traffic balancing (as in bgpd+ospfd). Or strip
it to a FIB and have an external program do the RIB and coordination
across routing daemons (as in Quagga suite).
For example, sizeof(rt_nodes[2]) (first element of rte) is 96 bytes on
amd64.
That is a problem if the trie traversal function accesses fields beyond
this cache line. The main problem is that key and mask are pointers
and thus external to the radix_node, adding even more cache misses.
Additionally, the rte refcount approach is totally broken.
ACK. Copy and out. No references or external pointers into the table.
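A minimal sketch of the copy-out idea in C. The names (struct nh_info,
lookup_copy()) are hypothetical, the linear exact-match scan is just a
stand-in for the real lookup, and locking is left out entirely; the
assumption is that the copy would happen while the table is held stable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a forwarding decision needs; small enough to copy by value. */
struct nh_info {
    uint32_t gw;
    uint16_t ifindex;
};

struct tbl_slot {
    uint32_t       dst;
    struct nh_info nh;
};

/*
 * Copy the nexthop for dst into *out. Returns 1 if found, 0 if not.
 * The caller never receives a pointer into tbl, so there is nothing
 * to reference-count and nothing that can dangle after an update.
 */
static int
lookup_copy(const struct tbl_slot *tbl, size_t n, uint32_t dst,
    struct nh_info *out)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].dst == dst) {
            *out = tbl[i].nh;  /* struct copy, by value */
            return 1;
        }
    }
    return 0;
}
```

The caller's copy stays valid after the table entry changes or goes away,
at the price of possibly forwarding one packet on stale information.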
I'm currently thinking of adding some kind of hooks to current
route/radix code to permit building efficient trie (or other structure)
for given address family and to use it for forwarding purposes only.
AFAIK Marko Zec and/or Luigi have done some work in this area as well.
For example, I don't need a trie while doing MPLS label switching:
assuming the control plane allocates a contiguous label space, I can use
a label array for efficient lookup.
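Roughly, the label array lookup could look like this in C. It is a sketch
under the stated assumption of one contiguous label block; struct
label_table and label_lookup() are hypothetical names, not an existing
MPLS API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Action for one incoming label. */
struct label_entry {
    uint32_t out_label;    /* label to swap in */
    uint16_t out_ifindex;  /* outgoing interface */
    uint8_t  in_use;       /* 0 if the slot is unallocated */
};

/* One contiguous block of labels: base .. base + count - 1. */
struct label_table {
    uint32_t            base;
    uint32_t            count;
    struct label_entry *entries;
};

/* O(1) lookup: one subtraction, one bounds check, one array access. */
static const struct label_entry *
label_lookup(const struct label_table *t, uint32_t label)
{
    /* Unsigned wrap makes label < base fail the bounds check too. */
    uint32_t idx = label - t->base;

    if (idx >= t->count || !t->entries[idx].in_use)
        return NULL;
    return &t->entries[idx];
}
```

If the control plane ever hands out labels outside the block, this falls
apart and a hash or trie is needed again, so the contiguity assumption is
doing all the work here.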
Nobody's forcing you to use a radix trie for MPLS. In theory each
protocol can choose its own best method.
--
Andre