Hi Toke,

On Fri, Apr 22, 2022 at 01:48:46AM +0200, Toke Høiland-Jørgensen wrote:
> I've implemented the Babel RTT extension specified in
> draft-ietf-babel-rtt-extension in Bird. I've tested that it talks to
> babeld on a single link and that the two implementations agree on each
> others' (smoothed) RTT values. However, I'd like to subject the code to
> some more tortured testing before submitting it to upstream Bird. So I'm
> sending this note as a request for testing.

Nice work! I replaced the bird binary and changed the interface type to
"tunnel" on a mesh of four hosts. Works great so far!

Things I noticed:

(1) When I forgot to change the config file (one side was type tunnel, one
side was type wired), the Babel neighbor metric was stuck at 65535. I think
this happens because the expected timestamp is never received, so the
metric computation fails. While I understand that such a "broken" setup is
not really supported, it was not obvious where to look for the problem.

(2) Also, I think it would be neat if "birdc show babel neigh" showed
latency info (current latency + smoothed value).

(3) Due to route flapping I tried to increase "metric decay" to 60s. After
running "birdc configure" the values became very large for one link (on one
side only).

> bird: babel1: RTT sample for neighbour fe80::3 on wg2: 4294966323 us (srtt 99189.162 ms)
> bird: babel1: Added RTT cost 96 to nbr fe80::3 on wg2 with srtt 99189.162 ms

Nothing changed after >1h. The opposite side was reporting sensible RTT
numbers. After I restarted the daemon, the smoothed value was still off for
this one link:

> bird: babel1: RTT sample for neighbour fe80::3 on wg2: 1241 us (srtt 69656.646 ms)
> bird: babel1: Added RTT cost 96 to nbr fe80::3 on wg2 with srtt 69656.646 ms

The srtt value did not converge after >1h. For all other links the
smoothing works, e.g. for wg1 on the very same host:

> bird: babel1: RTT sample for neighbour fe80::1 on wg0: 14570 us (srtt 15.876 ms)
> bird: babel1: Added RTT cost 5 to nbr fe80::1 on wg0 with srtt 15.876 ms

After restarting bird once more (without changing anything) it has worked
ever since:

> bird: babel1: RTT sample for neighbour fe80::3 on wg2: 1261 us (srtt 1.313 ms)

In this setup wg2 was a tunnel over the local LAN, so latency was often
< 1000 us. Maybe there is a problem with tiny latencies and/or larger values
of "metric decay"? I did not find a way to reliably reproduce the problem.

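One more observation on the first log line in (3), offered as a guess rather than a diagnosis: 4294966323 is exactly 2^32 - 973, so the sample looks like a difference of -973 us that ended up in an unsigned 32-bit value. The sketch below only checks that arithmetic. The function names are made up for illustration and it makes no claim about how Bird actually computes or stores the sample.

```python
# Arithmetic check only; hypothetical helper names, not taken from Bird.

UINT32_MOD = 1 << 32  # 2^32, the modulus of unsigned 32-bit arithmetic


def as_signed_32(value):
    """Reinterpret an unsigned 32-bit value as a two's-complement signed one."""
    value &= UINT32_MOD - 1
    return value - UINT32_MOD if value >= (1 << 31) else value


def wrapped_difference(later_us, earlier_us):
    """Subtract two microsecond values the way unsigned 32-bit math would."""
    return (later_us - earlier_us) % UINT32_MOD


logged_sample_us = 4294966323  # value from the log line above

# The logged value corresponds to a small negative number of microseconds.
print(as_signed_32(logged_sample_us))

# Any pair of values whose true difference is -973 us wraps to the same
# number; 1000 and 1973 are arbitrary illustration values.
print(wrapped_difference(1000, 1973))
```

If that reading is right, a sample slightly below zero would be plausible on a link whose real latency is under 1 ms, which would fit the observation that only the LAN tunnel was affected.
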
Best regards,
Stefan Haller