Moin,

quick follow-up:

- I forgot to mention: for this to trigger, pf has to be disabled (or at least the "block return" rule commented out), given the asymmetric routing involved.
- I just tested this with a Linux router added, and it does lead to a packet storm, although it is more fiddly to reach the state in which this happens. So these are certainly two independent problems. I will also poke the Linux side so they can fix their ICMP6 rate-limiting issue.
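For reference on the rate-limiting side: RFC 4443, section 2.4(f) requires nodes to limit the rate of ICMPv6 error messages they originate and recommends a token bucket (average rate N, burst B). Below is a minimal Python sketch of that idea, not taken from the Linux or OpenBSD code; the class name, parameters, and injectable clock are hypothetical and only illustrate how a sender would be prevented from emitting Type 2 messages ad infinitum.

```python
import time


class IcmpErrorRateLimiter:
    """Token bucket in the spirit of RFC 4443, section 2.4(f).

    Hypothetical illustration only: `rate` is the long-term average number
    of ICMPv6 error messages per second (N in the RFC), `burst` is how many
    may be sent back to back (B in the RFC).
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        if rate <= 0 or burst < 1:
            raise ValueError("rate must be > 0 and burst >= 1")
        self.rate = float(rate)
        self.burst = float(burst)
        self.clock = clock          # injectable so tests can use a fake clock
        self.tokens = float(burst)  # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if one more error message (e.g. a Type 2) may be sent."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: suppress this error message


if __name__ == "__main__":
    # Fake clock: 1000 packets that would each trigger a Type 2, 1 ms apart.
    t = [0.0]
    limiter = IcmpErrorRateLimiter(rate=10, burst=5, clock=lambda: t[0])
    sent = 0
    for _ in range(1000):
        if limiter.allow():
            sent += 1
        t[0] += 0.001
    print(sent)  # prints 14: the burst of 5 plus about 10 per second
```

With these example numbers, a second's worth of triggering packets yields 14 error messages instead of 1000. The burst allowance is what keeps short legitimate bursts (such as traceroute) working, which a simple "one message every T ms" timer would not.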

I am starting to suspect that this is a rather transient condition caused by how nexthops are distributed in BGP, with routes somehow sticking around or going stale.

With best regards,
Tobias

On Thu, 2024-03-07 at 23:20 +0100, Tobias Fiebig wrote:
> Moin,
>
> ok, had a hunch, and I think I got closer to this. I can now
> semi-reproduce this in a lab environment with six OpenBSD 7.4 hosts.
> I guess the last missing component is bringing in a Linux router,
> i.e., in a pure OpenBSD setup it is not that bad, because OpenBSD does
> not send Type 2 ad infinitum (unlike Linux). Still, packets seem to
> keep looping for some time even after the connections are gone,
> albeit far fewer of them.
>
> The issue occurs if I have asymmetric routing with one path going via
> a lower-MTU link, while bgpd has two routes with equal path length
> tagged as multipath on the client node. For some reason I do not seem
> to run into this when that is not the case (or I am holding something
> wrong there).
>
> The setup has six hosts:
>
> rtr-1a1.tst.as59645.net
> ..
> rtr-1a6.tst.as59645.net
>
> The configs / routing table state can be found here:
> https://rincewind.home.aperture-labs.org/~tfiebig/mtucfg/
>
> (vio0 on all hosts is used for mgmt).
>
> If, in that setup, we execute this on rtr-1a1:
>
> rtr-1a1.tst.as59645.net ~ # dd if=/dev/random of=/tmp/testfile bs=1M count=32
> rtr-1a1.tst.as59645.net ~ # cat /tmp/testfile | nc -6 -l 2342
>
> and then on rtr-1a6:
>
> rtr-1a6.tst.as59645.net ~ # nc -6 -s 2a06:d1c4::1a6 2a06:d1c4::1a1 2342 | pv > /dev/null
>
> The connection immediately stalls, and on rtr-1a3 we can see the
> Type 2 loop going on on vio1.
>
> (Sample pcap here:
> https://rincewind.home.aperture-labs.org/~tfiebig/mtucfg/configs_rtr-1a3.tst.as59645.net/ignored_mtu.pcap
> )
>
> I assume that we would see full link congestion if we replaced
> rtr-1a3 with a Linux box, which is less conservative about resending
> ICMP6 messages than the OpenBSD box used in this lab case.
>
> So, essentially, these are two issues:
> - Linux (at least 6.1.65-amd64) seems to just send out ICMP6 Type 2
>   repeatedly without rate limiting (which MUST NOT be done per RFC 4443)
> - OpenBSD seems to have some corner cases where ICMP6 Type 2 messages
>   are ignored.
>
> With best regards,
> Tobias