Re: Noisy prefixes in BGP

Romain Fontugne via NANOG Sun, 09 Feb 2025 11:16:17 -0800

Thanks James, great tool, I have bookmarked that.
It has amazing examples of how absurd announcements can be sometimes.

Romain

________________________________________
From: James Bensley <lists+na...@bensley.me>
Sent: Sunday, February 9, 2025 22:43
To: NANOG; [IIJ] Fontugne Romain; Geoff Huston
Subject: Re: Noisy prefixes in BGP

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hi Romain,

I have been looking at prefixes with large numbers of updates for a few years 
now. As Geoff pointed out, this is a long running problem and it’s not one that 
is going to (ever?) go away.

I have auto-generated daily reports of general pollution seen in the DFZ from 
the previous day, which can be found here: 
https://github.com/DFZ-Name-and-Shame/dnas_stats [1].

Geoff pointed out “when a prefix is being updated 33,000 times in 145 days its 
basically being updated as fast as many BGP implementations will let you”, 
however, there are peers generating millions of updates *per day* for the same 
prefix! There are multiple issues here which need to be unpacked to fully 
understand what’s going on…

* Sometimes BGP updates a prefix as fast as BGP allows (this is what Geoff has 
pointed out). This could be for a range of reasons like a flapping link, or a 
redistribution issue.

* Sometimes there is a software bug in BGP which re-transmits the update as 
fast as TCP allows. Here is an example prefix in a daily report, which was 
present in 4M updates, from a single peer of a single route collector: 
https://github.com/DFZ-Name-and-Shame/dnas_stats/blob/main/2023/03/15/20230315.txt#L607

The ASN a few lines below has almost exactly the same number of BGP 
advertisements for that day:  AS394119 (also that name, 
EXPERIMENTAL-COMPUTING-FACILITY, feels like a bit of a smoking gun!). Using a 
looking glass we can confirm that 394119 is the origin of that prefix: 
https://stat.ripe.net/widget/looking-glass#w.resource=2602:fe10:ff4::/48

Here is a screenshot from RIPE Stat at the same time, where they are recording 
nearly 30M updates for the same prefix per day: 
https://null.53bits.co.uk/uploads/images/networking/internetworking/ripe-prefix-count.png

The difference is that the 30M number is across all RIPE RIS collectors. I have 
tried to de-dupe in my daily report and just choose the highest value from a 
single collector. ~4M updates per day is ~46 updates per second. So this is a 
BGP speaker stuck in an infinite loop sending an update as fast as it can and 
for some reason, and not registering in it’s internal state engine that the 
update has been sent.

I’ve reach out to a few ASNs who’ve shown up in my daily reports for excessive 
announcements, and what I have seen is that sometimes it’s the ASN peering with 
the RIS collector, or sometimes it’s an ASN the route collector peer  is 
peering with. In multiple cases, they have simply bounced their BGP session 
with their peer or with the collector, and the issues has gone away.

* Sometimes a software bug causes the state of a route to falsely change. As an 
example of this, there was a bug in Cisco’s IOS-XR. If you were running soft 
reconfiguration inbound and RPKI, I think any RPKI state changes (to any 
prefix) was causing a route refresh. Or something like that. It’s been a couple 
of years, but you needed this specific combination of features, and IOS-XR was 
just churning out updates like it’s life depended on it. I reached out to an 
ASN who showed up in my daily report, they confirmed they had this bug, and 
eventually they fixed in.

* These problems aren’t DFZ wide. Peer A might be sending a bajillion updates 
to peer B, but peer B sees there is no change in the route and correctly 
doesn’t forward the update onwards to it’s peers / public collectors. So this 
is probably happing a lot more than we see via RIS or RouteViews. Only some 
parts of the DFZ will be receiving the gratuitous updates/withdraws.

I recall there was a conversation either here on NANOG or maybe it was at the 
IETF, within the last few years, about different NOSes that were / were not 
correctly identifying route updates received with no changes to the existing 
RIB entry, and [not] forwarding the update onwards. I’m not sure what came of 
this.

* Some people aren’t monitoring and alerting based on CPU usage. There are 
plenty of old rust buckets connected to the DFZ who’s CPUs will be on fire as a 
result of a buggy peer and the operator is simply unaware. Some people are 
running cutting edge devices with 12 cores at 3Ghz and 64GBs of RAM, for which 
all this churn is no problem, so even if their CPU usage is monitored, it will 
be one core at 100% but the rest nearly idle, and crappy monitoring software 
will aggregate this and report the device has having <10% usage and operators 
think everything is fine.

* There are no knobs in existing BGP implementations to detect and limit this 
behaviour in anyway.

If you end contacting the operators of the ASNs in your original email, and 
getting this problem fixed, I’d be interested to know what the cause was in 
those cases. I’ve all but given up contacting operators that show up in my 
daily reports. It’s an endless endeavour, and some operators simply don’t 
respond to my multiple emails (also, I have a day job, and a personal life, and 
I also like sleeping and eating from time to time).

With kind regards,
James.

[1] It stopped working recently, so it’s now "catching-up", which is why the 
data is a few days behind.
-----BEGIN PGP SIGNATURE-----
Version: ProtonMail

wsG5BAEBCgBtBYJnqLEBCZCoEx+igX+A+0UUAAAAAAAcACBzYWx0QG5vdGF0
aW9ucy5vcGVucGdwanMub3Jn3djP7Jb9k2d+WsA2DtrRWbehxTINFz+sgXZ1
ZYC7j94WIQQ+k2NZBObfK8Tl7sKoEx+igX+A+wAAn4UP/j6fQ/Mlkb45Z0pB
tKqHmN7VZ73uHkbDQoJrXhTSM3SNeUIUQsA5u9y55ep17NpAXA30+/ledThF
OObJqaC4cpojoClh/3EHogZXWrJOH3G8xEDRl36rVVxYVzfdOsUpQ4Efpf0G
mVBTzQAxFGU6RNu0+Ri/v5leuJVwZvKz3Gzt/dELm2bRpF6sCD3bmHob1Am+
lc0vPNHlvPy/Ap3RvjJo5It62ulwcMQgqUDoZWD9S8Df9pRt7x3IO8bI1BOa
PWtbSNfjaQDNEs7Q1mVwY1jg5sIgO2wDqdn/o5787bsg+jmIiOBMGFrqY658
H1Ppfk/o1hKVpIV+sWUy6qX8iJ5rs5XrMPQXH1jVFqfCJqMz2fFGYzLPht+3
oKRFmPRPn3aXFFaveHOgy3vVE+ad4kwzDS7CBIM0WxKBKs3cl5QKM7kcmMNZ
KU0EriHm9po25tJwwbzW6nyZ/tmHTWN8+aSbPgdeZnl0PsDe7BdwPnJe/YBa
Ps6CSaCrd0a1k1oQliXFABCOZ0vjjmXP1yfSUj05/I/M5VQmTVcNYGRvCVpK
d6Xljh1nT9PpKYzl/ruz2C6Bi0iET2qGIc47A4yEn7optanx/Cnw9mWt2FkE
l9/gazByKhRd2i/2U/Fn0itfaLANLHl3pH45Ca/xe/KTYU9KPLi9EVmI0V72
23Gk2irf
=Fau/
-----END PGP SIGNATURE-----

Re: Noisy prefixes in BGP

Reply via email to