Re: BFD vs network brownouts

Alex Buie Thu, 09 Jan 2025 13:47:02 -0800

>
> it's there to detect *reachability* failure faster than protocols
> themselves would do so



Exactly this - we have some type 2 fiber transit circuits which are
presumably connected to some sort of re-encoder or something, as we have
had a few scenarios where the router at the far-remote end died but we
maintained light and Ethernet. BFD helps us greatly here when we don't lose
light or link.

I haven't yet seen how it behaves in a brownout, though, say where I'm
facing 40% packet loss broadly to all ISPs. Generally our problems of that
nature have been much further into the core and limited to specific routes
or peers (a bad LAG member on the peering interconnect, or on the backbone
between our upstream and whoever in the DFZ; AS7018 seems notorious for
leaving peering links in a shitty state).
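
For a rough feel of David's "likelihood that BFD will fully engage"
question, here is a back-of-the-envelope sketch. It assumes things the
thread does not establish: loss is independent per packet (real brownouts
are usually bursty), only one direction is considered, and the session
goes down exactly when `multiplier` consecutive Control packets are lost.
The function names are made up for illustration. Under those assumptions
the wait is the classic "expected trials until a run of m events", which
has the closed form (1 - p^m) / (p^m * (1 - p)).

```python
def expected_packets_to_detect(loss: float, multiplier: int = 3) -> float:
    """Expected Control packets sent until `multiplier` in a row are lost.

    Assumes independent (Bernoulli) loss with probability `loss` per packet.
    """
    if not 0.0 < loss < 1.0:
        raise ValueError("loss must be strictly between 0 and 1")
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    run = loss ** multiplier  # chance that one specific window is fully lost
    return (1.0 - run) / (run * (1.0 - loss))


def expected_seconds_to_detect(loss: float, interval_s: float = 0.3,
                               multiplier: int = 3) -> float:
    """Same expectation expressed as time, for a given Control interval."""
    return expected_packets_to_detect(loss, multiplier) * interval_s


if __name__ == "__main__":
    for loss in (0.01, 0.05, 0.20, 0.40):
        secs = expected_seconds_to_detect(loss)
        print(f"loss={loss:.0%}  mean time to BFD down ~ {secs:,.1f} s")
```

With a 300ms interval and a multiplier of 3, this model puts the mean time
to trip at about 7 seconds for 40% loss, but at several days for 1% loss,
which lines up with the point that BFD is not a loss-measurement tool.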


On Thu, Jan 9, 2025 at 2:58 PM Tom Beecher <[email protected]> wrote:

> Hi, all.  BFD is well known for what it brings to the table for improving
>> link failure detection; however, even at a reasonably athletic 300ms
>> Control rate, you're not going to catch a significant percentage of
>> brownout situations where you have packet loss but not a full outage.  I'm
>> trying to:
>
>
> BFD doesn't improve link failure detection. It's the exact opposite; it's
> there to detect *reachability* failure faster than protocols themselves
> would do so, in those cases where link failure does NOT occur, which would
> otherwise do the same thing.
>
> Beyond that, I agree with Jason and Saku that BFD is not the correct tool
> for what you're trying to achieve anyways. Aside from monitoring interface
> counters, there are software options out there to detect loss % on explicit
> paths that would suit your need much better
>
>
>
> On Thu, Jan 9, 2025 at 2:58 AM Saku Ytti <[email protected]> wrote:
>
>> On Thu, 9 Jan 2025 at 00:23, David Zimmerman via NANOG <[email protected]>
>> wrote:
>>
>> > find any formal or semi-formal writing about quantification of BFD's
>> effectiveness.  For example, my mental picture is a 3D graph where, for a
>> given Control rate and corresponding Detection Time, the X axis is
>> percentage of packet loss, the Y axis is the Control/Detection timer tuple,
>> and the Z axis is the likelihood that BFD will fully engage (i.e., missing
>> all three Control packets).  Beyond what I believe is a visualization
>> complexity needing some single malt scotch nearby, letting even a single
>> Control packet through resets your Detection timer.
>> > ask if folks in the Real World use BFD towards this end, or have other
>> mechanisms as a data plane loss instrumentation vehicle.  For example, in
>> my wanderings, I've found an environment that offloads the diagnostic load
>> to adjacent compute nodes, but they reach out to orchestration to trigger
>> further router actions in a full-circle cycle measured in minutes.  Short
>> of that, really aggressive timers (solving through brute force) on BFD
>> quickly hit platform limits for scale unless perhaps you can offboard the
>> BFD to something inline (e.g. the Ciena 5170 can be dialed down to a 3.3ms
>> Control timer).
>> >
>> >
>> >
>> > Any thoughts appreciated.  I'm also pursuing ways of having my internal
>> "customer" signal me upon their own packet loss observation (e.g. 1% loss
>> for most folks is a TCP retransmission, but 1% loss for them are crying
>> eyeballs and an escalation).
>>
>> I agree with what Jason wrote, that this is not what BFD was designed for.
>>
>> In SONET/SDH even WAN-PHY you could declare interface down if BER
>> threshold went beyond what you consider acceptable. For more modern
>> interfaces your best bet is RS-FEC and preFEC error rate as predictor,
>> possibly multimetric decision including also DDM data and projections.
>> To my knowledge vendors currently don't have software support to
>> assert RFI on preFEC counters, in fact last time I looked you couldn't
>> even SNMP GET FEC counters, for which I opened Enhancement Requests to
>> vendors. So today you'd need to do this with screenscraping and manual
>> interface down, which is a much bigger hammer than RFI assertion.
>>
>> --
>>   ++ytti
>>
>
