Note that I'm picking this message out of a mixed thread to reply to. So,
not all of this is targeted as a response to you, Acee.

On Tue, Oct 23, 2018 at 04:30:52PM +0000, Acee Lindem (acee) wrote:
> Hi Albert, Les,
>
> I tend to agree with Les that BFD doesn’t seem like the right protocol for
> this. Note that if you use OSPF as your IGP and flap the interface when the
> MTU changes, you’ll detect MTU mismatches immediately due to OSPF’s DB
> exchange MTU negotiation. Granted, control plane detection won’t detect data
> plane bugs resulting in MTU fluctuations but I don’t see this as a frequent
> event.

Commenting specifically on the OSPF case, when you have such misconfigured
MTUs, this manifests as weird protocol hiccups. You don't so much detect
that there's an MTU issue - you just see OSPF failing to make progress.

Commenting on the general "well, this other protocol has MTU features",
one of the meta issues is that there's no guarantee that a given protocol
will be running over a given link. In several of the scenarios Albert
writes about, the links may only be running BGP as a control plane protocol.

Should BFD be *the* protocol for this? Possibly not. What else instead?
A specific MTU probing protocol? What does it look like? My suspicion is
that such a thing would have a large overlap with BFD state machinery.

Which brings up the question of speed:

> From: ginsb...@cisco.com At: 10/23/18 10:45:02
>
> Please understand that I fully agree with the importance of being able to
> detect/report MTU issues. In my own experience this can be a difficult
> problem to diagnose. You do not have to convince me that some improvement in
> detection/reporting is needed. The question really is whether using BFD is
> the best option.
>
> Could you respond to my original questions – particularly why sub-second
> detection of this issue is a requirement?

Sub-second detection of MTU is not specifically the requirement - at least
in Albert's use case. I'll let others write about their own requirements.

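To put rough numbers on the speed question, here is a small sketch. It is
illustrative only: the timer values, link rate and packet sizes are
assumptions of mine, not anything taken from Albert's deployment or from
the draft. The detection time formula is the asynchronous-mode one from
RFC 5880, section 6.8.4; the 52 byte figure is a 24 byte BFD Control
packet without authentication plus UDP and IPv4 headers.

```python
def detection_time_us(remote_detect_mult, local_required_min_rx_us,
                      remote_desired_min_tx_us):
    """Asynchronous-mode detection time (RFC 5880, section 6.8.4).

    The remote system transmits at the slower of what it wants to send
    and what we are willing to receive; we declare the session down after
    remote_detect_mult of those intervals pass without a packet.
    """
    agreed_interval_us = max(local_required_min_rx_us,
                             remote_desired_min_tx_us)
    return remote_detect_mult * agreed_interval_us


def serialization_delay_us(packet_bytes, link_bps):
    """Time to clock one packet onto the wire, in microseconds."""
    return packet_bytes * 8 * 1_000_000 / link_bps


# Hypothetical session: 300 ms intervals both ways, multiplier of 3.
detect_us = detection_time_us(3, 300_000, 300_000)

# Hypothetical 1 Gb/s link: ordinary BFD packet vs. one padded to 9000 bytes.
small_us = serialization_delay_us(52, 1_000_000_000)
large_us = serialization_delay_us(9000, 1_000_000_000)

print(f"detection time: {detect_us / 1000:.0f} ms")
print(f"on-wire time: {small_us:.3f} us vs {large_us:.3f} us per packet")
```

Whatever interval and multiplier the session negotiates applies to every
packet on it, padded or not, so on a single session the MTU check and the
liveness check end up sharing one set of timers.
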
The interesting question that arises is: if BFD is where we choose to do
our MTU probing, how do BFD's timeliness requirements, with its usual
shorter packets, interact with MTU probing? Note this is particularly
important for standard single-hop BFD, since you only ever have a *single*
session between a given set of endpoints.

> It has been stated that there is a need for sub-second detection of this
> condition – but I really question that requirement.
>
> What I would expect is that MTU changes only occur as a result of some
> maintenance operation (configuration change, link addition/bringup,
> insertion of a new box in the physical path etc.).

Unfortunately, that's not always the case. An example (not necessarily
Albert's) of where this is problematic is leased circuits that are
partially created using some sort of mechanism such as L2VPNs. While it
may appear to the customer that the circuit is a leased line with a given
set of properties, it may be delivered through the provider network in a
packet-switched environment. This means that MTU changes inside the
provider network, perhaps the result of a topology change, may cause
problems on the circuit. So, things work - and then stop working.

> The idea of using a
> mechanism which is specifically tailored for sub-second detection to
> monitor something that is only going to change occasionally seems
> inappropriate.

For Albert's specific use case, I think there's some level of tolerance
for detection timing for a down link vs. bad MTU detection. However, I
don't think we can pretend to speak for all possible users of such a
mechanism. And thus, this discussion. :-)

> It makes me think that other mechanisms (some form of OAM,
> enhancements to routing protocols to do what IS-IS already does) could
> be more appropriate and would still meet the operational requirements.

As noted above, I'm not sure layering this onto a full control plane
protocol as the primary mechanism is the best idea.
(You want MTU detection? You have to build a dummy IS-IS session? Ugh.)

Certainly, documenting existing mechanisms and when they can deal with the
use case is worthwhile - perhaps even in the BFD Large draft itself.
Please feel free to suggest text. But we have other situations where an
IGP-like MTU mechanism won't help. Two simple examples are BGP and static
routes.

Also, certainly some form of lower layer mechanism could be used, if
applicable. Given that one known form of the issue is a LAG built from
component elements where a given element may have a smaller MTU than its
siblings, it'd be nice if this was solved in something like LACP as well.
(And we note that BFD for LAG exists, and is one of the possible
applications for BFD-large as well.)

-- Jeff