Hi Jeff,

Although technically BFD doesn’t seem like the right hammer, I can see the advantages of using it in specific use cases since all the notification machinery is already in place. See one inline.

> On Oct 25, 2018, at 11:29 AM, Jeffrey Haas <jh...@pfrc.org> wrote:
>
> Note that I'm choosing this message out of a mixed thread to give a reply.
> So, not all of this is targeted as a response to you, Acee.
>
> On Tue, Oct 23, 2018 at 04:30:52PM +0000, Acee Lindem (acee) wrote:
>> Hi Albert, Les,
>>
>> I tend to agree with Les that BFD doesn’t seem like the right protocol for
>> this. Note that if you use OSPF as your IGP and flap the interface when the
>> MTU changes, you’ll detect MTU mismatches immediately due to OSPF’s DB
>> exchange MTU negotiation. Granted, control plane detection won’t detect data
>> plane bugs resulting in MTU fluctuations but I don’t see this as a frequent
>> event.
>
> Commenting specifically on the OSPF case, when you have such misconfigured
> MTUs, this manifests as weird protocol hiccups. You don't so much detect
> that there's an MTU issue - you just see OSPF failing to make progress.

However, when implementations start supporting ietf-ospf.yang, there’ll be an if-config-err notification specifically indicating an MTU-mismatch. ;^)

Thanks,
Acee

>
> Commenting on the general "well, this other protocol has MTU features",
> one of the meta issues is there's no guarantee that a given protocol may be
> running over a given link. Several of the scenarios Albert writes about the
> links may only be running BGP as a control plane protocol.
>
> Should BFD be *the* protocol for this? Possibly not. What else instead? A
> specific MTU probing protocol? What does it look like? My suspicion is
> that such a thing would have a large overlap with BFD state machinery.
>
> Which brings up the question of speed:
>
>> From: ginsb...@cisco.com At: 10/23/18 10:45:02
>>
>> Please understand that I fully agree with the importance of being able to
>> detect/report MTU issues. In my own experience this can be a difficult
>> problem to diagnose. You do not have to convince me that some improvement in
>> detection/reporting is needed. The question really is whether using BFD is
>> the best option.
>>
>> Could you respond to my original questions – particularly why sub-second
>> detection of this issue is a requirement?
>
> Sub-second detection of MTU is not specifically the requirement - at least
> in Albert's use case. I'll let others write about their own requirements.
>
> The interesting question arises that if BFD is where we're choosing our MTU
> probing, what's the intersection between BFD timeliness requirements with
> its usual shorter packets vs. MTU probing?
>
> Note this is particularly important for standard single-hop BFD since you
> only ever have a *single* session between a given set of endpoints.
>
>> It has been stated that there is a need for sub-second detection of this
>> condition – but I really question that requirement.
>>
>> What I would expect is that MTU changes only occur as a result of some
>> maintenance operation (configuration change, link addition/bringup,
>> insertion of a new box in the physical path etc.).
>
> Unfortunately, that's not always the case.
>
> An example (not necessarily Albert's) of where this is problematic is in the
> case of leased circuits that are partially created using some sort of
> mechanism such as L2VPNs. While it may appear to the customer that the
> circuit is a leased line of a given set of properties, it may be delivered
> through the provider network in a packet-switched environment. This means
> that MTU issues through the provider network, perhaps the result of a
> topology change, may result in issues.
>
> So, things work - and then stop working.
>
>> The idea of using a
>> mechanism which is specifically tailored for sub-second detection to
>> monitor something that is only going to change occasionally seems
>> inappropriate.
>
> For Albert's specific use case, I think there's some level of tolerance for
> detection timing for a down link vs. bad MTU detection. However, I don't
> think we pretend to speak for all possible users of such a mechanism. And
> thus, this discussion. :-)
>
>> It makes me think that other mechanisms (some form of OAM,
>> enhancements to routing protocols to do what IS-IS already does) could
>> be more appropriate and would still meet the operational requirements.
>
> As noted above, I'm not sure layering this onto a full control plane
> protocol as the primary mechanism is the best idea. (You want MTU
> detection? You have to build a dummy IS-IS session? Ugh.)
>
> Certainly documenting existing mechanisms and when they can deal with the
> use case is worth noting - perhaps even in the BFD Large draft itself.
> Please feel free to suggest text.
>
> But we have other situations where an IGP-like MTU mechanism won't help.
> Two simple examples being BGP and static routes.
>
> Also, certainly some form of lower layer mechanism could be used, if
> applicable. Given that one form of known issue is when a LAG is built from
> component elements where a given element may have smaller MTU than their
> siblings, it'd be nice if this was solved in something like LACP as well.
> (And we note that BFD for LAG exists, and is one of the possible
> applications for BFD-large as well.)
>
> -- Jeff