Hi Jeff, 

Although technically BFD doesn’t seem like the right hammer, I can see the 
advantages of using it in specific use cases since all the notification 
machinery is already in place. See one comment inline.

> On Oct 25, 2018, at 11:29 AM, Jeffrey Haas <jh...@pfrc.org> wrote:
> 
> Note that I'm choosing this message out of a mixed thread to give a reply.
> So, not all of this is targeted as a response to you, Acee.
> 
> On Tue, Oct 23, 2018 at 04:30:52PM +0000, Acee Lindem (acee) wrote:
>> Hi Albert, Les,
>> 
>> I tend to agree with Les that BFD doesn’t seem like the right protocol for 
>> this. Note that if you use OSPF as your IGP and flap the interface when the 
>> MTU changes, you’ll detect MTU mismatches immediately due to OSPF’s DB 
>> exchange MTU negotiation. Granted, control plane detection won’t detect data 
>> plane bugs resulting in MTU fluctuations but I don’t see this as a frequent 
>> event.
> 
> Commenting specifically on the OSPF case, when you have such misconfigured
> MTUs, this manifests as weird protocol hiccups.  You don't so much detect
> that there's an MTU issue - you just see OSPF failing to make progress.

However, when implementations start supporting ietf-ospf.yang, there’ll be an 
if-config-err notification specifically indicating an MTU-mismatch. ;^) 

Thanks,
Acee


> 
> Commenting on the general "well, this other protocol has MTU features",
> one of the meta issues is there's no guarantee that a given protocol will be
> running over a given link.  In several of the scenarios Albert writes about,
> the links may only be running BGP as a control plane protocol.
> 
> Should BFD be *the* protocol for this?  Possibly not.  What else instead?  A
> specific MTU probing protocol?  What does it look like?  My suspicion is
> that such a thing would have a large overlap with BFD state machinery.
> 
> Which brings up the question of speed:
> 
>> From: ginsb...@cisco.com At: 10/23/18 10:45:02
>> 
>> Please understand that I fully agree with the importance of being able to 
>> detect/report MTU issues. In my own experience this can be a difficult 
>> problem to diagnose. You do not have to convince me that some improvement in 
>> detection/reporting is needed. The question really is whether using BFD is 
>> the best option.
>> 
>> Could you respond to my original questions – particularly why sub-second 
>> detection of this issue is a requirement?
> 
> Sub-second detection of MTU is not specifically the requirement - at least
> in Albert's use case.  I'll let others write about their own requirements.
> 
> The interesting question arises that if BFD is where we're choosing our MTU
> probing, what's the intersection between BFD timeliness requirements with
> its usual shorter packets vs. MTU probing?  
> 
> Note this is particularly important for standard single-hop BFD since you
> only ever have a *single* session between a given set of endpoints.
> 
>> It has been stated that there is a need for sub-second detection of this
>> condition – but I really question that requirement.
>> 
>> What I would expect is that MTU changes only occur as a result of some
>> maintenance operation (configuration change, link addition/bringup,
>> insertion of a new box in the physical path etc.).
> 
> Unfortunately, that's not always the case.
> 
> An example (not necessarily Albert's) of where this is problematic is in the
> case of leased circuits that are partially created using some sort of
> mechanism such as L2VPNs.  While it may appear to the customer that the
> circuit is a leased line of a given set of properties, it may be delivered
> through the provider network in a packet-switched environment.  This means
> that MTU issues through the provider network, perhaps the result of a
> topology change, may result in issues.  
> 
> So, these work - and then stop working.
> 
> 
>> The idea of using a
>> mechanism which is specifically tailored for sub-second detection to
>> monitor something that is only going to change occasionally seems
>> inappropriate.
> 
> For Albert's specific use case, I think there's some level of tolerance for
> detection timing for a down link vs. bad MTU detection.  However, I don't
> think we can pretend to speak for all possible users of such a mechanism.  And
> thus, this discussion. :-)
> 
>> It makes me think that other mechanisms (some form of OAM,
>> enhancements to routing protocols to do what IS-IS already does) could
>> be more appropriate and would still meet the operational requirements.
> 
> As noted above, I'm not sure layering this onto a full control plane
> protocol as the primary mechanism is the best idea.  (You want MTU
> detection?  You have to build a dummy IS-IS session?  Ugh.)
> 
> Certainly, documenting existing mechanisms and when they can deal with the
> use case is worthwhile - perhaps even in the BFD Large draft itself.
> Please feel free to suggest text.
> 
> But we have other situations where an IGP-like MTU mechanism won't help.
> Two simple examples being BGP and static routes.
> 
> Also, certainly some form of lower layer mechanism could be used, if
> applicable.  Given that one form of known issue is when a LAG is built from
> component elements where a given element may have a smaller MTU than its
> siblings, it'd be nice if this was solved in something like LACP as well.
> (And we note that BFD for LAG exists, and is one of the possible
> applications for BFD-large as well.)
> 
> 
> 
> -- Jeff
