Hi Les,

All the MTU issues we have seen were on Telco WAN circuits. They were not 
planned events, and there were no alarms on our side. If there had been alarms, 
troubleshooting would have been much easier, because we would have known where 
to focus our efforts. 

IS-IS does support hello padding, but a 30s outage is not acceptable in our 
network. Additionally, we have WAN circuits that run other routing protocols, 
such as OSPF and eBGP, which do not support hello padding. Even where the 
routing protocol does support padding, we may not want to use aggressive timers, 
as hello processing is a control-plane activity. Also, as mentioned previously, 
even with the minimum 1s protocol timer we would still have about 3s of outage. 
As is common practice, we leave our protocol timers at their defaults and 
leverage BFD for fast failure detection. 
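
For reference, the outage figures above follow from the usual hold-time 
arithmetic: a neighbor is declared down after roughly hello interval x 
multiplier with no hellos received. A minimal sketch, assuming a multiplier of 
3 (which gives the 30s IS-IS default from 10s hellos and the ~3s figure from 
1s hellos); actual defaults vary by protocol and implementation.

```python
def hold_time_s(hello_interval_s: float, multiplier: int = 3) -> float:
    """Approximate worst-case failure detection time: the neighbor is declared
    down once `multiplier` consecutive hello intervals pass with no hello."""
    return hello_interval_s * multiplier


# Assumed values for illustration: 10 s is a common IS-IS default hello
# interval; 1 s is the minimum protocol timer mentioned above.
for label, hello in [("IS-IS default hellos", 10.0), ("1 s hellos", 1.0)]:
    print(f"{label}: ~{hold_time_s(hello):.0f} s outage before detection")
```

Note that the multiplier, not just the hello interval, sets the floor: even 
1s hellos leave several seconds of blackholing before the adjacency drops.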

Hence, I believe BFD is a very good mechanism to address this issue. I 
understand some customers want to run very aggressive BFD timers to detect 
failures quickly (at the expense of more network churn). We found that we can 
achieve sub-second convergence with protection using a relatively conservative 
BFD interval of 150 msec. Also, as mentioned previously, depending on the 
implementation, BFD padding support may have very little impact on 
performance.
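
As a rough check on the sub-second claim: RFC 5880 (section 6.8.4) defines the 
asynchronous-mode Detection Time as the Detect Mult received from the remote 
system multiplied by the greater of the local RequiredMinRxInterval and the 
remote DesiredMinTxInterval. The sketch below applies that rule to a 150 msec 
interval. The Detect Mult of 3 is an assumption (a common choice, not stated 
above), and intervals are in milliseconds here rather than the microseconds 
carried on the wire.

```python
from dataclasses import dataclass


@dataclass
class BfdSessionEnd:
    desired_min_tx_ms: int   # DesiredMinTxInterval (ms here; microseconds on the wire)
    required_min_rx_ms: int  # RequiredMinRxInterval
    detect_mult: int         # Detect Mult advertised by this end


def detection_time_ms(local: BfdSessionEnd, remote: BfdSessionEnd) -> int:
    """Asynchronous-mode Detection Time at `local`, per RFC 5880 section 6.8.4."""
    # The remote end's agreed transmit interval is the slower (larger) of what
    # it wants to send at and what the local end is willing to receive.
    agreed_remote_tx = max(local.required_min_rx_ms, remote.desired_min_tx_ms)
    return remote.detect_mult * agreed_remote_tx


# Assumed symmetric WAN session: 150 ms timers and Detect Mult 3 on both ends.
wan = BfdSessionEnd(desired_min_tx_ms=150, required_min_rx_ms=150, detect_mult=3)
print(f"Detection time: {detection_time_ms(wan, wan)} ms")  # Detection time: 450 ms
```

Even at this conservative interval, detection completes well under a second, 
versus tens of seconds for default IGP hold times.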

I would also add that BFD padding support will be an option on a 
per-interface/neighbor basis. A network designer who does not have to deal with 
MTU issues can choose to use the default behavior. They can also enable padding 
on WAN circuits and use the default for back-to-back intra-site links.
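
To illustrate the per-interface choice, here is a hypothetical policy table 
(the interface names and the `pad_to_ip_mtu` field are invented for this 
sketch) that computes how many padding bytes a BFD control packet would need 
to fill a given IP MTU. It assumes IPv4 without options (20-byte header), UDP 
(8 bytes), a 24-byte BFD mandatory section with no authentication, and that 
padding is simply extra bytes after the BFD packet; the draft's actual encoding 
and configuration model may differ.

```python
from dataclasses import dataclass
from typing import Optional

IPV4_HEADER = 20    # bytes, assuming no IP options
UDP_HEADER = 8      # bytes
BFD_MANDATORY = 24  # RFC 5880 mandatory section, assuming no authentication


@dataclass
class InterfacePolicy:
    name: str
    pad_to_ip_mtu: Optional[int]  # None means default (unpadded) BFD


def padding_bytes(policy: InterfacePolicy) -> int:
    """Bytes of padding needed so the whole IP packet reaches pad_to_ip_mtu."""
    if policy.pad_to_ip_mtu is None:
        return 0
    overhead = IPV4_HEADER + UDP_HEADER + BFD_MANDATORY
    if policy.pad_to_ip_mtu < overhead:
        raise ValueError(f"{policy.name}: target {policy.pad_to_ip_mtu} < {overhead}")
    return policy.pad_to_ip_mtu - overhead


# Hypothetical deployment: pad only on the Telco WAN circuit.
policies = [
    InterfacePolicy("wan-telco-1", pad_to_ip_mtu=1500),
    InterfacePolicy("intra-site-b2b", pad_to_ip_mtu=None),
]
for p in policies:
    print(f"{p.name}: {padding_bytes(p)} padding bytes")
```

With these assumptions the WAN session carries 1448 bytes of padding per 
packet, while the back-to-back link keeps the default small packets and pays 
no extra cost.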

Thanks
Albert


From: ginsb...@cisco.com At: 10/23/18 19:52:53
To: Albert Fu (BLOOMBERG/ 120 PARK), rtg-bfd@ietf.org
Subject: RE: BFD WG adoption for draft-haas-bfd-large-packets

     

Albert - 
  

From: Albert Fu (BLOOMBERG/ 120 PARK) <af...@bloomberg.net> 
Sent: Tuesday, October 23, 2018 8:45 AM
To: rtg-bfd@ietf.org; Les Ginsberg (ginsberg) <ginsb...@cisco.com>
Subject: RE: BFD WG adoption for draft-haas-bfd-large-packets 
  

Hi Les, 

  

Given that it takes a relatively long time to troubleshoot the MTU issue, and 
given the associated impact on customer traffic, it is important to have a 
reliable and fast mechanism to detect the issue.  

  
[Les:] This is one of the points where we are not in full agreement. I agree 
you need an easy and reliable way to detect the problem when it occurs. 
However, I disagree that you need to do this “fast”, where fast is defined as 
sub-second. 
  
You have something that we know occurs only during some maintenance event, 
which is planned and happens at most once a day or once a week. 
Checking for this even once per second is overly aggressive. 
If it came for free, there would be no reason not to do so. 
But as this discussion has shown, there are costs and risks. 
  
For example, if you were using IS-IS and you detected this within the default 
adjacency hold time (30 seconds on p2p circuits), would that be too slow for 
you? If so, please explain why. 
  
I think the primary issue here is ease of use and reliability. Whether 
detection time is one second or one minute seems relatively unimportant. 
  
Do you disagree? 
  
   Les 
   
