Albert –
We are in full agreement.
Delays in bringing BFD back up after a previous failure may well be warranted in
the break-in-middle scenarios.
I am not convinced this needs to be standardized – seems quite appropriate as
an implementation choice. But if any discussion were to occur in RFCs, I think
it should be in some BFD document.
As this draft is focused on OSPF protocol extensions, I don’t think BFD
dampening needs to be discussed. In any case it should not alter the
interaction between BFD and protocols. If it takes longer for BFD to come up
that just means the OSPF adjacency will not come up either – which is exactly
the behavior that is desired.
Les
From: Albert Fu (BLOOMBERG/ 120 PARK) <[email protected]>
Sent: Monday, January 31, 2022 6:50 AM
To: Les Ginsberg (ginsberg) <[email protected]>; [email protected];
[email protected]
Cc: Acee Lindem (acee) <[email protected]>;
[email protected]; [email protected]
Subject: RE: [Lsr] Working Group Last Call for "OSPF Strict-Mode for BFD" -
draft-ietf-lsr-ospf-bfd-strict-mode-04
Hi Les,
Your scenario below is indeed something we have encountered in our production
network in the non-strict scenario, due to "flapping" links, where the routing
protocol could come up before BFD due to a "break-in-middle" link issue
(the interface stayed up, so the routing protocol remained active). Strict mode
will address this issue.
Another point to add is that, as a standard on our interfaces, we safeguard
against flapping links by configuring interface hold-time/carrier-delay.
However, this is only useful in situations where the link physically goes down
(and fast detection is automatic in most implementations).
Nowadays, it is also common to see "break-in-middle" failures. We use BFD to
detect this sort of failure within sub-second time. And to dampen this sort of
break-in-middle failure, we will need to use BFD holdtime/dampening.
Thanks
Albert
From: [email protected] At: 01/30/22 14:38:37 UTC-5:00
To: [email protected],
[email protected]
Cc: Albert Fu (BLOOMBERG/ 120 PARK) <[email protected]>,
[email protected],
[email protected],
[email protected]
Subject: RE: [Lsr] Working Group Last Call for "OSPF Strict-Mode for BFD" -
draft-ietf-lsr-ospf-bfd-strict-mode-04
Robert –
Here is what you said (emphasis added):
<snip>
But the timer I am suggesting is not related to BFD operation, but to OSPF
(and/or ISIS). It is not about BFD sessions being UP or DOWN. It is about
allowing BFD for more testing (with various parameters (for example increasing
test packet size in some discrete steps) before OSPF is happy to bring the adj.
up.
<end snip>
Point #1: If you want BFD to do more testing (such as MTU testing) then clearly
you need extensions to BFD (such as
https://datatracker.ietf.org/doc/draft-ietf-bfd-large-packets/ )
Point #2: The existing timers (which, as Ketan points out, are mentioned in Section 5)
are applied today at the OSPF level precisely because OSPF does not currently
have strict-mode operation. So in a flapping scenario you could see the
following behavior:
a) BFD goes down
b) OSPF goes down in response to BFD
c) OSPF comes back up
d) Link is still unstable – so traffic is being dropped some of the time – but
perhaps OSPF adjacency stays up (i.e., OSPF hellos get through often enough to
keep the OSPF adjacency up)
So some implementations have chosen to insert a delay following “b”. This
doesn’t guarantee stability, but hopefully makes it less likely. And because
OSPF today does NOT wait for BFD to come up, the delay has to be implemented at
the OSPF level.
Once you have strict mode support, the sequence becomes:
a) BFD goes down
b) OSPF goes down in response to BFD
c) BFD comes back up
d) OSPF comes back up
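To make the difference between the two sequences concrete, here is a toy model in Python. It is only a sketch, not taken from the draft or from any implementation: the Link class, its method names, and the flap() helper are all hypothetical, and the OSPF neighbor FSM is reduced to a single up/down flag.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Toy model of one OSPF neighbor monitored by BFD (hypothetical names)."""
    strict_mode: bool
    bfd_up: bool = False
    ospf_up: bool = False
    log: list = field(default_factory=list)

    def bfd_went_down(self):
        self.bfd_up = False
        self.log.append("BFD down")
        if self.ospf_up:
            # OSPF is a BFD client, so it tears down the adjacency.
            self.ospf_up = False
            self.log.append("OSPF down")

    def bfd_came_up(self):
        self.bfd_up = True
        self.log.append("BFD up")

    def ospf_hello_received(self):
        """A hello got through; try to bring the adjacency up."""
        if self.ospf_up:
            return
        if self.strict_mode and not self.bfd_up:
            # Strict mode: the adjacency may not advance until BFD is up.
            self.log.append("OSPF held (waiting for BFD)")
            return
        self.ospf_up = True
        self.log.append("OSPF up")


def flap(strict_mode):
    """Replay one flap: BFD fails, hellos still get through, BFD recovers."""
    link = Link(strict_mode=strict_mode, bfd_up=True, ospf_up=True)
    link.bfd_went_down()
    link.ospf_hello_received()  # the unstable link still passes some hellos
    link.bfd_came_up()
    link.ospf_hello_received()
    return link.log


print(flap(strict_mode=False))
print(flap(strict_mode=True))
```

Without strict mode the log shows OSPF coming up before BFD; with strict mode OSPF is held until BFD reports up.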
Now, if the concern is that BFD comes back up while the link is still unstable,
the way to address that is to put a delay either before BFD attempts to bring
up a new session or a delay after achieving UP state before it signals UP to
its clients – such as OSPF. This is a better solution because all BFD clients
benefit from this. And if the link is still unstable, it is more likely that the
BFD session will go down during the delay period than it would be for OSPF
because the BFD timers are significantly more aggressive.
(BTW, this behavior can be done w/o a BFD protocol extension – it is purely an
implementation choice.)
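A minimal sketch of the second option (delay after reaching UP before signaling UP to clients) might look like the following. This is an illustration under assumptions, not how any particular BFD implementation works: the DelayedBfdUp class, the hold_s parameter, and the notify callback are hypothetical names, and time is passed in explicitly so the behavior is easy to follow.

```python
class DelayedBfdUp:
    """Signal UP to clients only after the session has stayed UP for hold_s
    seconds. Hypothetical sketch; DOWN is always signaled immediately."""

    def __init__(self, hold_s, notify):
        self.hold_s = hold_s
        self.notify = notify       # callback taking one bool: True = UP
        self.up_since = None       # time the session entered UP, or None
        self.signaled_up = False   # whether clients currently believe UP

    def session_up(self, now):
        # Start the hold timer; clients are not told anything yet.
        if self.up_since is None:
            self.up_since = now

    def session_down(self, now):
        # Any failure cancels the pending hold timer.
        self.up_since = None
        if self.signaled_up:
            self.signaled_up = False
            self.notify(False)

    def tick(self, now):
        # Called periodically; signals UP once the hold period has elapsed.
        if (self.up_since is not None
                and not self.signaled_up
                and now - self.up_since >= self.hold_s):
            self.signaled_up = True
            self.notify(True)


events = []
bfd = DelayedBfdUp(hold_s=2.0, notify=events.append)
bfd.session_up(0.0)
bfd.session_down(0.2)   # session fails 200 ms after coming up
bfd.session_up(1.0)
bfd.tick(1.5)           # still inside the hold period: nothing signaled
bfd.tick(3.0)           # stable for 2 s: clients are told UP
print(events)
```

In this sketch a session that fails 200 ms after coming up is never reported to OSPF at all, and every other BFD client gets the same protection.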
From a design perspective, dampening is always best done at the lowest layer
possible. In most cases, interface layer dampening is best. If that is not
reliable for some reason, then move one layer up – not two layers up.
Les
From: Robert Raszuk <[email protected]>
Sent: Sunday, January 30, 2022 10:05 AM
To: Ketan Talaulikar <[email protected]>
Cc: Les Ginsberg (ginsberg) <[email protected]>;
Acee Lindem (acee) <[email protected]>;
[email protected];
Albert Fu <[email protected]>; lsr
<[email protected]>
Subject: Re: [Lsr] Working Group Last Call for "OSPF Strict-Mode for BFD" -
draft-ietf-lsr-ospf-bfd-strict-mode-04
Hi Ketan,
I would like to point out that the draft discusses the BFD "dampening" or
"hold-down" mechanism in Sec 5. We are aware of BFD implementations that
include such mechanisms in a protocol-agnostic manner.
BFD dampening or hold-time are completely orthogonal to my point. Both have
nothing to do with it.
Those timers only fire when BFD goes down. In my example BFD does not go down.
But we want to bring up the client adj. only after X ms/sec/min etc ...of
normal BFD operation if no failure is detected during that timer.
This draft indicates that OSPF adjacency will "advance" in the neighbor FSM
only after BFD reports UP.
And that is exactly too soon. In fact if you do that today without waiting some
time (if you retire the current OSPF timer) you will not help at all in the
case you are trying to address.
The reason being that BFD may go down perhaps 200 ms after reporting UP, but
the OSPF adj. will already be established. It is really pretty simple.
Thx,
Robert.
PS. And yes I think ISIS should also get fixed in that respect.
_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr