Re: Flapping Transport

Jared Mauch Tue, 01 Aug 2023 11:40:12 -0700

> On Aug 1, 2023, at 2:18 PM, Mike Hammett <[email protected]> wrote:
> 
> I have a wave transport vendor that suffered issues twice about ten days 
> apart, causing my link to flap a bunch. I put in a ticket on the second set 
> of occurrences. I was told that there was a card issue identified and would 
> be notified when the replacement happened. Ticket closed.
> 
> Three weeks later, I opened a new ticket asking for the status. The new card 
> arrived the next day, but since no more flaps were happening, the card would 
> not be replaced. Ticket closed.
> 
> 
> A) It doesn't seem like they actually did anything to fix the circuit.
> B) They admitted a problem and sent a new card.
> C) They later decided to not do anything.
> 
> 
> Is that normal?
> Is that acceptable?
> 
> 
> To avoid issues flapping causes, I disabled that circuit until repaired, but 
> it seems like they're not going to do anything and I only know that because I 
> asked.


With inline components like amplifiers and such acting up, or if they had 
someone do work that they don’t want to fess up to (which is kinda silly), I 
get that.

I have our Junipers configured with a 5 second up timer, e.g. "hold-time up 
5000" (the value is in milliseconds).

This way a flapping circuit must be stable for at least a few seconds before it 
can be placed back into service. Without the timer, a prefix that comes from 
connected/direct/static/qualified-next-hop can be pushed into another protocol 
on every flap and possibly cause a globally visible BGP event.
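
To make the effect of an up timer concrete, here is a minimal Python sketch. 
It is not how any router implements it, just a toy model under stated 
assumptions: down events are reported immediately, up events are reported only 
after the link has stayed up for the whole timer, and the link starts out up. 
The names apply_hold_up, FLAPS and end_ms are made up for illustration.

```python
def apply_hold_up(events, hold_up_ms, end_ms):
    """Return the (time_ms, state) transitions the routing layer would see.

    events: physical-layer transitions as (time_ms, "up" | "down"), sorted by time.
    hold_up_ms: how long the link must stay up before "up" is reported.
    end_ms: end of the observation window, used to settle a pending "up".
    Assumptions: the link starts up, and "down" is reported with no delay.
    """
    seen = []
    logical_up = True
    pending_up_at = None  # start of a physical "up" still waiting out the timer

    for t, state in events:
        # A pending "up" whose timer expired before this event gets reported first.
        if pending_up_at is not None and t - pending_up_at >= hold_up_ms:
            seen.append((pending_up_at + hold_up_ms, "up"))
            logical_up = True
            pending_up_at = None

        if state == "down":
            pending_up_at = None  # the link dropped again, so the timer is cancelled
            if logical_up:
                seen.append((t, "down"))
                logical_up = False
        elif not logical_up and pending_up_at is None:
            pending_up_at = t  # start waiting out the up timer

    # Settle an "up" that is still pending at the end of the window.
    if pending_up_at is not None and end_ms - pending_up_at >= hold_up_ms:
        seen.append((pending_up_at + hold_up_ms, "up"))
    return seen


# Hypothetical flap pattern: three short outages within two seconds.
FLAPS = [(0, "down"), (200, "up"), (400, "down"),
         (900, "up"), (1500, "down"), (2000, "up")]

print("no timer:", apply_hold_up(FLAPS, 0, 10_000))
print("5 s timer:", apply_hold_up(FLAPS, 5000, 10_000))
```

With no timer every physical transition reaches the routing layer; with the 
5000 ms timer the same pattern collapses to a single down followed by a single 
up once the link has been stable for five seconds.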

Some providers have a much more disruptive layer-1 infrastructure and will ask 
you to configure a 1s+ up timer.  I think there’s an interesting question that 
could go either way: do you want transport side faults to be exposed to you, or 
should the client interface in a system be held up so that fault condition is 
not forwarded to the client interface (sometimes called FDI)?

They may have had the system misconfigured so you saw a fault on a protected 
path when there was a switch.  If you have an optical switch and your A path is 
25km while the B side is 5km, the timing is different and it can take some time 
for the transponder to re-tune.  With the higher PHY rates it will take some 
extra time.
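
As a rough illustration of how different the timing is on those two paths, 
here is a back-of-the-envelope Python calculation. It assumes a group index of 
about 1.468, which is typical for standard single-mode fiber but is my 
assumption, not a number from the vendor. It only covers the propagation 
delay; how long a given transponder takes to re-acquire is vendor specific and 
is not modeled here.

```python
C_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s
GROUP_INDEX = 1.468       # assumption: typical single-mode fiber near 1550 nm


def one_way_delay_us(length_km, group_index=GROUP_INDEX):
    """One-way propagation delay through the fiber, in microseconds."""
    return length_km * group_index / C_KM_PER_S * 1e6


delay_a = one_way_delay_us(25)  # A path from the example above
delay_b = one_way_delay_us(5)   # B path from the example above
delta_us = delay_a - delay_b

print(f"A path: {delay_a:.1f} us, B path: {delay_b:.1f} us, "
      f"difference: {delta_us:.1f} us")
```

Under that assumption the 20km difference works out to roughly 98 
microseconds one way, which is the shift the receiver sees when the optical 
switch moves between paths.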

I know that Cisco also has these interface timers, but some of the others may 
not (e.g. I don’t know if Mikrotik has them, but cue the wiki in a reply).

If it’s stable for 48 hours, I would place it back into service, but you should 
escalate at the same time and determine if they were truly hands off.  It may 
be that a fiber was bent and is now fixed, and that actually was the root cause.

Hope this helps you and a few others.

- jared