We monitor light levels and FEC values on all links and have thresholds for early-warning and pre-failure analysis.
Short answer: yes, we do see links lose packets before they fail completely. For dozens of reasons that’s still a good thing, but you need to monitor every part of a resilient network.

Ms. Lady Benjamin PD Cannon of Glencoe, ASCE
6x7 Networks & 6x7 Telecom, LLC
CEO
l...@6by7.net
"The only fully end-to-end encrypted global telecommunications company in the world."

FCC License KJ6FJJ

Sent from my iPhone via RFC1149.

> On Apr 29, 2021, at 2:32 PM, Eric Kuhnke <eric.kuh...@gmail.com> wrote:
>
> The Junipers on both sides should have discrete SNMP OIDs that respond with a
> FEC stress value, or FEC error value. See blue highlighted part here about
> FEC. Depending on what version of JunOS you're running the MIB for it may or
> may not exist.
>
> https://kb.juniper.net/InfoCenter/index?page=content&id=KB36074&cat=MX2008&actp=LIST
>
> In other equipment sometimes it's found in a sub-tree of SNMP adjacent to
> optical DOM values. Once you can acquire and poll that value, set it up as a
> custom thing to graph and alert upon certain threshold values in your choice
> of NMS.
>
> Additionally signs of a failing optic may show up in some of the optical DOM
> MIB items you can poll: https://mibs.observium.org/mib/JUNIPER-DOM-MIB/
>
> It helps if you have some non-misbehaving similar linecards and optics which
> can be polled during custom graph/OID configuration, to establish a baseline
> 'no problem' value, which if exceeded will trigger whatever threshold value
> you set in your monitoring system.
>
>> On Thu, Apr 29, 2021 at 1:40 PM Baldur Norddahl <baldur.nordd...@gmail.com>
>> wrote:
>> Hello
>>
>> We had a 100G link that started to misbehave and caused the customers to
>> notice bad packet loss. The optical values are just fine but we had packet
>> loss and latency. Interface shows FEC errors on one end and carrier
>> transitions on the other end. But otherwise the link would stay up and our
>> monitoring system completely failed to warn about the failure. Had to find
>> the bad link by traceroute (mtr) and observe where packet loss started.
>>
>> The link was between a Juniper MX204 and Juniper ACX5448. Link length 2
>> meters using 2 km single mode SFP modules.
>>
>> What is the best practice to monitor links to avoid this scenario? What
>> options do we have to do link monitoring? I am investigating BFD but I am
>> unsure if that would have helped the situation.
>>
>> Thanks,
>>
>> Baldur
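
For anyone wanting to try the threshold approach discussed in this thread, here is a rough sketch of the logic only: turn two polls of a cumulative FEC error counter into a rate, then compare that rate against a baseline taken from known-good links. It does not talk to any device. The names (`Sample`, `error_rate`, `classify`), the 64-bit counter width, and the multipliers and `min_rate` floor are all assumptions made for illustration, not values from Juniper or from anyone's production setup. How you actually read the counter (SNMP, CLI scrape, telemetry) depends on your platform and Junos version.

```python
from dataclasses import dataclass

# Assumption: the polled value behaves like a 64-bit wrapping counter.
COUNTER64_MOD = 2 ** 64


@dataclass
class Sample:
    timestamp: float  # seconds, when the counter was polled
    fec_errors: int   # cumulative FEC error count read from the device


def error_rate(prev: Sample, curr: Sample) -> float:
    """FEC errors per second between two polls, tolerating one counter wrap."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = (curr.fec_errors - prev.fec_errors) % COUNTER64_MOD
    return delta / elapsed


def classify(rate: float, baseline: float, warn_factor: float = 10.0,
             crit_factor: float = 100.0, min_rate: float = 1.0) -> str:
    """Compare a rate with a baseline measured on healthy, similar links.

    warn_factor, crit_factor and min_rate are hypothetical tuning knobs.
    min_rate keeps a zero or near-zero baseline from alerting on noise.
    """
    warn_at = max(baseline * warn_factor, min_rate)
    crit_at = max(baseline * crit_factor, min_rate * crit_factor / warn_factor)
    if rate >= crit_at:
        return "critical"
    if rate >= warn_at:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Two polls taken 60 seconds apart on the suspect link (made-up numbers).
    prev = Sample(timestamp=0.0, fec_errors=1_000)
    curr = Sample(timestamp=60.0, fec_errors=1_600)
    rate = error_rate(prev, curr)
    print(rate, classify(rate, baseline=0.5))
```

Alerting on the rate rather than the raw counter is the point here: the counter only ever grows, so a fixed threshold on it would eventually fire on every link. The baseline should come from similar optics and linecards, as suggested above, since what counts as a normal error rate differs between optic types.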