* Jason Iannone
But the clever budget-conscious among us have deployed router links over provided MPLS-based L2 services as critical infrastructure. We have an invisible WAN. In the absence of L1 PM statistics, how do we validate service over other networks? 802.1ag and Y.1731 attempt to answer that question.
If you frequently pull your interface counters into a decent time-series database, it is also possible to simply compare the TX pps rate on one end of the link with the RX pps rate on the other end, and alert/mitigate if the discrepancy between the two becomes too large.
There will always be some discrepancy, as you won't be able to poll the counters on both sides at exactly the same time, but averaging the pps rates over a longer time window should get you fairly close to a 1:1 TX:RX pps ratio.
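A minimal sketch of the averaging step, assuming each interface counter is stored as time-ordered (unix_time, counter_value) polls and that the counters are 64-bit (such as the IF-MIB "HC" packet counters). The function name `window_pps` and the sample layout are hypothetical, not taken from any particular TSDB. Over a 30-minute window, a few seconds of polling skew between the two ends only shifts each window edge slightly, so the averaged rates should line up closely.

```python
from typing import Sequence, Tuple

# Assumption: 64-bit packet counters, so wraparound is handled modulo 2**64.
# Use 2**32 instead for 32-bit counters.
COUNTER_MODULUS = 2 ** 64


def window_pps(samples: Sequence[Tuple[float, int]], window_s: float = 1800.0) -> float:
    """Average packets/second over the trailing `window_s` seconds.

    `samples` is a time-ordered list of (unix_time_seconds, counter_value)
    polls for a single counter, e.g. TX packets on the near end of a link.
    """
    if not samples:
        raise ValueError("no samples")
    end_t = samples[-1][0]
    in_window = [s for s in samples if s[0] >= end_t - window_s]
    if len(in_window) < 2:
        raise ValueError("need at least two polls inside the window")
    # Sum per-poll deltas so a counter wrap between any two polls is handled
    # (assuming the counter wraps at most once per poll interval).
    packets = sum(
        (c1 - c0) % COUNTER_MODULUS
        for (_, c0), (_, c1) in zip(in_window, in_window[1:])
    )
    elapsed = in_window[-1][0] - in_window[0][0]
    if elapsed <= 0:
        raise ValueError("polls in the window must span a positive time interval")
    return packets / elapsed
```

In use, you would call it once on the near end's TX counter and once on the far end's RX counter for the same link direction, then compare the two results.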
Looking at a few random healthy links in our network, an alert threshold that seems to avoid false alarms is a receive pps rate, averaged over a 30-minute sliding window, that deviates by more than 5‰ (0.5%) from the transmit pps rate. YMMV.
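The threshold check itself can be sketched as below, assuming both rates are already averaged over the same 30-minute window, with `tx_pps` from one end of the link and `rx_pps` from the other. The function names and the `min_tx_pps` floor are hypothetical additions of mine: the floor skips nearly idle links, where a ratio over a handful of packets is too noisy to be meaningful.

```python
def per_mille_deviation(tx_pps: float, rx_pps: float) -> float:
    """Relative difference between far-end RX and near-end TX rates, in per mille."""
    return abs(rx_pps - tx_pps) / tx_pps * 1000.0


def should_alert(
    tx_pps: float,
    rx_pps: float,
    threshold_per_mille: float = 5.0,
    min_tx_pps: float = 100.0,  # assumed floor, not from the original post
) -> bool:
    """True if the RX rate is more than `threshold_per_mille` off the TX rate."""
    # Skip idle or near-idle links (also avoids dividing by zero).
    if tx_pps <= 0 or tx_pps < min_tx_pps:
        return False
    return per_mille_deviation(tx_pps, rx_pps) > threshold_per_mille


if __name__ == "__main__":
    # Near-end TX 100,000 pps, far-end RX 99,400 pps: 6 per mille off.
    print(should_alert(100_000, 99_400))  # True
    # Far-end RX 99,800 pps: 2 per mille off, within the threshold.
    print(should_alert(100_000, 99_800))  # False
```

Using the absolute difference also flags RX exceeding TX, which on a point-to-point link usually points at a polling or counter problem worth a look rather than packet loss.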
Obviously, such an approach would not catch every single brownout, but I'd wager you'd catch quite a few (and the worse it is, the more likely it is to be caught). Much better than waiting around for phone calls from upset customers, at any rate.
Tore