Rick Ernst wrote:
Although the implementation is Cisco-specific, this feels more appropriate
for NANOG.
We've started rolling out a state-wide monitoring system based on Cisco's
"IP SLA" feature set. Out of 5 sites deployed so far (different locations,
different providers), we are consistently seeing one-way latency mirror the
opposite direction. As source-destination latency goes up,
destination-source latency goes down and vice versa.
The monitoring team and I have ripped apart the OIDs, IP SLA
configuration, and monitoring system. We've also built an ad-hoc system to
compare the results. It's still consistent behavior. It's not a true
mirror; there is definitely variation between the data collection, but at
the 10,000 foot level, there is an obvious and consistent mirror to the
data.
The network topology is independent service providers all providing backhaul
to a local Ethernet exchange.
Has anybody seen this type of behavior? We are solidly convinced that we are
using the proper OIDs and making the proper transformations of the data.
The two remaining causes appear to be "natural behavior of the links"
and/or "artifact in the IP SLA mechanism".
Any ideas?
Having never used Cisco's IP SLA (or even read about it), take this with
a sack of salt.
I assume this product works by having a packet with a timestamp sent
from the source to the destination where it is timestamped again and
either sent back, or another packet is sent in the other direction. The
difference between the two timestamps gives you the latency in that
direction.
Now, how are your clocks synchronised? Are they synchronised using NTP,
or something better (GPS)? If one of your clocks is drifting with
respect to the other then you'll see this effect. Does your clock drift
because NTP is failing to keep the clock well synchronised when its
connection to its parent NTP server is saturated?
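
To illustrate why a clock offset would produce a mirror, here is a rough
sketch in Python. It is not how IP SLA actually computes its numbers (I
don't know that); it only assumes each one-way figure is "receive
timestamp minus send timestamp" taken from two different clocks. The
function names, the 10 ms true latency, and the offset values are all
made up for illustration.

```python
def measured_one_way(true_sd_ms, true_ds_ms, offset_ms):
    """Return (sd, ds) latency as it would be measured with skewed clocks.

    offset_ms is how far the destination clock is ahead of the source
    clock (an assumed sign convention). A packet stamped by the source
    and then by the destination picks up +offset; a packet going the
    other way picks up -offset.
    """
    measured_sd = true_sd_ms + offset_ms
    measured_ds = true_ds_ms - offset_ms
    return measured_sd, measured_ds


def simulate(offsets_ms, true_sd_ms=10.0, true_ds_ms=10.0):
    """Apply a series of clock offsets to a link with constant true latency."""
    return [measured_one_way(true_sd_ms, true_ds_ms, o) for o in offsets_ms]


# Hypothetical offsets, as if the clock wanders while NTP struggles.
offsets = [0.0, 1.0, 2.5, 1.5, -0.5]
samples = simulate(offsets)

for off, (sd, ds) in zip(offsets, samples):
    print(f"offset={off:+.1f} ms  sd={sd:.1f} ms  ds={ds:.1f} ms  sum={sd + ds:.1f} ms")
```

With a constant true latency, the two measured directions move in
opposite directions by exactly the offset, while their sum (the round
trip) stays put. So one quick check is whether sd + ds in your data is
stable while the individual directions swing; if it is, suspect the
clocks rather than the links.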