Hello all.
Over time, we've run into both bugs and human error, in our own gear and in 
our partner networks' gear, specifically affecting multi-path forwarding at 
pretty much every layer: multi-chassis LAG, ECMP, and BGP MP.  
(Yes, I am a corner-case magnet.  Lucky me.)

Some of these issues were fairly obvious when they happened, but some were 
really hard to pin down.

We've found that typical network monitoring tools (Observium and Smokeping, 
not to mention plain old ping and traceroute) can't really detect a 
hashing-related or multi-path-related problem: all they can tell you is 
whether their own packets got through or not.

Can anyone recommend tools or techniques to validate that multi-path 
forwarding is, or isn't, working correctly in a production network?  I'm 
looking to add something to our test suite for when we make changes to 
critical network gear.  Almost all the scenarios I want to test involve only 
two paths, if that helps.

The best I've come up with so far is to have two test systems (typically VMs) 
that use adjacent IP addresses and adjacent MAC addresses, and test both 
inbound and outbound to/from those, blindly trusting/hoping that hashing 
algorithms will probably exercise both paths.
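
To make that concrete, here is a rough Python sketch of the adjacent-address 
idea.  The hash in it (toy_path, an XOR-fold of the address bytes modulo the 
path count) is purely a made-up stand-in, as are the function names and the 
example addresses; real gear uses vendor-specific and often configurable hash 
inputs, which is exactly why this approach amounts to hoping.

```python
import ipaddress
from functools import reduce


def toy_path(src_ip: str, dst_ip: str, n_paths: int = 2) -> int:
    """Hypothetical stand-in for a device's hash: XOR every byte of the
    source and destination addresses, then reduce modulo n_paths.
    Real hardware hashes differ by vendor and configuration."""
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    return reduce(lambda acc, b: acc ^ b, key, 0) % n_paths


def paths_exercised(src_ips, dst_ip, n_paths=2):
    """Return the set of path indexes the given sources would land on."""
    return {toy_path(src, dst_ip, n_paths) for src in src_ips}


if __name__ == "__main__":
    remote = "192.0.2.1"                     # example far-end address
    adjacent = ["10.0.0.10", "10.0.0.11"]    # two adjacent test VMs
    rollover = ["10.0.0.255", "10.0.1.0"]    # also adjacent, across an octet boundary
    print(paths_exercised(adjacent, remote))
    print(paths_exercised(rollover, remote))
```

Under this toy hash the first pair lands on both paths, but the second pair 
lands on only one, even though both pairs are adjacent.  Without knowing the 
real hash, adjacency by itself guarantees nothing.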

Some of the problems we've seen show that merely looking at interface counters 
is insufficient, so I'm trying to find an explicit proof, not implicit.

Any suggestions?  Surely other vendors and/or admins have screwed this up in 
subtle ways enough times that this knowledge exists?  (My Google-fu is usually 
pretty good, but I'm striking out - maybe I'm using the wrong terms.)

-Adam

Adam Thompson
Consultant, Infrastructure Services
100 - 135 Innovation Drive
Winnipeg, MB, R3T 6A8
(204) 977-6824 or 1-800-430-6404 (MB only)
athomp...@merlin.mb.ca
www.merlin.mb.ca
