Hey Nimrod,

> I was contacted by my NOC to investigate a LAG that was not distributing
> traffic evenly among the members to the point where one member was congested
> while the utilization on the LAG was reasonably low. Looking at my netflow
> data, I was able to confirm that this was caused by a single large flow of
> ESP traffic. Fortunately, I was able to shift this flow to another path that
> had enough headroom available so that the flow could be accommodated on a
> single member link.
>
> With the increase in remote workers and VPN traffic that won't hash across
> multiple paths, I thought this anecdote might help someone else track down a
> problem that might not be so obvious.
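To illustrate why a single ESP tunnel ends up pinned to one member: a LAG picks the egress member from a hash of header fields, and ESP carries no L4 ports, so every packet of one tunnel hashes identically. The sketch below assumes a plain CRC32 over src/dst IP, protocol and ports. Real hardware hashes and field selection vary, and some platforms can add the ESP SPI. The member count, addresses and helper names are hypothetical. For contrast, it also shows a UDP encapsulation that derives the outer source port from each inner flow, in the spirit of MPLS-in-UDP (RFC 7510, destination port 6635).

```python
import zlib
from collections import Counter

NUM_MEMBERS = 4  # hypothetical 4-member LAG


def member_for(src_ip, dst_ip, proto, sport=0, dport=0, members=NUM_MEMBERS):
    """Pick a LAG member from a hash of the outer header fields (assumed hash).

    ESP (IP protocol 50) has no L4 ports, so sport/dport stay 0 for it.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) % members


def esp_tunnel_distribution(packets=1000):
    # Every packet of one ESP tunnel has the same outer header, hence the same member.
    return Counter(
        member_for("198.51.100.1", "203.0.113.1", 50) for _ in range(packets)
    )


def udp_entropy_distribution(inner_flows=1000):
    # Hypothetical UDP encapsulation: the outer source port (49152-65535) is
    # derived from a hash of each inner flow, so different inner flows spread.
    counts = Counter()
    for flow_id in range(inner_flows):
        sport = 49152 + zlib.crc32(f"inner-flow-{flow_id}".encode()) % 16384
        counts[member_for("198.51.100.1", "203.0.113.1", 17, sport, 6635)] += 1
    return counts


if __name__ == "__main__":
    print("ESP tunnel:", dict(esp_tunnel_distribution()))  # one member gets everything
    print("UDP w/ sport entropy:", dict(udp_entropy_distribution()))
```

The entropy only helps if the encapsulating endpoint varies the source port per inner flow; a VPN that uses one fixed UDP port pair is still a single flow to the LAG hash.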
This is known as the elephant flow problem. Some vendors have a solution for it: they dynamically monitor utilisation and remap the hashResult => egressInt table, creating a bias that offsets the elephant flow. One example:

https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/adaptive-edit-interfaces-aex-aggregated-ether-options-load-balance.html

Ideally, VPN providers would be defensive and use SPORT for entropy, like MPLSoUDP does.

-- 
  ++ytti
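A rough sketch of the "remap hashResult => egressInt" idea, assuming the device keeps a small table of hash buckets, measures load per bucket, and periodically reassigns buckets to members. This is a generic greedy heuristic (heaviest bucket first onto the least-loaded member), not Juniper's actual algorithm; the bucket count, loads and function names are hypothetical. Note that a bucket is never split, so the elephant flow still sits on one member; the gain is that other traffic is steered away from it.

```python
from typing import Dict, List


def member_loads(table: Dict[int, int], bucket_load: Dict[int, float], members: int) -> List[float]:
    """Sum the measured load of every hash bucket onto the member it maps to."""
    loads = [0.0] * members
    for bucket, member in table.items():
        loads[member] += bucket_load[bucket]
    return loads


def rebalance(bucket_load: Dict[int, float], members: int) -> Dict[int, int]:
    """Rebuild the bucket -> member table from measured per-bucket load.

    Greedy longest-processing-time heuristic: place the heaviest buckets first,
    each on the currently least-loaded member.
    """
    member_load = [0.0] * members
    table: Dict[int, int] = {}
    for bucket, load in sorted(bucket_load.items(), key=lambda kv: kv[1], reverse=True):
        target = min(range(members), key=lambda m: member_load[m])
        table[bucket] = target
        member_load[target] += load
    return table


if __name__ == "__main__":
    # Hypothetical 2-member LAG with 8 hash buckets; bucket 0 holds an elephant flow.
    load = {bucket: 1.0 for bucket in range(8)}
    load[0] = 6.0
    static = {bucket: bucket % 2 for bucket in range(8)}
    print(member_loads(static, load, 2))              # [9.0, 4.0]
    print(member_loads(rebalance(load, 2), load, 2))  # [7.0, 6.0]
```

In this toy case the busiest member drops from 9 to 7 units; the remaining imbalance is the elephant itself, which no bucket remapping can spread.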