Hey Nimrod,

> I was contacted by my NOC to investigate a LAG that was not distributing 
> traffic evenly among the members to the point where one member was congested 
> while the utilization on the LAG was reasonably low. Looking at my netflow 
> data, I was able to confirm that this was caused by a single large flow of 
> ESP traffic. Fortunately, I was able to shift this flow to another path that 
> had enough headroom available so that the flow could be accommodated on a 
> single member link.
>
> With the increase in remote workers and VPN traffic that won't hash across 
> multiple paths, I thought this anecdote might help someone else track down a 
> problem that might not be so obvious.

This problem is commonly called an elephant flow. Some vendors have a
solution for this: they dynamically monitor utilisation and remap the
hashResult => egressInt table to create a bias that offsets the
elephant flow.
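
To make the remapping idea concrete, here is a rough sketch. It is
not any vendor's actual algorithm. The bucket count, the per-bucket
byte counters and the greedy placement are all assumptions made for
illustration, and the function names are hypothetical.

```python
NUM_BUCKETS = 16  # assumed size of the hashResult => egressInt table


def initial_table(num_members):
    """Static mapping: hash bucket b egresses on member b % num_members."""
    return [b % num_members for b in range(NUM_BUCKETS)]


def member_load(table, bucket_bps, num_members):
    """Sum the measured per-bucket load onto each member link."""
    load = [0] * num_members
    for bucket, member in enumerate(table):
        load[member] += bucket_bps[bucket]
    return load


def rebalance(bucket_bps, num_members):
    """Greedy remap: place the heaviest buckets first, each on the
    currently least-loaded member. The elephant's bucket ends up mostly
    alone and the remaining buckets are biased to the other members."""
    table = [0] * NUM_BUCKETS
    load = [0] * num_members
    order = sorted(range(NUM_BUCKETS), key=lambda b: bucket_bps[b], reverse=True)
    for bucket in order:
        member = min(range(num_members), key=lambda m: load[m])
        table[bucket] = member
        load[member] += bucket_bps[bucket]
    return table


# Example: 15 small buckets plus one bucket carrying an elephant flow.
bucket_bps = [10] * NUM_BUCKETS
bucket_bps[3] = 800
before = member_load(initial_table(4), bucket_bps, 4)
after = member_load(rebalance(bucket_bps, 4), bucket_bps, 4)
```

A real implementation would likely move as few buckets as possible,
since every remapped bucket risks reordering the flows in it. Note
also that this only helps while the elephant still fits on one member.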

One particular example:
https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/adaptive-edit-interfaces-aex-aggregated-ether-options-load-balance.html

Ideally, VPN providers would be defensive and use the UDP source port
(SPORT) for entropy, as MPLSoUDP does.
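
As a sketch of what that could look like on the encapsulating side:
the outer UDP source port is derived from a hash of the inner flow, so
transit routers that hash on the outer 5-tuple see many flows instead
of one. CRC32 and the function name are stand-ins I picked for
illustration, and the 49152-65535 range is the dynamic port range, not
something a particular VPN product is known to use.

```python
import zlib

PORT_MIN, PORT_MAX = 49152, 65535  # dynamic/private port range


def entropy_sport(src_ip, dst_ip, proto, sport, dport):
    """Pick the outer UDP source port from a hash of the inner 5-tuple.

    All packets of one inner flow get the same port (no reordering),
    while different inner flows spread over the port range.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return PORT_MIN + zlib.crc32(key) % (PORT_MAX - PORT_MIN + 1)


# Example: 100 inner TCP flows between the same two hosts.
ports = {
    entropy_sport("10.0.0.1", "10.0.0.2", 6, inner_sport, 443)
    for inner_sport in range(1000, 1100)
}
```

The tunnel endpoints stay the same, so only the outer source port
varies and LAG/ECMP hashing along the path gets something to work
with.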

-- 
  ++ytti
