On Tue, Jun 12, 2012 at 9:37 AM, Datty <datty....@gmail.com> wrote:
> On Tue, Jun 12, 2012 at 2:21 PM, Michael Mol <mike...@gmail.com> wrote:
>> On Jun 12, 2012 8:59 AM, "Datty" <datty....@gmail.com> wrote:
[snip]
>> More detail later...but make sure your vpn link is not TCP. UDP, fine,
>> IP-IP, fine, but not TCP. TCP transport for a VPN tunnel leads to ugly
>> traffic problems.

> Ah it is TCP at the moment. Not something I could change too easily either.
> Is it possible to work around or is it not worth fighting with?

If all of these cases are true:

* You only have TCP traffic going over that VPN
* You don't have any latency-sensitive traffic going over that VPN (no VoIP,
  no interactive terminal sessions, and you won't pull your hair out over
  round-trip times of 10 seconds or more slowing down page loads)
* You don't have large bulk data transfers going over that VPN (my best
  example from personal experience was trying to locally sync my
  work-related IMAP mailbox)

...then it's not worth fighting with. It's very unlikely you fall into that
camp.

The problem with TCP VPN transport is a confluence of three issues:

1) You're likely to have packet loss underneath that transport due to things
like congestion...the TCP transport will hide this from the tunneled traffic
and retransmit on its own.

2) TCP throttles a sender with its congestion-control algorithm, but that
algorithm depends on detecting packet loss to limit how many packets it
pushes.

3) Your VPN endpoint will very probably buffer a very large amount of data
for sending if its TCP transport link is acting slow.

Here's what happens:

1) A TCP application on your computer opens a connection with a remote host.
This connection is encapsulated inside your TCP OpenVPN tunnel.

2) Your app's TCP connection starts exchanging data. For as long as it isn't
losing any packets, it figures it can send more and more data without
waiting for a response; this is TCP congestion control growing your sliding
window, and it's why TCP can scale from dial-up speeds up to 10G Ethernet.

3) Your VPN link's TCP connection experiences packet loss. Maybe it's
because of a congested router between you and the remote side of the VPN,
or maybe it's because someone's ADSL connection is pushing more than its
measly 768Kb/s upstream speed allows. Or maybe it's noise on the copper
causing packet loss on the ADSL link. Or maybe it's a frame collision on
the PPPoE link.

...time passes...

4) Your VPN link's TCP stack notices the packet loss and retransmits the
lost packet until it gets through.

5) The connection traffic from step (2) is completely unaware that the VPN's
TCP connection is fielding packet loss on its behalf, so its congestion
control figures, 'hey, this is a high-bandwidth link! Let's shove more
data!'

6) The OpenVPN link is now receiving data it can't stuff into the pipe right
this second, so it buffers it for a moment, and then sends it when its turn
comes. Still, no data is lost.

...time passes...

7) Steps 4-6 repeat, and your original connection becomes more and more
confident about the bandwidth of the pipe.

...time passes...

8) The connection from step 2 is now so confident of the pipe's speed that
it's pushing data to OpenVPN faster than OpenVPN could conceivably push out,
even if there were no packet loss at all. You've now got a cycle of just
steps 5 and 6.

Presumably, you'd eventually hit OpenVPN's buffer limit. I don't know what
that is, and I don't think it's tunable.

The one time I personally saw, measured and helped diagnose this, I was
getting up to a *fifteen minute* round-trip ping time over the VPN, even
though the round-trip time for ping outside the VPN, between the VPN
endpoints, was only about 100ms.
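If you want to see the shape of that feedback loop without wrecking a real
link, here's a rough back-of-the-envelope Python sketch. It is not a
packet-level simulator and has nothing to do with OpenVPN's actual
internals; the link capacity, loss rate and window growth are all numbers I
made up purely to show the endpoint's queue (and the latency it implies)
running away once the inner connection outgrows the pipe:

#!/usr/bin/env python3
# Toy model of the TCP-over-TCP feedback loop described above.  Every
# number here is invented for illustration; none of it reflects OpenVPN's
# real buffering behaviour.

import random

LINK_CAPACITY = 30    # packets/tick the outer (transport) TCP can move
LOSS_RATE     = 0.02  # fraction of outer-link packets lost and retransmitted
TICKS         = 120   # one tick is roughly one inner-connection round trip

cwnd  = 10   # inner connection's congestion window, in packets per tick
queue = 0.0  # packets buffered at the VPN endpoint, waiting for the pipe

for tick in range(1, TICKS + 1):
    # The inner TCP never sees loss (the outer TCP hides it and
    # retransmits), so it keeps growing its window each round trip.
    cwnd += 1

    # Everything the inner connection sends lands in the endpoint's buffer.
    queue += cwnd

    # The outer link drains the buffer, but retransmissions of lost packets
    # eat into its useful capacity.
    goodput = LINK_CAPACITY * (1 - LOSS_RATE) * random.uniform(0.9, 1.0)
    queue = max(0.0, queue - goodput)

    # Extra queueing delay, measured in inner-connection round trips.
    if tick % 20 == 0:
        print(f"tick {tick:3d}  cwnd={cwnd:4d}  queued={queue:8.0f}  "
              f"extra delay ~{queue / goodput:6.1f} round trips")

Run it and the backlog, and the extra delay it implies, just keeps growing,
because nothing ever tells the inner connection to slow down. With UDP
transport the inner connection would see the loss itself and back off.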
Watching that round-trip time climb was surreal until I figured out what
was happening. Switching the VPN transport to UDP allowed the tunneled
connections' TCP stacks to properly gauge and react to the available
throughput. Even SIP started working over that VPN link.

--
:wq