If you use OpenVPN on Linux 2.2 or 2.4 or Solaris, you may be
suffering from a bug which causes connections to hang under heavy load.
The symptoms are very similar to the MTU problems discussed frequently
on these mailing lists, but this bug is not caused by MTU problems.
It's a bug in the tun/tap driver.

There are multiple tun/tap drivers. The bug discussed here exists in
the tun-1.1 driver at http://vtun.sourceforge.net/tun/. This is the
tun/tap driver which is recommended by the INSTALL document for OpenVPN
(http://openvpn.net/install.html).

The bug in the tun-1.1 driver definitely affects Linux 2.2 and 2.4. It
may also affect other operating systems; I did not check because I have
no way of testing them. If you use this driver, you are likely
affected by the bug and may want to install a patch. With this patch
installed, OpenVPN works reliably on Linux 2.2. (I did not test the
driver on any other platforms.)

The bug is in the driver's poll() handler. When queued data is available
to read, it returns only the POLLIN|POLLRDNORM bits and omits
POLLOUT|POLLWRNORM, even though the device is always writable. This
causes openvpn to hang instead of sending data. It is most likely to
happen when there is a lot of data to send.
Here is a patch for the Linux 2.2 kernel:
--- tun-1.1/linux/2.2/tun.c 2000-10-23 22:13:08.000000000 -0700
+++ tun.new/linux/2.2/tun.c 2005-12-22 00:27:56.000000000 -0800
@@ -180,8 +180,19 @@
     poll_wait(file, &tun->read_wait, wait);
+    /* Data written to the /dev/tunX device is immediately placed into a socket buffer, making it
+     * available to networking code at the tunX interface. Writes never block.
+     * Likewise, data flows from the network stack, through the tunX interface and into the /dev/tun* device,
+     * where it is queued, making it available for read().
+     * Thus the character device /dev/tunX is:
+     * - readable if data was "transmitted" to the tunX interface and is now queued at the /dev/tunX device.
+     * - always writable.
+     * Everything written here is equally true of taps.
+     * The author made a mistake when implementing this routine; he forgot that the device is always writable.
+     * -jeff stearns 22-Dec-2005
+     */
     if( skb_queue_len(&tun->txq) )
-        return POLLIN | POLLRDNORM;
+        return POLLIN | POLLRDNORM | POLLOUT | POLLWRNORM;
     return POLLOUT | POLLWRNORM;
 }

The patch for the Linux 2.4 driver is analogous. The bug may also be
in the versions for other operating systems; I didn't check.

I tried mailing this patch to the tun-1.1 driver developer, but the
author's mailbox no longer exists. Thus I expect that the bug will
never be fixed; you'll need to install the patch yourself.