If you use OpenVPN on Linux 2.2, Linux 2.4, or Solaris, you may be suffering from a bug that causes connections to hang under heavy load. The symptoms are very similar to the MTU problems discussed frequently on these mailing lists, but this bug is not caused by MTU problems. It's a bug in the tun/tap driver.

There are multiple tun/tap drivers. The bug discussed here exists in the tun-1.1 driver at http://vtun.sourceforge.net/tun/. This is the tun/tap driver which is recommended by the INSTALL document for OpenVPN (http://openvpn.net/install.html).

The bug in the tun-1.1 driver definitely affects Linux 2.2 and 2.4. It may also affect other operating systems; I did not check because I have no way of testing them. If you use this driver, you are likely affected by the bug and may want to install a patch. With this patch installed, OpenVPN works reliably on Linux 2.2. (I did not test the driver on any other platforms.)

The bug is that poll() returns the wrong result when data is queued for reading. In that case it fails to set the POLLOUT|POLLWRNORM bits, even though the device is always writable. This causes OpenVPN to hang instead of sending data. The hang is most likely when there is a lot of data to send.
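
To make the failure concrete, here is a small user-space model of the readiness masks. It is a sketch, not driver or OpenVPN code: tun_poll_unpatched, tun_poll_patched and loop_may_write are hypothetical names, and the assumption that the caller only writes after poll() reports POLLOUT is a simplification of how a poll()-driven program behaves.

```c
#define _XOPEN_SOURCE 700 /* expose POLLRDNORM and POLLWRNORM from <poll.h> */
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the tun-1.1 poll routine, reduced to its only
 * input: the number of packets queued for read() on /dev/tunX. */
unsigned int tun_poll_unpatched(size_t queued)
{
    if (queued)
        return POLLIN | POLLRDNORM;               /* bug: write bits missing */
    return POLLOUT | POLLWRNORM;
}

/* Same model with the fix applied: the device is always writable. */
unsigned int tun_poll_patched(size_t queued)
{
    if (queued)
        return POLLIN | POLLRDNORM | POLLOUT | POLLWRNORM;
    return POLLOUT | POLLWRNORM;
}

/* Simplified caller (an assumption): a poll()-driven loop that only
 * writes to the device after poll() has reported POLLOUT for it. */
bool loop_may_write(unsigned int mask)
{
    return (mask & POLLOUT) != 0;
}
```

In this model, loop_may_write(tun_poll_unpatched(1)) is false, so while inbound traffic keeps the queue non-empty a pending write is never attempted; loop_may_write(tun_poll_patched(1)) is true.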

Here is a patch for the Linux 2.2 version of the driver (linux/2.2/tun.c in tun-1.1):
--- tun-1.1/linux/2.2/tun.c     2000-10-23 22:13:08.000000000 -0700
+++ tun.new/linux/2.2/tun.c     2005-12-22 00:27:56.000000000 -0800
@@ -180,8 +180,19 @@

    poll_wait(file, &tun->read_wait, wait);

+   /* Data written to the /dev/tunX device is immediately placed into a socket buffer, making it
+    * available to networking code at the tunX interface. Writes never block.
+    * Likewise, data flows from the network stack, through the tunX interface and into the /dev/tun* device,
+    * where it is queued, making it available for read().
+    * Thus the character device /dev/tunX is:
+    *   - readable if data was "transmitted" to the tunX interface and is now queued at the /dev/tunX device.
+    *   - always writable.
+    * Everything written here is equally true of taps.
+    * The author made a mistake when implementing this routine; he forgot that the device is always writable.
+    * -jeff stearns 22-Dec-2005
+    */
    if( skb_queue_len(&tun->txq) )
-      return POLLIN | POLLRDNORM;
+      return POLLIN | POLLRDNORM | POLLOUT | POLLWRNORM;

    return POLLOUT | POLLWRNORM;
 }
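
If you want to check what a device actually reports, a non-blocking poll() probe like the one below may help. It is a sketch and not part of tun-1.1 or OpenVPN; probe_readiness, readable_but_not_writable and selfcheck_socketpair are hypothetical helper names, and it assumes you already hold an open /dev/tunX descriptor in a test program of your own. The socket-pair self-check only confirms that the probe itself works.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Ask, without blocking, which readiness bits a descriptor reports right
 * now. Returns the revents mask (0 if nothing is ready), or -1 on error. */
int probe_readiness(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT, .revents = 0 };
    int rc;

    do {
        rc = poll(&pfd, 1, 0);                /* timeout 0: never blocks */
    } while (rc < 0 && errno == EINTR);

    return rc < 0 ? -1 : pfd.revents;
}

/* Nonzero if fd is readable but not writable: the pattern the unpatched
 * driver produces while packets are queued on /dev/tunX. */
int readable_but_not_writable(int fd)
{
    int mask = probe_readiness(fd);
    return mask >= 0 && (mask & POLLIN) && !(mask & POLLOUT);
}

/* Self-check on an AF_UNIX socket pair, which behaves like the patched
 * driver: with data queued it is readable and writable at once.
 * Returns the mask seen on sv[1] after one byte was sent from sv[0]. */
int selfcheck_socketpair(void)
{
    int sv[2];
    int mask = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], "x", 1) == 1)
        mask = probe_readiness(sv[1]);
    close(sv[0]);
    close(sv[1]);
    return mask;
}
```

On an unpatched driver, readable_but_not_writable() should return nonzero whenever packets are waiting to be read; with the patch applied it should not.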

The patch for the Linux 2.4 driver is analogous. The bug may also be in the version for other operating systems; I didn't check.

I tried mailing this patch to the tun-1.1 driver developer, but the author's mailbox no longer exists. Thus I expect that the bug will never be fixed; you'll need to install the patch for yourself.
