On Thu, 27 Oct 2005, James Yonan wrote:

> On Thu, 27 Oct 2005, Gunter Ohrner wrote:
> 
> > Hi!
> > 
> > We're experiencing regular assertion failures and subsequent OpenVPN server
> > crashes on one of our servers.
> > 
> > The assertion failure is always the same: 
> > 
> > ,----
> > | Assertion failed at multi.c:1561
> > | Exiting
> > `----
> > 
> > The crash seems to leave openvpn's network device, tap0 in our case, in a
> > state which blocks all processes subsequently trying to access the device.
> > The crashes happen every few days and a restart of the server machine is
> > needed.
> > 
> > Does anyone have any quick idea of this behaviour's cause? Unfortunately
> > according to Google we're the only ones on Linux 2.6 with this crash. ;)
> > 
> > http://openvpn.net/archive/openvpn-users/2005-08/msg00011.html mentions a
> > similar problem, but on kernel 2.2.25. No solution has been provided so
> > far; a suggested patch did not fix the problem for the reporter.
> > 
> > ,----[ Some details about our setup ]
> > | * Debian Sarge i386
> > | * Kernel 2.6.12.6 32 Bit Opteron optimized
> > | * Debian's 2.0-1sarge1 openvpn package
> > | * Dual Opteron 246 2.0 GHz
> > `----
> > 
> > ,----[ OpenVPN configuration (excerpts) ]
> > | * bind to single interface/port
> > | * use udp
> > | * use tap0
> > | * PSK authentication
> > `----
> > 
> > The server is also routing traffic and we do traffic limiting for some
> > traffic (destination-dependent, to comply with a leased link policy). This
> > limiting is not done on the device on which the encrypted openvpn traffic
> > leaves the machine but on an IMQ device before the incoming traffic enters
> > tap0, so openvpn should not see anything from it.
> > 
> > Are there any further details needed to chase this bug, whichever piece
> > of the software we're using it may be in?
> 
> This assertion usually occurs when the tun/tap device locks up and doesn't 
> accept any write syscalls.
> 
> Can you try an earlier 2.6 kernel (or 2.4), and see if the problem goes
> away?
> 
> I would lean towards thinking that this is a tun/tap driver issue, simply 
> because I've never heard about it on anything other than the old 
> unmaintained 2.2 driver, or in this case a very new kernel.
> 
> But having said that, I can't yet rule out that it's an OpenVPN bug.  I 
> could certainly "fix" the assertion by making OpenVPN wait forever for the 
> tun/tap device to accept output.  But then OpenVPN would simply hang, and 
> you would have even less information to go on.

OK, here's an update.  I'm attaching a patch which I believe will fix 
this.  I've only been able to reproduce the assertion under simulated 
conditions, so it would be great if you could test it in a real-world 
setting.

The patch should apply cleanly to 2.0, 2.0.x, or 2.1-beta.
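
To illustrate the idea behind the multi.c part of the patch, here is a
minimal, self-contained sketch. It is not OpenVPN code: `demo_multi` and
`demo_process_incoming` are hypothetical names, and the struct models only
a `pending` field and a counter. The point is the control flow: where the
old code aborted the process if an earlier packet was still waiting to be
written, the patched code returns early and leaves that packet alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the multi-client context; only the state
 * relevant to the patch is modeled. */
struct demo_multi
{
  void *pending;   /* non-NULL while an earlier packet still awaits output */
  int processed;   /* number of packets accepted for processing */
};

/* Sketch of the patched entry point. The pre-patch equivalent would have
 * been "assert (!m->pending);", which terminates the process whenever the
 * output side has not drained yet. */
static bool
demo_process_incoming (struct demo_multi *m)
{
  assert (m != NULL);   /* programming error, still worth asserting */

  if (m->pending)
    return true;        /* earlier output not written yet: do nothing */

  m->processed++;       /* placeholder for the real packet handling */
  return true;
}
```

With `pending` set, a call leaves `processed` unchanged and still returns
true, so a stalled tun/tap device no longer takes the whole server down.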

James
Index: multi.c
===================================================================
--- multi.c     (revision 672)
+++ multi.c     (revision 730)
@@ -1583,7 +1583,8 @@
   struct multi_instance *mi;
   bool ret = true;
 
-  ASSERT (!m->pending);
+  if (m->pending)
+    return true;
 
   if (!instance)
     {
@@ -1737,7 +1738,8 @@
       printf ("TUN -> TCP/UDP [%d]\n", BLEN (&m->top.c2.buf));
 #endif
 
-      ASSERT (!m->pending);
+      if (m->pending)
+       return true;
 
       /* 
        * Route an incoming tun/tap packet to
Index: forward.c
===================================================================
--- forward.c   (revision 672)
+++ forward.c   (revision 730)
@@ -609,10 +609,10 @@
    */
   int status;
 
+  /*ASSERT (!c->c2.to_tun.len);*/
+
   perf_push (PERF_READ_IN_LINK);
 
-  ASSERT (!c->c2.to_tun.len);
-
   c->c2.buf = c->c2.buffers->read_link_buf;
   ASSERT (buf_init (&c->c2.buf, FRAME_HEADROOM_ADJ (&c->c2.frame, FRAME_HEADROOM_MARKER_READ_LINK)));
   status = link_socket_read (c->c2.link_socket, &c->c2.buf, MAX_RW_SIZE_LINK (&c->c2.frame), &c->c2.from);
@@ -824,13 +824,13 @@
 void
 read_incoming_tun (struct context *c)
 {
-  perf_push (PERF_READ_IN_TUN);
-
   /*
    * Setup for read() call on TUN/TAP device.
    */
-  ASSERT (!c->c2.to_link.len);
+  /*ASSERT (!c->c2.to_link.len);*/
 
+  perf_push (PERF_READ_IN_TUN);
+
   c->c2.buf = c->c2.buffers->read_tun_buf;
 #ifdef TUN_PASS_BUFFER
   read_tun_buffered (c->c1.tuntap, &c->c2.buf, MAX_RW_SIZE_TUN (&c->c2.frame));
@@ -1056,14 +1056,15 @@
 {
   struct gc_arena gc = gc_new ();
 
-  perf_push (PERF_PROC_OUT_TUN);
-
   /*
    * Set up for write() call to TUN/TAP
    * device.
    */
-  ASSERT (c->c2.to_tun.len > 0);
+  if (c->c2.to_tun.len <= 0)
+    return;
 
+  perf_push (PERF_PROC_OUT_TUN);
+
   /*
    * The --mssfix option requires
    * us to examine the IPv4 header.
