On Thu, 27 Oct 2005, James Yonan wrote:

> On Thu, 27 Oct 2005, Gunter Ohrner wrote:
>
> > Hi!
> >
> > We're experiencing regular assertion failures and subsequent OpenVPN server
> > crashes on one of our servers.
> >
> > The assertion failure is always the same:
> >
> > ,----
> > | Assertion failed at multi.c:1561
> > | Exiting
> > `----
> >
> > The crash seems to leave openvpn's network device, tap0 in our case, in a
> > state which blocks all processes subsequently trying to access to device.
> > The crashes happen every few days and a restart of the server machine is
> > needed.
> >
> > Does anyone have any quick idea of this behaviour's cause? Unfortunately
> > according to Google we're the only ones on Linux 2.6 with this crash. ;)
> >
> > http://openvpn.net/archive/openvpn-users/2005-08/msg00011.html mentions a
> > similar problem but running kernel 2.2.25 and no solution has been provided
> > so far, a suggested patch did not fix the problem for the reporter.
> >
> > ,----[ Some details about our setup ]
> > | * Debian Sarge i386
> > | * Kernel 2.6.12.6 32 Bit Opteron optimized
> > | * Debian's 2.0-1sarge1 openvpn package
> > | * Dual Opteron 246 2,0GHz
> > `----
> >
> > ,----[ OpenVPN configuration (excerpts) ]
> > | * bind to single interface/port
> > | * use udp
> > | * use tap0
> > | * PSK authentication
> > `----
> >
> > The server is also routing traffic and we do traffic limiting for some
> > traffic (destination dependant, to comply with a leased link policy). This
> > limiting is not done on the device on which the encrypted openvpn traffic
> > leaves the machine but on an IMQ device before the incoming traffic enters
> > tap0, so openvpn should not see anything from it.
> >
> > Are there any further details needed to chase this bug, in whichever kind
> > of software we're using it may be?
>
> This assertion usually occurs when the tun/tap device locks up and doesn't
> accept any write syscalls.
>
> Can you try an earlier 2.6 kernel (or 2.4), and see if the problem goes
> away?
>
> I would lean towards thinking that this is a tun/tap driver issue, simply
> because I've never heard about it on anything other than the old
> unmaintained 2.2 driver, or in this case a very new kernel.
>
> But having said that, I can't yet rule out that it's an OpenVPN bug. I
> could certainly "fix" the assertion by making OpenVPN wait forever for the
> tun/tap device to accept output. But then OpenVPN would simply hang, and
> you would have even less information to go on.

Ok, here's an update.

I'm attaching a patch which I believe will fix this. I've only been able to
reproduce the assertion under simulated conditions, therefore it would be
great if you could test in a real-world setting.

The patch should apply cleanly to 2.0, 2.0.x, or 2.1-beta.

James

Index: multi.c
===================================================================
--- multi.c	(revision 672)
+++ multi.c	(revision 730)
@@ -1583,7 +1583,8 @@
   struct multi_instance *mi;
   bool ret = true;
 
-  ASSERT (!m->pending);
+  if (m->pending)
+    return true;
 
   if (!instance)
     {
@@ -1737,7 +1738,8 @@
   printf ("TUN -> TCP/UDP [%d]\n", BLEN (&m->top.c2.buf));
 #endif
 
-  ASSERT (!m->pending);
+  if (m->pending)
+    return true;
 
   /*
    * Route an incoming tun/tap packet to
Index: forward.c
===================================================================
--- forward.c	(revision 672)
+++ forward.c	(revision 730)
@@ -609,10 +609,10 @@
    */
   int status;
 
+  /*ASSERT (!c->c2.to_tun.len);*/
+
   perf_push (PERF_READ_IN_LINK);
 
-  ASSERT (!c->c2.to_tun.len);
-
   c->c2.buf = c->c2.buffers->read_link_buf;
   ASSERT (buf_init (&c->c2.buf, FRAME_HEADROOM_ADJ (&c->c2.frame, FRAME_HEADROOM_MARKER_READ_LINK)));
   status = link_socket_read (c->c2.link_socket, &c->c2.buf, MAX_RW_SIZE_LINK (&c->c2.frame), &c->c2.from);
@@ -824,13 +824,13 @@
 void
 read_incoming_tun (struct context *c)
 {
-  perf_push (PERF_READ_IN_TUN);
-
   /*
    * Setup for read() call on TUN/TAP device.
    */
-  ASSERT (!c->c2.to_link.len);
+  /*ASSERT (!c->c2.to_link.len);*/
 
+  perf_push (PERF_READ_IN_TUN);
+
   c->c2.buf = c->c2.buffers->read_tun_buf;
 #ifdef TUN_PASS_BUFFER
   read_tun_buffered (c->c1.tuntap, &c->c2.buf, MAX_RW_SIZE_TUN (&c->c2.frame));
@@ -1056,14 +1056,15 @@
 {
   struct gc_arena gc = gc_new ();
 
-  perf_push (PERF_PROC_OUT_TUN);
-
   /*
    * Set up for write() call to TUN/TAP
    * device.
    */
-  ASSERT (c->c2.to_tun.len > 0);
+  if (c->c2.to_tun.len <= 0)
+    return;
 
+  perf_push (PERF_PROC_OUT_TUN);
+
   /*
    * The --mssfix option requires
    * us to examine the IPv4 header.
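
For anyone reading the diff without the full source at hand, the change in
the two multi.c hunks can be sketched in isolation. The following is only a
toy illustration under stated assumptions, not OpenVPN code: struct
toy_multi, toy_process_strict, toy_process_tolerant and toy_flush are
hypothetical names invented here, and the idea that the caller simply tries
again on a later pass is an assumption, since the patch itself only shows
the early "return true".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the multi-client context (not OpenVPN's struct). */
struct toy_multi
{
  const char *pending;  /* packet still waiting to be written, or NULL */
  int processed;        /* number of packets accepted for processing */
};

/* Pre-patch style: a still-pending packet is treated as a fatal error. */
bool
toy_process_strict (struct toy_multi *m, const char *pkt)
{
  assert (!m->pending);  /* aborts the whole process if output is backed up */
  m->pending = pkt;
  m->processed++;
  return true;
}

/* Post-patch style: if output is still pending, skip this packet and
   return without treating the situation as fatal. */
bool
toy_process_tolerant (struct toy_multi *m, const char *pkt)
{
  if (m->pending)
    return true;         /* nothing accepted on this pass */
  m->pending = pkt;
  m->processed++;
  return true;
}

/* Simulates the device finally accepting the queued packet. */
void
toy_flush (struct toy_multi *m)
{
  m->pending = NULL;
}
```

With a packet already pending, toy_process_strict would abort, much like the
"Assertion failed" exit in the original report, while toy_process_tolerant
leaves the state untouched and accepts the next packet only after toy_flush
has cleared the pending slot.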