On Mon April 5 2010 16:19:19 David Young wrote:
> On Mon, Apr 05, 2010 at 08:38:41AM -0600, Sverre Froyen wrote:
> > Monitoring the number-of-free-rbufs counter during network traffic, I
> > find that it normally stays at 32, occasionally dropping into the
> > twenties. Sometimes, however, the count will abruptly jump to zero. At
> > this point, the free count does not recover but remains at zero for a
> > *long* time. The interface does not receive any packets as long as the
> > driver has no free rbufs. After about ten minutes, I see a flurry of
> > calls to iwn_free_rbuf and the free count returns to 32. At this point
> > the interface is working properly again.
>
> During the flurry of calls to iwn_free_rbuf(), can you get a backtrace
> in iwn_free_rbuf()? I hope that will show us what mechanism frees them
> in a flurry.
Here is a trace from the first call to iwn_free_rbuf after the interface had been locked up for ~10 minutes:

    iwn_free_rbuf
    m_freem
    tcp_freeq
    tcp_close
    tcp_timer_rexmt
    callout_softclock
    softint_dispatch
    DDB lost frame for netbsd:Xsoftintr
    Xsoftintr

This looks like some type of timeout: apparently the TCP retransmit timer closed a connection, and tcp_freeq released the mbufs still sitting in its queue, which were holding on to rbufs. For what it's worth, I had quit the applications that I was using to trigger the lock-up long before the call to iwn_free_rbuf (although other programs had network connections open at the time of the call; ntpd, apache and openvpn come to mind).

In the process of collecting the above trace, I added a call to panic if iwn_free_rbuf was called with a free buffer count of zero. It turns out this happens rather quickly (long before the interface locks up). Here is the trace from a non-locked-up call:

    iwn_free_rbuf
    soreceive
    do_sys_recvmsg
    sys_recvfrom
    syscall

Sverre

PS I received the following comment from Damien Bergamini:

> This sounds similar to a bug that was fixed in OpenBSD ~3 years
> ago (wpi(4) rev 1.51):
> http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/if_wpi.c?rev=1.51;content-type=text%2Fx-cvsweb-markup
> http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/if_wpi.c.diff?r1=1.50;r2=1.51;f=h
>
> You should look at NetBSD's wpi(4) as it seems to have this issue
> fixed too (using m_dup). I have no idea why it has not been
> backported to NetBSD's iwn(4) though.

It looks like the NetBSD wpi changes he refers to must be these:

    revision 1.10
    date: 2007/06/18 19:40:49;  author: degroote;  state: Exp;  lines: +37 -19
    Add a workaround in the case where we have low number of rbuf.
    It seems to fix problem of frozen network with wpi.

Looking through the if_iwn.c revisions, it looks like the iwn driver had the m_dup code until rev 1.33 (when iwn_rx_intr was replaced by iwn_rx_done). I'll see if I can reintegrate the code.

PPS I used panic(9) to get the traces. Is it safe to "continue" from such a diagnostic panic?
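To make the suspected mechanism concrete, here is a rough user-space simulation in C. It is not iwn(4) or wpi(4) code: the names rx_frame, pkt_free and run, the low-water mark RBUF_LOWAT, and the copy-when-low branch are all hypothetical, and that branch is only my assumption of what the m_dup workaround amounts to. Received frames are loaned upward as rbufs and then held indefinitely, the way the TCP queue apparently held them until tcp_timer_rexmt fired.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NRBUF      32  /* pool size, as seen in the free-rbuf counter */
#define RBUF_LOWAT  4  /* hypothetical threshold for switching to copies */
#define FRAME_LEN  64

struct pkt {
    char *data;
    int   loaned;      /* 1: backed by a driver rbuf; 0: private copy */
};

static int nfree = NRBUF;  /* spare rbufs available to refill the ring */

/* Deliver one received frame; returns NULL if it had to be dropped. */
static struct pkt *rx_frame(const char *frame, int copy_fallback)
{
    struct pkt *p = malloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    if (copy_fallback && nfree <= RBUF_LOWAT) {
        /* Pool is low: copy the frame (the m_dup idea), keep the rbuf. */
        p->data = malloc(FRAME_LEN);
        if (p->data == NULL) {
            free(p);
            return NULL;
        }
        memcpy(p->data, frame, FRAME_LEN);
        p->loaned = 0;
        return p;
    }
    if (nfree == 0) {
        /* No spare rbuf to refill the ring slot: drop the frame. */
        free(p);
        return NULL;
    }
    /* Zero-copy path: loan the rbuf upward, refill the ring from the pool. */
    nfree--;
    p->data = NULL;    /* would point into the loaned rbuf */
    p->loaned = 1;
    return p;
}

/* Release a packet; for a loaned rbuf this is the iwn_free_rbuf() analogue. */
static void pkt_free(struct pkt *p)
{
    if (p->loaned)
        nfree++;
    else
        free(p->data);
    free(p);
}

/* Receive n frames and hold all of them, then release them at once. */
static void run(int copy_fallback, int n, int *delivered, int *free_while_held)
{
    char frame[FRAME_LEN] = {0};
    struct pkt **held = calloc((size_t)n, sizeof(*held));
    int got = 0;

    nfree = NRBUF;
    if (held == NULL) {
        *delivered = 0;
        *free_while_held = nfree;
        return;
    }
    for (int i = 0; i < n; i++) {
        struct pkt *p = rx_frame(frame, copy_fallback);
        if (p != NULL)
            held[got++] = p;
    }
    *delivered = got;
    *free_while_held = nfree;

    /* The timer finally fires: everything queued is freed in a flurry. */
    for (int i = 0; i < got; i++)
        pkt_free(held[i]);
    free(held);
    assert(nfree == NRBUF);  /* every loaned rbuf came back */
}
```

With copy_fallback set to 0, run(0, 100, &d, &f) delivers only 32 of the 100 frames and leaves the pool at 0 while they are held, which matches the stall. With the fallback, run(1, 100, &d, &f) delivers all 100 frames and keeps RBUF_LOWAT rbufs in reserve, at the cost of one copy per frame once the pool runs low.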