On Fri, Aug 31, 2012 at 12:45:53PM +0700, Eugene Grosbein wrote:
> In previous letter I've described my attempts to try vr(4) from HEAD.
> Now I'd like to explain why I've tried it.
>
> The problem is that stock vr(4) from 8.3-STABLE/i386 has serious issues
> for my system.
> I have home router with two vr interfaces, vr0 is for LAN (IPoE) and
> vr1 is for WAN (PPPoE/mpd).
>
> Presently, every day my WAN vr interface stops running correctly:
> sometimes it stops receiving all packets - tcpdump shows none of them.
> Sometimes, it receives some but with great delay - up to 10 seconds
> (not milliseconds) and even more. tcpdump shows that delay occurs on
> receive path.
> Sometimes, it even rearranges packets - tcpdump shows that some incoming
> ICMP echo requests with lower sequence numbers come in later than
> already answered higher-numbered requests.

Hmm, it seems the driver's RX consumer/producer indices were corrupted.

> ifconfig vr1 down/up revives interface completely until next morning.
> sysctl net.inet.ip.fw.enable=0 does not solve the problem.
>
> I have control over WAN switching/routing network and may assure it runs
> just fine.
> However, I can't guarantee it has no "soft" anomalies like short storms
> or some silly broadcasts.
>
> I've tried to make incoming flood with ng_source(4) generated UDP flood
> at 100M rate for 60 seconds and failed to reproduce the problem
> artificially.
>
> I've tried to move WAN from vr1 to vr0 and the problem has moved to vr0
> too.
> My LAN has very little traffic and corresponding vr interface exhibits
> no problems.
>
> This router also routinely runs transmission (torrent client from ports)
> serving torrents from USB-attached HDD making severe CPU load, so I
> suspect the problem may be related with CPU load.
>
> I've also checked mbuf/mbuf clusters usage and they are all right:
>
> # netstat -m
> 1539/2076/3615 mbufs in use (current/cache/total)
> 1200/1278/2478/65536 mbuf clusters in use (current/cache/total/max)
> 1200/306 mbuf+clusters out of packet secondary zone in use (current/cache)
> 318/181/499/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 4056K/3799K/7855K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/4/6656 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> # vmstat -z | egrep -i 'ITEM|mbuf'
> ITEM                   SIZE   LIMIT   USED   FREE    REQUESTS  FAILURES
> mbuf_packet:            256,      0,  1429,    77,  112854470,        0
> mbuf:                   256,      0,   489,  1620,  369073316,        0
> mbuf_cluster:          2048,  65536,  1506,   604,    5401864,        0
> mbuf_jumbo_page:       4096,  12800,   469,   158,    8306777,        0
> mbuf_jumbo_9k:         9216,   6400,     0,     0,          0,        0
> mbuf_jumbo_16k:       16384,   3200,     0,     0,          0,        0
> mbuf_ext_refcnt:          4,      0,     0,     0,          0,        0
> NetGraph items:          36,   4130,     1,   117,     263123,        0
> NetGraph data items:     36,    531,     0,   295,  106663377,        0
>
> While ifconfig vr1 down/up solves the problem completely (for some long
> time), taking link down/up using switch solves it "in half" - huge packet
> delays disappear and turn to 25% packet loss happening in regular short
> intervals, once a second or so.
>
> ifconfig down/up clears this mess too.
>
> Please help me to debug this, it's pretty annoying.

By any chance, did vr(4) print any diagnostic messages to the console?
If I remember correctly, vr(4) automatically restarts the controller and
reports these errors when it detects an abnormal condition.
Abnormal conditions for vr(4) would be:
 - TX/RX MAC stuck
 - RX MAC stopped due to FIFO overflow or lack of RX buffers
 - PCI bus errors
 - TX abort
 - TX underrun

> I had a hope new vr(4) driver would help but it takes my system down
> under average load and is unusable.
>
> Here is start of dmesg.boot:
>
> Copyright (c) 1992-2012 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 8.3-STABLE #1: Wed Aug 29 22:49:45 NOVT 2012
>     r...@grosbein.pp.ru:/usr/local/obj/nanobsd.gw/i386/usr/local/src/sys/GW i386
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Geode(TM) Integrated Processor by AMD PCS (499.91-MHz 586-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x5a2  Family = 5  Model = a  Stepping = 2
>   Features=0x88a93d<FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CLFLUSH,MMX>
>   AMD Features=0xc0400000<MMX+,3DNow!+,3DNow!>
> real memory  = 1065025536 (1015 MB)
> avail memory = 1032929280 (985 MB)
> K6-family MTRR support enabled (2 registers)
>
> I must also note that this system runs with ACPI disabled in
> /boot/loader.conf:
> hint.acpi.0.disabled=1
>
> Otherwise, its timekeeping becomes broken.
>
> Eugene Grosbein

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
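
One way to look for the vr(4) diagnostics asked about above is to filter the
kernel message buffer for lines from the affected interface and then check
its error counters. The following is only a minimal sketch, not a definitive
procedure: iface_msgs is a hypothetical helper name, vr1 is the WAN interface
from the report, and the dev.vr.1.stats sysctl is an assumption recalled from
the vr(4) manual page that should be verified on the running system.

```shell
#!/bin/sh
# Hypothetical helper: read kernel messages on stdin and print only the
# lines that start with "<interface>:", the usual prefix of driver messages.
iface_msgs() {
    grep -E "^$1:"
}

# Only touch the real system when running on FreeBSD.
if [ "$(uname -s)" = "FreeBSD" ]; then
    # Driver messages for the WAN interface, if any were logged.
    dmesg | iface_msgs vr1 || echo "no vr1 messages in dmesg"

    # Per-interface packet, error and drop counters.
    netstat -dI vr1

    # Assumption (check vr(4) first): writing 1 here asks the driver to
    # print its MAC statistics to the console.
    #   sysctl dev.vr.1.stats=1
fi
```

Running it right after the interface misbehaves, and again after
ifconfig vr1 down/up, would show whether the driver logged anything at the
time of the failure and whether the error counters moved.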