Hi List,

I'm wondering if anyone can help me with this problem or at least help point me in the direction of where to start looking? I have FreeBSD 9 based servers which are crashing every 4-10 days and producing crash dumps similar to this one:

http://pastebin.com/F82Jc08C

All crash dumps seem to involve the netgraph code, and the current process is always ng_queueX.

In summary, we have four FreeBSD servers running as LNS (MPD5) for around 2000 subscribers. Three of the servers run a modified version of BSDRP. The fourth runs a FreeBSD 9 install with what I thought was the latest stable source for the kernel, because I fetched it from stable/9; however, it shows up as 9.3-BETA in uname (the linked crash dump is from that server).

3 x LNS running modified BSDRP:
DELL PowerEdge 2950, 2 x Xeon E5320, 4GB RAM, igb Quad Port NIC in LAGG, Quagga, MPD5, IPFW for Host Access Control, NTPD, BSNMPD

1 x LNS running latest FreeBSD 9 code:
HP ProLiant DL380, 2 x Xeon X5465, 36GB RAM, em Quad Port NIC in LAGG, BIRD, MPD5, IPFW for Host Access Control, NTPD, BSNMPD

The reason I built the fresh server on FreeBSD 9 is that I cannot easily save crash dumps on BSDRP.

In short, the problem is this: servers with 10-50 clients will run indefinitely (for as long as we have had them, which is probably about 1.5 years) without errors and serve clients fine, whereas any with over 300 clients appear to stay online for only 4-10 days at most before crashing and rebooting.

I have attached the crash file from the latest crash on the LNS running the latest FreeBSD 9 code, but I am unsure what to do with it or where to look.

When these devices crash they are often pushing in excess of 200Mbps (anywhere between 200Mbps and 450Mbps) with very little load (3-4.5 on the first 3, less than 2 on the fourth).

Things I've done to attempt resolution:

- Replaced bce network cards with em network cards. This produced far fewer errors on the interfaces (there were many before, now none), and I think it caused the machines to stay up longer between reboots, as before it would happen up to once a day.
- Replaced em network cards with igb network cards. All this did was lower load and give us a little more time between reboots.
- Tried an implementation using FreeBSD 10 (this lasted less than 4 hours before rebooting when under load).
- Replaced memory.
- Increased memory on LNS4 to 36GB.
- Various kernel rebuilds.
- Tweaked various kernel settings. This appears to have helped a little and given us more time between reboots.
- Disabled IPv6.
- Disabled IPFW.
- Disabled BSNMPD.
- Disabled Netflow.
- Tried versions 5.6 and 5.7 of MPD5.

Is anyone able to help me work out what the crash dump means? It only happens on servers running MPD5 (e.g. the exact same boxes running the exact same code push 800Mbps+ of routing with no crashes), and I can see the crash relates to netgraph, but I am unsure where to go from there.

Thanks,
Mark

Relevant Current Settings:

net.inet.ip.fastforwarding=1
net.inet.ip.fw.default_to_accept=1
net.bpf.zerocopy_enable=1
net.inet.raw.maxdgram=16384
net.inet.raw.recvspace=16384
hw.intr_storm_threshold=64000
net.inet.ip.fastforwarding=1
net.inet.ip.fw.default_to_accept=1
net.inet.ip.intr_queue_maxlen=10240
net.inet.ip.redirect=0
net.inet.ip.sourceroute=0
net.inet.ip.rtexpire=2
net.inet.ip.rtminexpire=2
net.inet.ip.rtmaxcache=256
net.inet.ip.accept_sourceroute=0
net.inet.ip.process_options=0
net.inet.icmp.log_redirect=0
net.inet.icmp.drop_redirect=1
net.inet.tcp.drop_synfin=1
net.inet.tcp.blackhole=2
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1
net.inet.udp.recvspace=262144
net.inet.udp.blackhole=0
net.inet.udp.maxdgram=57344
net.route.netisr_maxqlen=4096
net.local.stream.recvspace=65536
net.local.stream.sendspace=65536
net.graph.maxdata=65536
net.graph.maxalloc=65536
net.graph.maxdgram=2096000
net.graph.recvspace=2096000
kern.ipc.somaxconn=32768
kern.ipc.nmbclusters=524288
kern.ipc.maxsockbuf=26214400
kern.ipc.shmmax="2147483648"
kern.ipc.nmbjumbop="53200"
kern.ipc.maxpipekva="536870912"
kern.random.sys.harvest.ethernet="0"
kern.random.sys.harvest.interrupt="0"
vm.kmem_size="4096M" # Only on box with over 12G RAM. Otherwise 2G.
vm.kmem_size_max="8192M" # Only on box with over 12G RAM.
hw.igb.rxd="4096"
hw.igb.txd="4096"
hw.em.rxd="4096"
hw.em.txd="4096"
hw.igb.max_interrupt_rate="32000"
hw.igb.rx_process_limit="4096"
hw.em.rx_process_limit="500"
net.link.ifqmaxlen="20480"
net.isr.dispatch="direct"
net.isr.direct_force="1"
net.isr.direct="1"
net.isr.maxthreads="8"
net.isr.numthreads="4"
net.isr.bindthreads="1"
net.isr.maxqlimit="20480"
net.isr.defaultqlimit="8192"

_______________________________________________
freebsd-bugs@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"