Re: igb watchdog timeouts
Are just the changes in sys/dev/e1000 required, or are there any other dependencies?

Regards
Steve

----- Original Message -----
From: Jack Vogel
To: Steven Hartland
Cc: Robin Sommer; freebsd-net
Sent: Friday, July 30, 2010 4:47 AM
Subject: Re: igb watchdog timeouts

Try the code from STABLE/8 or HEAD if you would please; if you have
questions about what or how, let me know.

Jack
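For anyone following along, here is one way such a driver swap is commonly done. This is only a sketch: it assumes an svn checkout of stable/8 and that the update really is confined to sys/dev/e1000 (the open question in this thread), and the paths and the choice to rebuild just the igb module are illustrative rather than taken from the thread.

  # Sketch: pull the updated driver sources from stable/8 into a local tree.
  svn co svn://svn.freebsd.org/base/stable/8/sys/dev/e1000 /tmp/e1000
  cp /tmp/e1000/* /usr/src/sys/dev/e1000/

  # Rebuild and reinstall the igb module (em lives in sys/modules/em).
  cd /usr/src/sys/modules/igb && make clean all install

  # Reloading a NIC driver that is in use can fail; a reboot (or a full
  # kernel rebuild if the driver is compiled in) may be simpler.
  kldunload if_igb; kldload if_igb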
Packet loss when using multiple IP addresses
Hi freebsd-net,

I'm trying to run a root server on FreeBSD with four different IP addresses.
Everything works fine with one IP address, but if I add more addresses I see
packet loss of about 10% after some minutes, in rare cases after three hours.
Sometimes the packet loss rises to 50% and sometimes up to 95%, but never
100%. Incoming connections are affected: I cannot type in my ssh connection.
Outgoing connections do not seem to be affected; mtr shows 0.0% loss after
thousands of packets. Running mtr on the host seems to "help" (*). There are
no error messages in messages or console.log.

The second IP address is used for a jail and the third IP address for a VM
running VirtualBox. The final configuration is with native IPv6 (dual stack).

I've tried all of the following without success:

- FreeBSD 8.1-RELEASE and 8.1-STABLE (20100729)
- re(4) and em(4)
- with or without the changes to /sys/dev/re/if_re.c from r207977
- with or without the jail (alias IP address)
- with or without vbox
- with or without IPv6
- with or without powerd
- with ifconfig_${INTERFACE}="DHCP" and with static configuration
- with or without rxcsum,txcsum
- the motherboard was already changed (mainly because of problems with ahci enabled)

If I use tcpdump to trace the ICMP packets on the ethernet interface, I do
not see the incoming ICMP requests. The motherboard is an MSI X58 Pro-E. The
kernel is GENERIC. I do not use ipfw(8). I recompiled VirtualBox after
switching from 8.1-RELEASE to 8.1-STABLE. Ports are updated daily. I have a
very similar setup running in the office without problems. I'm very
frustrated because I have absolutely no idea what's going on here.

(*) I can reproducibly cut the packet loss to less than 1% by running mtr
from the host using the remote console. If I start mtr, the packet loss goes
down and I can log in using ssh.

If you have any idea what I can do to find the source of my problem, please
answer. Maybe a single keyword is enough. :)

Thanks,
Knarf
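For reference, a minimal sketch of the alias-style address configuration being described and of the tcpdump check mentioned above. The interface name re0 and the addresses are hypothetical placeholders, not taken from the report:

  # /etc/rc.conf sketch (hypothetical interface and addresses):
  # ifconfig_re0="inet 192.0.2.10 netmask 255.255.255.0"
  # ifconfig_re0_alias0="inet 192.0.2.11 netmask 255.255.255.255"
  # ifconfig_re0_alias1="inet 192.0.2.12 netmask 255.255.255.255"

  # Watch the wire for incoming ICMP echo requests while pinging from outside:
  tcpdump -n -i re0 'icmp and icmp[icmptype] == icmp-echo'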
Re: igb watchdog timeouts
I believe so; let me verify that for sure on a system in our validation lab
this morning. Stay tuned,

Jack

On Fri, Jul 30, 2010 at 1:32 AM, Steven Hartland wrote:
> Just the changes in sys/dev/e1000 required or are there any other
> dependencies?
>
> Regards
> Steve
>
> ----- Original Message -----
> From: Jack Vogel
> To: Steven Hartland
> Cc: Robin Sommer; freebsd-net
> Sent: Friday, July 30, 2010 4:47 AM
> Subject: Re: igb watchdog timeouts
>
> Try the code from STABLE/8 or HEAD if you would please, if you have
> questions of what or how let me know.
>
> Jack
Re: igb watchdog timeouts
On Fri, Jul 30, 2010 at 08:35 +0300, Zeus V Panchenko wrote:
> the same was for me untill i upgraded BIOS up to the latest one
> available from the MB vendor site

I'm going to try the driver from 8-STABLE, as suggested by Jack (thanks!),
but for the record, I've already updated the BIOS and I'm still seeing the
timeouts.

Robin

--
Robin Sommer * Phone +1 (510) 666-2886 * ro...@icir.org
ICSI/LBNL    * Fax   +1 (510) 666-2956 * www.icir.org
Re: Packet loss when using multiple IP addresses
> my problem, please answer. Maybe a single keyword is enough. :)

Check: heat? Voltages? (I've had a few bits of hardware die recently; it's
been hot the last few weeks here in Munich, where Frank & I are.)
Electrolytic capacitors dry out and degrade faster when they run hot.
Just a guess / last straw to clutch at & check. Good luck.

Cheers,
Julian
--
Julian Stacey: BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com
Mail plain text. Not HTML, Not quoted-printable, Not Base64.
Re: igb watchdog timeouts
Robin Sommer (ro...@icir.org) [10.07.30 18:38] wrote:
> I'm going to try the driver from 8-STABLE, as suggested by Jack
> (thanks!), but for the record, I've already updated the BIOS and I'm
> still seeing the timeouts.

I have just CVS-ed to RELENG_8, recompiled the kernel and loaded the em(4)
and igb(4) drivers - it works! :)

I was testing them with nc(1):

server side: nc -u -l 5 > /dev/null
client side: nc -u 5 < /dev/random

but the maximum I was able to get was 500Mbit/s.

btw, is it correct to test it this way?

--
Zeus V. Panchenko
IT Dpt., IBS ltd    GMT+2 (EET)
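For anyone reproducing this test, a sketch of the nc(1) UDP throughput check described above. The port number (5001) and the destination address are hypothetical, since the port appears only partially in the original mail, and /dev/zero is substituted for /dev/random because reading the entropy device can itself become the bottleneck on the sending host:

  # Receiver (hypothetical port):
  nc -u -l 5001 > /dev/null
  # Sender (hypothetical destination and port):
  nc -u 192.0.2.1 5001 < /dev/zero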
Re: kern/149117: [inet] [patch] in_pcbbind: redundant test
Old Synopsis: in_pcbbind: redundant test
New Synopsis: [inet] [patch] in_pcbbind: redundant test

Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Fri Jul 30 17:41:19 UTC 2010
Responsible-Changed-Why: Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=149117
Re: igb watchdog timeouts
At 12:52 PM 7/30/2010, Zeus V Panchenko wrote:
> but the maximum i was able to get was 500Mbit/s
> btw, is it correct to test it such way?

Try using the tools in /usr/src/tools/tools/netrate - you can generate a lot
more traffic this way.

---Mike

--------------------------------------------------------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet since 1994    www.sentex.net
Cambridge, Ontario Canada        www.sentex.net/mike
Re: igb watchdog timeouts
Mike Tancsa (m...@sentex.net) [10.07.30 20:25] wrote:
> Try using the tools in /usr/src/tools/tools/netrate
> you can generate a lot more traffic this way.

thank you Mike, it works! :)

netsend 10.11.0.2 5 1000 20 60
Sending packet of payload size 1000 every 0.05000s for 60 seconds
start:              1280511835.0
finish:             1280511895.14942
send calls:         1190
send errors:        0
approx send rate:   19
approx error rate:  0
waited:             13557673
approx waits/sec:   225961
approx wait rate:   1

--
Zeus V. Panchenko
IT Dpt., IBS ltd    GMT+2 (EET)
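For completeness, a sketch of how the netrate tools are usually built and of the receive side of such a test. The argument order shown for netsend (destination IP, UDP port, payload size, packet rate, duration) is inferred from the run above; the port and rate below are hypothetical, and the exact sub-tools present can vary between FreeBSD versions, so check the usage message each tool prints:

  # Build sender and receiver from a stock /usr/src tree (sketch):
  cd /usr/src/tools/tools/netrate/netsend && make
  cd /usr/src/tools/tools/netrate/netreceive && make

  # Receiver (hypothetical port):
  ./netreceive 5001
  # Sender: netsend <ip> <port> <payload size> <packets/sec> <seconds>
  ./netsend 10.11.0.2 5001 1000 10000 60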
Kernel (7.3) crash due to mbuf leak?
After upgrading a couple of our systems from 7.2-RELEASE to 7.3-RELEASE, we
have started to see them running out of mbufs and crashing every month or so.
The panic string is:

    kmem_malloc(16384): kmem_map too small: 335233024 total allocated

The actual panic signature (backtrace) shows a memory allocation failure
occurring in the filesystem code, but I do not think that is where the
problem lies. Instead, it is clear to me that the system is slowly leaking
mbufs until there is no more kernel memory available, and the filesystem is
just the innocent bystander asking for memory and failing to get it.

Here's some netstat -m output on a couple of crashes:

fs0# netstat -m -M vmcore.0
882167/2902/885069 mbufs in use (current/cache/total)
351/2041/2392/25600 mbuf clusters in use (current/cache/total/max)
351/1569 mbuf+clusters out of packet secondary zone in use (current/cache)
0/199/199/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/19200 9k jumbo clusters in use (current/cache/total/max)
0/0/0/12800 16k jumbo clusters in use (current/cache/total/max)
221249K/5603K/226853K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

fs0# netstat -m -M vmcore.1
894317/2905/897222 mbufs in use (current/cache/total)
345/2013/2358/25600 mbuf clusters in use (current/cache/total/max)
350/1358 mbuf+clusters out of packet secondary zone in use (current/cache)
0/263/263/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/19200 9k jumbo clusters in use (current/cache/total/max)
0/0/0/12800 16k jumbo clusters in use (current/cache/total/max)
224274K/5804K/230078K bytes allocated to network (current/cache/total)
0/1/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

fs1# netstat -m -M vmcore.0
857844/2890/860734 mbufs in use (current/cache/total)
317/2139/2456/25600 mbuf clusters in use (current/cache/total/max)
350/1603 mbuf+clusters out of packet secondary zone in use (current/cache)
0/263/263/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/19200 9k jumbo clusters in use (current/cache/total/max)
0/0/0/12800 16k jumbo clusters in use (current/cache/total/max)
215098K/6052K/221151K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

I also note that my currently running systems are both well on their way to
crashing again:

fs0# netstat -m
766618/2927/769545 mbufs in use (current/cache/total)
276/2560/2836/25600 mbuf clusters in use (current/cache/total/max)
276/1772 mbuf+clusters out of packet secondary zone in use (current/cache)
0/550/550/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
192207K/8051K/200259K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/7/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

fs0# uptime
 1:00PM  up 18 days, 13:52, 1 user, load averages: 0.00, 0.00, 0.00

fs1# netstat -m
126949/3356/130305 mbufs in use (current/cache/total)
263/1917/2180/25600 mbuf clusters in use (current/cache/total/max)
263/1785 mbuf+clusters out of packet secondary zone in use (current/cache)
0/295/295/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
32263K/5853K/38116K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/7/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

fs1# uptime
 1:00PM
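To see which allocation is actually growing between crashes, it can help to log the mbuf-related counters periodically and correlate the growth with what the box is doing. A minimal sketch (the log path and the 10-minute cron interval are arbitrary choices, not from the report), using only the netstat(1) and vmstat(8) output already referenced above:

  #!/bin/sh
  # Sketch: append an mbuf-usage snapshot to a log; run it from cron, e.g.
  #   */10 * * * * root /usr/local/sbin/mbuf-snapshot.sh
  LOG=/var/log/mbuf-usage.log          # arbitrary path, adjust as needed
  {
      date
      netstat -m | head -n 3           # mbufs / clusters currently in use
      vmstat -z | egrep -i 'ITEM|mbuf' # per-zone used/free counts
      echo
  } >> "$LOG"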
Re: Kernel (7.3) crash due to mbuf leak?
On 07/30/10 14:10, David DeSimone wrote:
> After upgrading a couple of our systems from 7.2-RELEASE to 7.3-RELEASE,
> we have started to see them running out of mbuf's and crashing every
> month or so. The panic string is:
> ...
> The services on these systems are extremely simple:
>
>    SSH (though nobody logs in)
>    sendmail
>    qmail
>    ntpd (client only)
>    named (BIND)

Do these systems consume or offer NFS?

-Steve
Re: Kernel (7.3) crash due to mbuf leak?
Steve Polyack wrote:
>
> On 07/30/10 14:10, David DeSimone wrote:
> > After upgrading a couple of our systems from 7.2-RELEASE to 7.3-RELEASE,
> > we have started to see them running out of mbuf's and crashing every
> > month or so. The panic string is:
> ...
> > The services on these systems are extremely simple:
> >
> >    SSH (though nobody logs in)
> >    sendmail
> >    qmail
> >    ntpd (client only)
> >    named (BIND)
>
> Do these systems consume or offer NFS?

No NFS in use here.

--
David DeSimone == Network Admin == f...@verio.net
  "I don't like spinach, and I'm glad I don't, because if I liked it I'd
   eat it, and I just hate it." -- Clarence Darrow
AltQ throughput issues (long message)
All,

I am looking for (again) some understanding of AltQ and how it works w.r.t.
packet throughput. I posted earlier this month regarding how to initially
configure AltQ (thanks to everyone's help) and now have it working over the
em(4) driver on a FreeBSD 8.0 platform (HP DL350 G5).

I had to bring the em(4) driver over from the 8-Stable branch, but it is
working just fine so far (I needed to add drbr_needs_enqueue() to if_var.h).

I have now gone back to trying to set up one queue with a bandwidth of
1900 Kbs (1.9 Mbs). I ran a test with 'iperf' using udp and setting the
bandwidth to 25 Mbs. I then ran a test setting the queue bandwidth to 20 Mbs
and running 'iperf' again using udp and 25 Mbs bandwidth.

In both cases, the throughput only seems to be about 89% of the requested
throughput.

Test 1
AltQ queue bandwidth 1.9 Mbs, iperf -b 25M
pfctl -vv -s queue reported:

queue root_em0 on em0 bandwidth 1Gb priority 0 cbq( wrr root ) {test7788}
  [ pkts:      28298  bytes:   42771988  dropped pkts:      0 bytes:         0 ]
  [ qlength:   0/ 50  borrows:      0  suspends:      0 ]
  [ measured:   140.8 packets/s, 1.70Mb/s ]
queue test7788 on em0 bandwidth 1.90Mb cbq( default )
  [ pkts:      28298  bytes:   42771988  dropped pkts: 397077 bytes: 600380424 ]
  [ qlength:  50/ 50  borrows:      0  suspends:   3278 ]
  [ measured:   140.8 packets/s, 1.70Mb/s ]

iperf reported:

[ ID] Interval       Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  3]  0.0-200.4 sec  39.7 MBytes  1.66 Mbits/sec  6.998 ms  397190/425533 (93%)

Test 2
AltQ queue bandwidth 20 Mbs, iperf -b 25M
pfctl -vv -s queue reported:

queue root_em0 on em0 bandwidth 1Gb priority 0 cbq( wrr root ) {test7788}
  [ pkts:     356702  bytes:  539329126  dropped pkts:      0 bytes:         0 ]
  [ qlength:   0/ 50  borrows:      0  suspends:      0 ]
  [ measured:  1500.2 packets/s, 18.15Mb/s ]
queue test7788 on em0 bandwidth 20Mb cbq( default )
  [ pkts:     356702  bytes:  539329126  dropped pkts: 149198 bytes: 225587376 ]
  [ qlength:  46/ 50  borrows:      0  suspends:  39629 ]
  [ measured:  1500.2 packets/s, 18.15Mb/s ]

iperf reported:

[ ID] Interval       Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  3]  0.0-240.0 sec   505 MBytes  17.6 Mbits/sec  0.918 ms  150584/510637 (29%)

Why can AltQ not drive it at full bandwidth? This is just some preliminary
testing, but I want to scale this up to use all available AltQ CBQ queues for
various operations.

As always, my knowledge is increased when I ask questions on this list.

Thanks,

Patrick

=== Test Results ===

Network topology:

   +--------+       +--------+
   |        |       |        |
   |  NPX8  |       |  NPX3  |
   |   (em1)+ = = = +(em3)   |
   |        |       |        |
   |        |       |  (em0) |
   +--------+       +---+----+
                        I
                        I
                        I
                        I
                   +----+----+
                   | (em0)   |
                   |         |
                   |  NPX6   |
                   |         |
                   |         |
                   +---------+

NPX8:

em1: 172.16.38.80/24

em1: flags=8843 metric 0 mtu 1500
        options=19b
        ether 00:1c:c4:48:93:10
        inet 172.16.38.80 netmask 0xff00 broadcast 172.16.38.255
        media: Ethernet autoselect (1000baseT )
        status: active

NPX3:

em3: 172.16.38.30/24

em3: flags=8843 metric 0 mtu 1500
        options=19b
        ether 00:1f:29:5f:c6:aa
        inet 172.16.38.30 netmask 0xff00 broadcast 172.16.38.255
        media: Ethernet autoselect (1000baseT )
        status: active

em0: 172.16.13.30/24

em0: flags=8843 metric 0 mtu 1500
        options=19b
        ether 00:1f:29:5f:c6:a9
        inet 172.16.13.30 netmask 0xff00 broadcast 172.16.13.255
        media: Ethernet autoselect (1000baseT )
        status: active

NPX6:

em0: 172.16.13.60/24

em0: flags=8843 metric 0 mtu 1500
        options=19b
        ether 00:1c:c4:48:95:d1
        inet 172.16.13.60 netmask 0xff00 broadcast 172.16.13.255
        media: Ethernet autoselect (1000baseT )
        status: active

NPX8 IPv4 routing table:

npx8# netstat -nr
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Ne
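For readers who missed the earlier thread, the queue setup behind the pfctl output above would look roughly like this in pf.conf. This is a sketch reconstructed from the 'pfctl -vv -s queue' output (Test 2 values), not Patrick's actual configuration, and the pass rule is purely illustrative:

  # pf.conf sketch reconstructed from the pfctl output above
  altq on em0 cbq bandwidth 1Gb queue { test7788 }
  queue test7788 bandwidth 20Mb cbq(default)

  # illustrative rule steering traffic into the queue
  pass out on em0 inet proto udp from any to any queue test7788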
Re: AltQ throughput issues (long message)
On Fri, Jul 30, 2010 at 04:07:04PM -0700, Patrick Mahan wrote:
> All,
>
> I am looking for (again) some understanding of AltQ and how it works
> w.r.t. packet throughput. I posted earlier this month regarding how
> to initially configure AltQ (thanks to everyone's help) and now have
> it working over the em(4) driver on a FreeBSD 8.0 platform (HP DL350 G5).
>
> I had to bring the em(4) driver over from the 8-Stable branch, but it is
> working just fine so far (I needed to add drbr_needs_enqueue() to
> if_var.h).
>
> I have now gone back to trying to set up one queue with a bandwidth of
> 1900 Kbs (1.9 Mbs). I ran a test with 'iperf' using udp and setting the
> bandwidth to 25 Mbs. I then ran a test setting the queue bandwidth to
> 20 Mbs and running 'iperf' again using udp and 25 Mbs bandwidth.
>
> In both cases, the throughput only seems to be about 89% of the requested
> throughput.

part of it can be explained because AltQ counts the whole packet (e.g. 1514
bytes for a full frame) whereas iperf only considers the UDP payload (e.g.
1470 bytes in your case).

The other thing you should check is whether there is any extra traffic going
through the interface that competes for the bottleneck bandwidth. You have
such huge drop rates in your tests that I would not be surprised if you had
ICMP packets going around trying to slow down the sender.

BTW have you tried dummynet in your config?

cheers
luigi

> Test 1
> AltQ queue bandwidth 1.9 Mbs, iperf -b 25M
> pfctl -vv -s queue reported:
>
> queue root_em0 on em0 bandwidth 1Gb priority 0 cbq( wrr root ) {test7788}
>   [ pkts:      28298  bytes:   42771988  dropped pkts:      0 bytes:         0 ]
>   [ qlength:   0/ 50  borrows:      0  suspends:      0 ]
>   [ measured:   140.8 packets/s, 1.70Mb/s ]
> queue test7788 on em0 bandwidth 1.90Mb cbq( default )
>   [ pkts:      28298  bytes:   42771988  dropped pkts: 397077 bytes: 600380424 ]
>   [ qlength:  50/ 50  borrows:      0  suspends:   3278 ]
>   [ measured:   140.8 packets/s, 1.70Mb/s ]
>
> iperf reported:
>
> [ ID] Interval       Transfer     Bandwidth       Jitter    Lost/Total Datagrams
> [  3]  0.0-200.4 sec  39.7 MBytes  1.66 Mbits/sec  6.998 ms  397190/425533 (93%)
>
> Test 2
> AltQ queue bandwidth 20 Mbs, iperf -b 25M
> pfctl -vv -s queue reported:
>
> queue root_em0 on em0 bandwidth 1Gb priority 0 cbq( wrr root ) {test7788}
>   [ pkts:     356702  bytes:  539329126  dropped pkts:      0 bytes:         0 ]
>   [ qlength:   0/ 50  borrows:      0  suspends:      0 ]
>   [ measured:  1500.2 packets/s, 18.15Mb/s ]
> queue test7788 on em0 bandwidth 20Mb cbq( default )
>   [ pkts:     356702  bytes:  539329126  dropped pkts: 149198 bytes: 225587376 ]
>   [ qlength:  46/ 50  borrows:      0  suspends:  39629 ]
>   [ measured:  1500.2 packets/s, 18.15Mb/s ]
>
> iperf reported:
>
> [ ID] Interval       Transfer     Bandwidth       Jitter    Lost/Total Datagrams
> [  3]  0.0-240.0 sec   505 MBytes  17.6 Mbits/sec  0.918 ms  150584/510637 (29%)
>
> Why can AltQ not drive it at full bandwidth? This is just some preliminary
> testing, but I want to scale this up to use all available AltQ CBQ queues
> for various operations.
>
> As always, my knowledge is increased when I ask questions on this list.
>
> Thanks,
>
> Patrick
>
> === Test Results ===
>
> Network topology:
>
>    +--------+       +--------+
>    |        |       |        |
>    |  NPX8  |       |  NPX3  |
>    |   (em1)+ = = = +(em3)   |
>    |        |       |        |
>    |        |       |  (em0) |
>    +--------+       +---+----+
>                         I
>                         I
>                         I
>                         I
>                    +----+----+
>                    | (em0)   |
>                    |         |
>                    |  NPX6   |
>                    |         |
>                    |         |
>                    +---------+
>
> NPX8:
>
> em1: 172.16.38.80/24
>
> em1: flags=8843 metric 0 mtu 1500
>         options=19b
>         ether 00:1c:c4:48:93:10
>         inet 172.16.38.80 netmask 0xff00 broadcast 172.16.38.255
>         media: Ethernet autoselect (1000baseT )
>         status: active
>
> NPX3:
>
> em3: 172.16.38.30/24
>
> em3: flags=8843
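To put rough numbers on the framing overhead mentioned above: with iperf's 1470-byte UDP payload, each datagram also carries 8 bytes of UDP header, 20 bytes of IPv4 header and 14 bytes of Ethernet header, so the payload-to-frame ratio is about 1470/1512, roughly 97%. That accounts for only a few of the missing percentage points (17.6 of 20 Mb/s is 88% in Test 2), which is consistent with "part of it"; the rest presumably comes from the drops and any competing traffic noted above. A quick illustrative check with bc(1):

  # framing overhead: UDP payload vs. on-the-wire Ethernet frame
  echo 'scale=3; 1470 / (1470 + 8 + 20 + 14)' | bc   # -> .972
  # observed goodput vs. configured queue bandwidth in Test 2
  echo 'scale=3; 17.6 / 20' | bc                     # -> .880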