Current problem reports assigned to freebsd-net@FreeBSD.org
Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker      Resp. Description
o kern/158726  net   [ip6] [patch] ICMPv6 Router Announcement flooding limi
o kern/158694  net   [ix] [lagg] ix0 is not working within lagg(4)
o kern/158665  net   [ip6] [panic] kernel pagefault in in6_setscope()
o kern/158635  net   [em] TSO breaks BPF packet captures with em driver
f kern/158426  net   [e1000] [panic] _mtx_lock_sleep: recursed on non-recur
o kern/158156  net   [bce] bce driver shows "no carrier" on IBM blade (HS22
f kern/157802  net   [dummynet] [panic] kernel panic in dummynet
o kern/157785  net   amd64 + jail + ipfw + natd = very slow outbound traffi
o kern/157429  net   [re] Realtek RTL8169 doesn't work with re(4)
o kern/157418  net   [em] em driver lockup during boot on Supermicro X9SCM-
o kern/157410  net   [ip6] IPv6 Router Advertisements Cause Excessive CPU U
o kern/157287  net   [re] [panic] INVARIANTS panic (Memory modified after f
o kern/157209  net   [ip6] [patch] locking error in rip6_input() (sys/netin
o kern/157200  net   [network.subr] [patch] stf(4) can not communicate betw
o kern/157182  net   [lagg] lagg interface not working together with epair
o kern/156978  net   [lagg] [patch] Take lagg rlock before checking flags
o kern/156877  net   [dummynet] [panic] dummynet move_pkt() null ptr derefe
o kern/156667  net   [em] em0 fails to init on CURRENT after March 17
o kern/156408  net   [vlan] Routing failure when using VLANs vs. Physical e
o kern/156328  net   [icmp]: host can ping other subnet but no have IP from
o kern/156317  net   [ip6] Wrong order of IPv6 NS DAD/MLD Report
o kern/156283  net   [ip6] [patch] nd6_ns_input - rtalloc_mpath does not re
o kern/156279  net   [if_bridge][divert][ipfw] unable to correctly re-injec
o kern/156226  net   [lagg]: failover does not announce the failover to swi
o kern/156030  net   [ip6] [panic] Crash in nd6_dad_start() due to null ptr
o kern/155772  net   ifconfig(8): ioctl (SIOCAIFADDR): File exists on direc
o kern/155680  net   [multicast] problems with multicast
s kern/155642  net   [request] Add driver for Realtek RTL8191SE/RTL8192SE W
o kern/155604  net   [flowtable] Flowtable excessively caches dest MAC addr
o kern/155597  net   [panic] Kernel panics with "sbdrop" message
o kern/155585  net   [tcp] [panic] tcp_output tcp_mtudisc loop until kernel
o kern/155498  net   [ral] ral(4) needs to be resynced with OpenBSD's to ga
o kern/155420  net   [vlan] adding vlan break existent vlan
o bin/155365   net   [patch] routed(8): if.c in routed fails to compile if
o kern/155177  net   [route] [panic] Panic when inject routes in kernel
o kern/155030  net   [igb] igb(4) DEVICE_POLLING does not work with carp(4)
o kern/155010  net   [msk] ntfs-3g via iscsi using msk driver cause kernel
o kern/155004  net   [bce] [panic] kernel panic in bce0 driver
o kern/154943  net   [gif] ifconfig gifX create on existing gifX clears IP
s kern/154851  net   [request]: Port brcm80211 driver from Linux to FreeBSD
o kern/154850  net   [netgraph] [patch] ng_ether fails to name nodes when t
p kern/154831  net   [arp] [patch] arp sysctl setting log_arp_permanent_mod
o kern/154679  net   [em] Fatal trap 12: "em1 taskq" only at startup (8.1-R
o kern/154600  net   [tcp] [panic] Random kernel panics on tcp_output
o kern/154557  net   [tcp] Freeze tcp-session of the clients, if in the gat
o kern/154443  net   [if_bridge] Kernel module bridgestp.ko missing after u
o kern/154286  net   [netgraph] [panic] 8.2-PRERELEASE panic in netgraph
o kern/154255  net   [nfs] NFS not responding
o kern/154214  net   [stf] [panic] Panic when creating stf interface
o kern/154185  net   race condition in mb_dupcl
o kern/154169  net   [multicast] [ip6] Node Information Query multicast add
o kern/154134  net   [ip6] stuck kernel state in LISTEN on ipv6 daemon whic
o kern/154091  net   [netgraph] [panic] netgraph, unaligned mbuf?
o conf/154062  net   [vlan] [patch] change to way of auto-generation of v
o kern/153937  net   [ral] ralink panics the system (amd64 FreeBSD 8.X) wh
o kern/153936  net   [ixgbe] [patch] MPRC workaround incorrectly applied to
o kern/153816  net   [ixgbe] ixgbe doesn't work properly with the Intel 10g
o kern/153772  net   [ixgbe] [patch] sysctls reference wrong XON/XOFF varia
o kern/153497  net   [netgraph] netgraph panic due to race conditions
o kern/153454  net   [p
Repeating kernel panic within dummynet
Hi!

My FreeBSD 8.2/amd64 routers use dummynet heavily and keep panicking with the *same* KDB backtrace:

dummynet: bad switch -256!

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xff81229d9a10
frame pointer = 0x28:0xff81229d9a40
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (dummynet)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at 0x801aaaca = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0x80329667 = kdb_backtrace+0x37
panic() at 0x802f6cb7 = panic+0x187
trap_fatal() at 0x804d8b50 = trap_fatal+0x290
trap_pfault() at 0x804d8f2f = trap_pfault+0x28f
trap() at 0x804d940f = trap+0x3df
calltrap() at 0x804c0b44 = calltrap+0x8
--- trap 0xc, rip = 0, rsp = 0xff81229d9a10, rbp = 0xff81229d9a40 ---
uart_z8530_class() at 0
mb_dtor_pack() at 0x802e4787 = mb_dtor_pack+0x37
uma_zfree_arg() at 0x8049ba5a = uma_zfree_arg+0x3a
m_freem() at 0x803556a7 = m_freem+0x37
dummynet_send() at 0x803e909d = dummynet_send+0x2d
dummynet_task() at 0x803e93c6 = dummynet_task+0x1c6
taskqueue_run_locked() at 0x80335a65 = taskqueue_run_locked+0x85
taskqueue_thread_loop() at 0x80335bfe = taskqueue_thread_loop+0x4e
fork_exit() at 0x802ca4bf = fork_exit+0x11f
fork_trampoline() at 0x804c108e = fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff81229d9d00, rbp = 0 ---
Uptime: 2d5h17m39s
Dumping 4087 MB (4 chunks)
chunk 0: 1MB (150 pages) ... ok
chunk 1: 3575MB (915072 pages) 3559 3543 3527 3511 3495 3479

It does not finish writing the dump and hangs until the IPMI watchdog reboots the box. I've tried debug.minidump=1, but it still hangs while the crash dump is being written and stops responding to Ctrl-Alt-ESC in the meantime.
Sadly, I cannot add options INVARIANTS to the kernel because it makes my mpd-based routers panic very often (every 2-3 hours) due to the famous 'dangling pointer' problem: a PPPoE user disconnects, its ngXXX interface is removed, then its traffic still drains through various system queues (netisr, dummynet etc.) and another kind of panic occurs due to INVARIANTS' references to the no-longer-existent ifp.

Please help.

Eugene Grosbein

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: Repeating kernel panic within dummynet
11.07.2011 18:45, Vlad Galu writes:
>
> On Jul 11, 2011, at 1:42 PM, Eugene Grosbein wrote:
>
>> Hi!
>>
>> My FreeBSD 8.2/amd64 routers use dummynet heavily
>> and keep panicking with the *same* KDB backtrace:
>>
>> dummynet: bad switch -256!

Forgot to mention that I use the io_fast dummynet mode and have increased pipe lengths:

net.inet.ip.dummynet.pipe_slot_limit=1000
net.inet.ip.dummynet.io_fast=1

Distinct pipes really do use long queue lengths.

>> Sadly, I cannot add options INVARIANTS to the kernel because it makes my
>> mpd-based routers panic very often (every 2-3 hours) due to the famous
>> 'dangling pointer' problem - a PPPoE user disconnects, its ngXXX interface
>> is removed, then its traffic goes out various system queues (netisr,
>> dummynet etc.) and another kind of panic occurs due to INVARIANTS'
>> references to a non-existent ifp.
>
> Hi Eugene,
> If your ISR threads aren't already bound to CPUs, you can bind them and try
> using INVARIANTS.

Please explain how to bind them. I have 4-core boxes with 4 NICs grouped into 2 laggs, one lagg(4) for uplink and another for downlink.

Eugene Grosbein
Re: Repeating kernel panic within dummynet
On Jul 11, 2011, at 1:51 PM, Eugene Grosbein wrote:

> 11.07.2011 18:45, Vlad Galu writes:
>>
>> On Jul 11, 2011, at 1:42 PM, Eugene Grosbein wrote:
>>
>>> Hi!
>>>
>>> My FreeBSD 8.2/amd64 routers use dummynet heavily
>>> and keep panicking with the *same* KDB backtrace:
>>>
>>> dummynet: bad switch -256!
>
> Forgot to mention that I use the io_fast dummynet mode
> and have increased pipe lengths:
>
> net.inet.ip.dummynet.pipe_slot_limit=1000
> net.inet.ip.dummynet.io_fast=1
>
> Distinct pipes really do use long queue lengths.
>
>>> Sadly, I cannot add options INVARIANTS to the kernel because it makes my
>>> mpd-based routers panic very often (every 2-3 hours) due to the famous
>>> 'dangling pointer' problem - a PPPoE user disconnects, its ngXXX
>>> interface is removed, then its traffic goes out various system queues
>>> (netisr, dummynet etc.) and another kind of panic occurs due to
>>> INVARIANTS' references to a non-existent ifp.
>>
>> Hi Eugene,
>> If your ISR threads aren't already bound to CPUs, you can bind them and
>> try using INVARIANTS.
>
> Please explain how to bind them. I have 4-core boxes with 4 NICs grouped
> into 2 laggs, one lagg(4) for uplink and another for downlink.

net.isr.bindthreads=1

I'm not sure how and if that would help your particular setup, but it did so in Adrian Minta's recent netgraph/mpd experiments. According to an off-list chat I had with him, the machine would panic unless the ISRs were bound.

> Eugene Grosbein
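For reference, net.isr.bindthreads is a boot-time tunable, not a runtime sysctl, so it belongs in /boot/loader.conf. A minimal sketch of the relevant settings (the values shown are illustrative for a 4-core box, not a tested recommendation for this workload):

```shell
# /boot/loader.conf -- net.isr tunables are read at boot, not at runtime
net.isr.bindthreads=1   # pin each netisr thread to its own CPU
net.isr.maxthreads=4    # illustrative: one netisr thread per core
```

After a reboot, `sysctl net.isr` shows the values that actually took effect.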
Re: Repeating kernel panic within dummynet
11.07.2011 19:02, Vlad Galu writes:
> net.isr.bindthreads=1
>
> I'm not sure how and if that would help your particular setup, but it did
> so in Adrian Minta's recent netgraph/mpd experiments. According to an
> off-list chat I had with him, the machine would panic unless the ISRs were
> bound.

I disable ISR parallelism for my mpd routers using:

net.isr.direct=1
net.isr.direct_force=1

On the other hand, there are other queues where traffic gets delayed, not just the ISR ones; dummynet itself is an example. The router still panics with INVARIANTS too often.

Eugene Grosbein
Re: Repeating kernel panic within dummynet
On Jul 11, 2011, at 1:42 PM, Eugene Grosbein wrote:

> Hi!
>
> My FreeBSD 8.2/amd64 routers use dummynet heavily
> and keep panicking with the *same* KDB backtrace:
>
> dummynet: bad switch -256!
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x0
> fault code = supervisor read instruction, page not present
> instruction pointer = 0x20:0x0
> stack pointer = 0x28:0xff81229d9a10
> frame pointer = 0x28:0xff81229d9a40
> code segment = base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 0 (dummynet)
> trap number = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x801aaaca = db_trace_self_wrapper+0x2a
> kdb_backtrace() at 0x80329667 = kdb_backtrace+0x37
> panic() at 0x802f6cb7 = panic+0x187
> trap_fatal() at 0x804d8b50 = trap_fatal+0x290
> trap_pfault() at 0x804d8f2f = trap_pfault+0x28f
> trap() at 0x804d940f = trap+0x3df
> calltrap() at 0x804c0b44 = calltrap+0x8
> --- trap 0xc, rip = 0, rsp = 0xff81229d9a10, rbp = 0xff81229d9a40 ---
> uart_z8530_class() at 0
> mb_dtor_pack() at 0x802e4787 = mb_dtor_pack+0x37
> uma_zfree_arg() at 0x8049ba5a = uma_zfree_arg+0x3a
> m_freem() at 0x803556a7 = m_freem+0x37
> dummynet_send() at 0x803e909d = dummynet_send+0x2d
> dummynet_task() at 0x803e93c6 = dummynet_task+0x1c6
> taskqueue_run_locked() at 0x80335a65 = taskqueue_run_locked+0x85
> taskqueue_thread_loop() at 0x80335bfe = taskqueue_thread_loop+0x4e
> fork_exit() at 0x802ca4bf = fork_exit+0x11f
> fork_trampoline() at 0x804c108e = fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xff81229d9d00, rbp = 0 ---
> Uptime: 2d5h17m39s
> Dumping 4087 MB (4 chunks)
> chunk 0: 1MB (150 pages) ... ok
> chunk 1: 3575MB (915072 pages) 3559 3543 3527 3511 3495 3479
>
> It does not finish writing the dump and hangs until the IPMI watchdog
> reboots the box. I've tried debug.minidump=1 but it still hangs while the
> crash dump is being written and stops responding to Ctrl-Alt-ESC in the
> meantime.
>
> Sadly, I cannot add options INVARIANTS to the kernel because it makes my
> mpd-based routers panic very often (every 2-3 hours) due to the famous
> 'dangling pointer' problem - a PPPoE user disconnects, its ngXXX interface
> is removed, then its traffic goes out various system queues (netisr,
> dummynet etc.) and another kind of panic occurs due to INVARIANTS'
> references to a non-existent ifp.

Hi Eugene,
If your ISR threads aren't already bound to CPUs, you can bind them and try using INVARIANTS.
Re: kern/152036: [libc] getifaddrs(3) returns truncated sockaddrs for netmasks
The following reply was made to PR kern/152036; it has been noted by GNATS.

From: Sergey Kandaurov
To: bug-follo...@freebsd.org, kby...@gmail.com
Cc:
Subject: Re: kern/152036: [libc] getifaddrs(3) returns truncated sockaddrs for netmasks
Date: Mon, 11 Jul 2011 17:59:47 +0400

[Some thoughts and testing...]

This is a kernel bug rather than a getifaddrs() bug, which is confirmed by the (undocumented) ioctl SIOCGIFNETMASK. I found that the bug manifests for IPv4 netmasks, but not for lladdr or IPv6.
Re: RFC 6296 (NPT v6)
10.07.2011 7:13, Rémy Sanchez wrote:
> Hi,
>
> I was wondering if there is anyone currently implementing NPTv6 for
> FreeBSD? If nobody is, since I need this feature and the RFC is quite
> simple, I think I'll implement it (or run out of time trying to). However,
> it looks like you can't divert IPv6, and then I don't know what would be
> the best option to

An IPv6 patch for divert(4) was committed to HEAD a couple of weeks ago by glebius@ (r223593).
MFC Re: soreceive_stream: issues with O_NONBLOCK
On Jul 8, 2011, at 6:51 AM, Andre Oppermann wrote:

> On 07.07.2011 21:24, Mikolaj Golub wrote:
>>
>> On Thu, 07 Jul 2011 12:47:15 +0200 Andre Oppermann wrote:
>>
>> AO> Please try this patch:
>> AO> http://people.freebsd.org/~andre/soreceive_stream.diff-20110707
>>
>> It works for me. No issues detected so far. Thanks.
>
> Committed in r223863. Many thanks for testing!
>
> --
> Andre

Hello Andre,

It appears that r197236 was never MFC'd, so soreceive_stream is still on by default in stable/8. Would you be able to MFC it along with r223839 and r223863?

Thank you,
Andrew

--
Andrew Boyer    abo...@averesystems.com
MFC of 218627 (SO_SETFIB 0)
Would someone please MFC r218627 back to stable/8 and stable/7? They are both affected.

Thank you,
Andrew

--
Andrew Boyer    abo...@averesystems.com
ESP Raw Socket: Returned IP packet incorrect
Hello all;

I have recently encountered a problem when using ESP raw sockets on FreeBSD 8 (8.0-RELEASE).

I created a raw ESP socket using:

socket(AF_INET, SOCK_RAW, 50);

which works fine. However, when there is a packet on the socket, recvfrom() returns a packet where the length bytes in the IP header are incorrect; they are swapped (the MSB is placed in the LSB and vice versa).

tcpdump shows the following:

tcpdump: listening on le0, link-type EN10MB (Ethernet), capture size 96 bytes
15:00:53.993810 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ESP (50), length 120)
    10.0.251.228 > 10.0.252.231: ESP(spi=0xa0534f17,seq=0x3), length 100
        0x: 4500 0078 4000 4032 2d88 0a00 fbe4
        0x0010: 0a00 fce7 a053 4f17 0003 6885 8abd
        0x0020: 5ded 44dc 842f 3081 8fa3 bde4 2265
        0x0030: 7438 2bf4 049c 664b 7dc4 44ef 1f6f 5e7d
        0x0040: b8c1 482f 8c3b f488 a19a 3d9a d5fe ed9d
        0x0050: b1c2

However, recvfrom() returns the following buffer:

4500 6400 0040 4032 2D88 0A00 FBE4
0A00 FCE7 A053 4F17 0003 6885 8ABD
5DED 44DC 842F 3081 8FA3 BDE4 2265
7438 2BF4 049C 664B 7DC4 44EF 1F6F 5E7D
B8C1 482F 8C3B F488 A19A 3D9A D5FE ED9D
B1C2

As is easy to see, the length is not correct (bytes 2 and 3 are 0x6400 instead of 0x0064) and does not correspond to the value returned by recvfrom(). Is this a known issue? Am I missing some options for raw sockets that are required on FreeBSD? I have attempted this on a socket to a TUN interface (not with an ESP socket) and the buffer had the proper length; it seems to only happen with ESP. This code runs fine on multiple Linux distributions and on Windows; the problem was only noticed on FreeBSD. Could it be that there is some other ESP application running and interfering? (I have not installed any; I don't know if any run by default, and I'm quite new to the BSDs.)

Any help would be much appreciated.

Matt
Re: ESP Raw Socket: Returned IP packet incorrect
On Jul 11, 2011, at 5:26 PM, Matthew Cini Sarreo wrote:

> Hello all;
>
> I have recently encountered a problem when using ESP raw sockets on
> FreeBSD 8 (8.0-RELEASE).
>
> I have created a raw esp socket using:
> socket(AF_INET, SOCK_RAW, 50);
> which works fine. However, when there is a packet on the socket,
> recvfrom() returns a packet where the length bytes in the IP header are
> incorrect; they are swapped (MSB is placed in the LSB and vice-versa)
>
> tcpdump shows the following:
>
> tcpdump: listening on le0, link-type EN10MB (Ethernet), capture size 96
> bytes
> 15:00:53.993810 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ESP
> (50), length 120)
>    10.0.251.228 > 10.0.252.231: ESP(spi=0xa0534f17,seq=0x3), length 100
>    0x: 4500 0078 4000 4032 2d88 0a00 fbe4
>    0x0010: 0a00 fce7 a053 4f17 0003 6885 8abd
>    0x0020: 5ded 44dc 842f 3081 8fa3 bde4 2265
>    0x0030: 7438 2bf4 049c 664b 7dc4 44ef 1f6f 5e7d
>    0x0040: b8c1 482f 8c3b f488 a19a 3d9a d5fe ed9d
>    0x0050: b1c2
>
> However, recvfrom() returns the following buffer:
> 4500 6400 0040 4032 2D88 0A00 FBE4
> 0A00 FCE7 A053 4F17 0003 6885 8ABD
> 5DED 44DC 842F 3081 8FA3 BDE4 2265
> 7438 2BF4 049C 664B 7DC4 44EF 1F6F 5E7D
> B8C1 482F 8C3B F488 A19A 3D9A D5FE ED9D
> B1C2
>
> As it is easy to see, the length is not correct (bytes 2 and 3 are 0x6400
> instead of 0x0064) and it does not correspond to the value returned by
> recvfrom(). Is this a known issue? Am I missing some options for raw
> sockets that are required for FreeBSD? It seems to only happen with ESP;
> this code runs fine on multiple Linux distributions and on Windows.

I think Linux provides the tot_len field in network byte order whereas FreeBSD provides it in host byte order. At least they expect it that way when using a send call. So you must take care of this in the source code of the application by using an #ifdef...

Best regards
Michael
RE: bce packet loss
> I'm running 8.1 and at least on the bce hosts, it looks like flow control
> isn't supported, it was added on 4/30/2010:
>
> http://svnweb.freebsd.org/base/head/sys/dev/bce/if_bce.c?r1=206268&r2=207411
>
> In my 8.1 sources I still see this comment, which was removed in the
> above commit:
> /* ToDo: Enable flow control support in brgphy and bge. */

This really applies to whether the user can set flow control manually. By default the NIC should auto-negotiate link speed and flow control, which is the most common case. For example, you can't set RX flow control and disable TX flow control with ifconfig using the current implementation, though it is possible in Linux with ethtool.

> So at least on the bce hosts (and bge it seems), I do not have flow
> control available on the NIC.

Flow control will be set according to auto-negotiation results. For most cases that means flow control will be enabled since both sides normally support it.

> The sysctl stats do show that it's received "XON/XOFF" frames, which I
> assume are flow control messages, but there's no indication that the NIC
> does anything with them.

There won't be any indication in the driver since flow control is managed in hardware. You'd need a wire capture to see that bce(4) has stopped sending frames in response to receiving an XOFF flow control frame or started sending frames in response to receiving an XON flow control frame.

>>> We are running 8.1, am I correct in that flow control is not
>>> implemented there? We do have an 8.2-STABLE image from a month or so
>>> ago that we are testing with zfs v28, might that implement flow control?
>>
>> Flow control will depend on the NIC driver implementation. Older
>> versions of the bce(4) firmware will rarely generate pause frames
>> (frames would be dropped by firmware but statistics should show
>> the frame drop occurring) and should always honor pause frames
>> from the link partner when flow control is enabled.

> I think my nics probably lack it. I am also guessing that if any
> high-traffic host ignores flow control frames, that's going to screw up
> other hosts as well since the one causing the buffers to fill is not
> going to throttle and the overflow will continue, correct?

Flow control is asymmetric and operates independently in both directions. If the traffic source ignores flow control frames or did not auto-negotiate flow control then it can certainly overwhelm the switch or traffic sink's buffers, causing frame drops and retransmits.

>> Although reading this:
>>
>> http://en.wikipedia.org/wiki/Ethernet_flow_control
>>
>> It sounds like flow control is not terribly optimal since it forces the
>> host to block all traffic. Not sure if this means drops are eliminated,
>> reduced or shuffled around.

Frame drops should be eliminated, though congestion could spread upstream to other devices which don't have flow control and result in frame drops and retransmits there.

>> When congestion is detected the switch should buffer up to a certain
>> limit (say 80% of full) and then start sending pause frames to avoid
>> dropping frames. This will affect all hosts connecting through the
>> switch so congestion at one host can spread to other hosts (see
>> http://www.ieee802.org/3/cm_study/public/september04/thaler_3_0904.pdf).
>
> Wow. I did not catch that. I do recall something about the flow control
> frames being multicast - so every host gets them and pauses. That's...
> interesting, isn't it?

Pause frames are multicast frames but they are only transmitted between link partners (NIC to switch) and never sent further into the network. Flow control is intended to be a local behavior, but the linked presentation indicates it can have an unintended global effect.

Dave
RE: bce packet loss
On Mon, 11 Jul 2011, David Christensen wrote:

>> I'm running 8.1 and at least on the bce hosts, it looks like flow
>> control isn't supported, it was added on 4/30/2010:
>>
>> http://svnweb.freebsd.org/base/head/sys/dev/bce/if_bce.c?r1=206268&r2=207411
>>
>> In my 8.1 sources I still see this comment, which was removed in the
>> above commit:
>> /* ToDo: Enable flow control support in brgphy and bge. */
>
> This really applies to whether the user can set flow control manually.
> By default the NIC should auto-negotiate link speed and flow-control
> which is the most common case. For example, you can't set RX flow
> control and disable TX flow control with ifconfig using the current
> implementation, though it is possible in Linux with ethtool.

OK, well that explains a lot. I've had it hammered into my brain over the years that for servers it's always best to set link speed and duplex manually at both ends to remove any possible issues with link negotiation. This advice was from back when FE was still new, and I recall autonegotiation causing issues, I believe specifically with some vintage Cisco switches.

>> So at least on the bce hosts (and bge it seems), I do not have flow
>> control available on the NIC.
>
> Flow control will be set according to auto-negotiation results. For
> most cases that means flow control will be enabled since both sides
> normally support it.

It sounds like I'm causing myself trouble here by not letting everything autonegotiate. I'll move things to auto and see what happens.

>> The sysctl stats do show that it's received "XON/XOFF" frames, which I
>> assume are flow control messages, but there's no indication that the
>> NIC does anything with them.
>
> There won't be any indication in the driver since flow control is
> managed in hardware. You'd need a wire capture to see that bce(4) has
> stopped sending frames in response to receiving an XOFF flow control
> frame or started sending frames in response to receiving an XON flow
> control frame.

Ah. I was hoping for something in the ifconfig output. I'll see if tcpdump and wireshark can tell me anything about this host.

On the one host (w/bce) I just set to full auto, the switch claims to have negotiated 1000FD w/flow control (this specifically shows as "auto+enabled" on the switch side). I see that the "sysctl dev.bce.1" tree has some info, and I can see that the NIC is receiving flow control frames:

dev.bce.1.stat_XonPauseFramesReceived: 16638
dev.bce.1.stat_XoffPauseFramesReceived: 17239

These lines are a bit puzzling though:

dev.bce.1.stat_FlowControlDone: 0
dev.bce.1.stat_XoffStateEntered: 0

>> We are running 8.1, am I correct in that flow control is not
>> implemented there? We do have an 8.2-STABLE image from a month or so
>> ago that we are testing with zfs v28, might that implement flow control?
>
> Flow control will depend on the NIC driver implementation. Older
> versions of the bce(4) firmware will rarely generate pause frames
> (frames would be dropped by firmware but statistics should show
> the frame drop occurring) and should always honor pause frames
> from the link partner when flow control is enabled.

>> I think my nics probably lack it. I am also guessing that if any
>> high-traffic host ignores flow control frames, that's going to screw up
>> other hosts as well since the one causing the buffers to fill is not
>> going to throttle and the overflow will continue, correct?
>
> Flow control is asymmetric and operates independently in both
> directions. If the traffic source ignores flow control frames or did
> not auto-negotiate flow control then it can certainly overwhelm the
> switch or traffic sink's buffers, causing frame drop and retransmits.

I ran a quick scp of a large file to another host with 100Mb connectivity and those xon/xoff counters incremented, but they were doing that previously. I assume that confirms the switch is at least asking for a pause. I still saw about 5000 dropped ingress packets on the switch, but I assume that could be due to some other host filling the buffers.

>> Although reading this:
>>
>> http://en.wikipedia.org/wiki/Ethernet_flow_control
>>
>> It sounds like flow control is not terribly optimal since it forces the
>> host to block all traffic. Not sure if this means drops are eliminated,
>> reduced or shuffled around.
>
> Frame drops should be eliminated, though congestion could spread
> upstream to other devices which don't have flow control and result in
> frame drops and retransmits there.
>
> When congestion is detected the switch should buffer up to a certain
> limit (say 80% of full) and then start sending pause frames to avoid
> dropping frames. This will affect all hosts connecting through the
> switch so congestion at one host can spread to other hosts (see
> http://www.ieee802.org/3/cm_study/public/september04/thaler_3_0904.pdf).

>> Wow. I did not catch that. I do recall something about the flow control
>> frames being multicast - so every host gets them and pauses. That's...
>> interesting, isn't it?
>
> Pause frames are multicast f
Re: bce packet loss
On 07/11/2011 21:09, Charles Sprickman wrote:
> I've had it hammered into my brain over the years that for servers it's
> always best to set link speed and duplex manually at both ends to remove
> any possible issues with link negotiation.

That hasn't been the right thing to do for at least 8 years or so, probably 10 or more.

Yes, back in the 90's when all of this stuff was still new it was not uncommon to have autonegotiation issues, but any even sort of modern hardware (on either side of the link) will do better with auto than not.

hth,

Doug

--
Nothin' ever doesn't change, but nothin' changes much. -- OK Go

Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: bce packet loss
On Mon, 11 Jul 2011, Doug Barton wrote:

> On 07/11/2011 21:09, Charles Sprickman wrote:
>> I've had it hammered into my brain over the years that for servers it's
>> always best to set link speed and duplex manually at both ends to remove
>> any possible issues with link negotiation.
>
> That hasn't been the right thing to do for at least 8 years or so,
> probably 10 or more.
>
> Yes, back in the 90's when all of this stuff was still new it was not
> uncommon to have autonegotiation issues, but any even sort of modern
> hardware (on either side of the link) will do better with auto than not.

Some of us still work at places where the hardware is 10 years old, you know. :)

I do still see fixed setups in service provider handoffs - for example this colo, Level3 and Hurricane. Also all our metro ethernet stuff specifies a fixed configuration. From what I can gather, this seems to be the standard practice in that space, but then again you're supposed to be plugging into equipment that wouldn't have the buffer issues that a $450 Dell switch would have.

The rule I recall is never to do autoneg on one side and fixed on the other; that more often than not ends up in a duplex mismatch.

Charles
Re: bce packet loss
On 07/11/2011 22:47, Charles Sprickman wrote:
> On Mon, 11 Jul 2011, Doug Barton wrote:
>
>> On 07/11/2011 21:09, Charles Sprickman wrote:
>>> I've had it hammered into my brain over the years that for servers it's
>>> always best to set link speed and duplex manually at both ends to remove
>>> any possible issues with link negotiation.
>>
>> That hasn't been the right thing to do for at least 8 years or so,
>> probably 10 or more.
>>
>> Yes, back in the 90's when all of this stuff was still new it was not
>> uncommon to have autonegotiation issues, but any even sort of modern
>> hardware (on either side of the link) will do better with auto than not.
>
> Some of us still work at places where the hardware is 10 years old, you
> know. :)

True ... hence my careful specification of "sort of modern." :)

> I do still see fixed setups in service provider handoffs - for example
> this colo, Level3 and Hurricane. Also all our metro ethernet stuff
> specifies a fixed configuration.
>
> From what I can gather, this seems to be the standard practice in that
> space, but then again you're supposed to be plugging into equipment that
> wouldn't have the buffer issues that a $450 Dell switch would have.

Well one could also say that this sort of thing tends to result from the "There is a knob, I MUST twist it!" syndrome.

> The rule I recall is never do autoneg on one side and fixed on the
> other, that more often than not will end up in a duplex mismatch.

Yes, that's definitely true, and I should have mentioned it. Whatever you do on one side (auto/manual) you must also do on the other.

Doug

--
Nothin' ever doesn't change, but nothin' changes much. -- OK Go

Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/