We’re experiencing a strange production failure with epair, which we use to connect our VIMAGE jails.
The affected host is:

    FreeBSD s5 11.1-STABLE FreeBSD 11.1-STABLE #2 r328930: Tue Feb 6 16:05:59 GMT 2018 root@s5:/usr/obj/usr/src/sys/TRUESPEED amd64

It looks like epair has suddenly stopped forwarding packets between the two interfaces of each pair. The server has been up for 82 days and had been working fine, but packets have suddenly stopped being forwarded between epairs across the entire system (we have around 30 epairs on the host). The result is a sudden ARP resolution failure that is affecting all services. :(

Here’s the test. On a working machine it behaves as expected:

    # Create an epair and put an IP address on one end, so that ping generates ARP traffic.
    root@magnesium:/usr/home/systems # ifconfig epair create
    epair7a
    root@magnesium:/usr/home/systems # ifconfig epair7a up
    root@magnesium:/usr/home/systems # ifconfig epair7b up
    root@magnesium:/usr/home/systems # ifconfig epair7a inet 10.140.0.1/30

    # Generate ARP traffic over the epair; ARP requests should appear on epair7b.
    root@magnesium:/usr/home/systems # ping 10.140.0.2
    PING 10.140.0.2 (10.140.0.2): 56 data bytes

    # Watch the traffic arriving at the other end of the epair.
    root@magnesium:/usr/home/systems # tcpdump -i epair7b
    10:22:27.446651 ARP, Request who-has 10.140.0.2 tell 10.140.0.1, length 28
    10:22:28.475086 ARP, Request who-has 10.140.0.2 tell 10.140.0.1, length 28
    ^C
    2 packets captured
    2 packets received by filter
    0 packets dropped by kernel

That works fine. On the failing machine, however, no packets are forwarded any more (remember, it had been working fine for a few months and then suddenly stopped :( ).
    root@s5:/usr/home/systems # ifconfig epair create
    epair19a
    root@s5:/usr/home/systems # ifconfig epair19a up
    root@s5:/usr/home/systems # ifconfig epair19b up
    root@s5:/usr/home/systems # ifconfig epair19a inet 10.130.0.1/30
    root@s5:/usr/home/systems # ping 10.130.0.2
    PING 10.130.0.2 (10.130.0.2): 56 data bytes

    root@s5:/usr/home/systems # tcpdump -ni epair19a
    09:24:20.396384 ARP, Request who-has 10.130.0.2 tell 10.130.0.1, length 28
    09:24:21.404737 ARP, Request who-has 10.130.0.2 tell 10.130.0.1, length 28
    ^C
    root@s5:/usr/home/systems # tcpdump -ni epair19b
    [Tumbleweed: no traffic seen]
    ^C

Has anyone seen this before? We’re going to reboot and see whether that fixes the problem.

Break break. We’ve just found Bugzilla report 22710, which reports that epair fails when its netisr queue limit (net.link.epair.netisr_maxqlen) is hit. We recently introduced a high-bandwidth service on this machine, so that is probably what triggered the issue.
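The failing test boils down to one comparison: ARP requests are captured on the sending end (epair19a) but never appear on the far end (epair19b). With around 30 pairs on the host, it may be worth scripting that check. Below is a minimal POSIX sh sketch; the function name `epair_forwarding_verdict`, its messages and its return codes are hypothetical, and it assumes each end has been captured to a file in the `tcpdump -n` text format shown above.

```shell
#!/bin/sh
# epair_forwarding_verdict (hypothetical helper): compare tcpdump -n text
# captures taken on the two ends of one epair.
#   $1: capture file from the end that sends (e.g. epair19a)
#   $2: capture file from the far end        (e.g. epair19b)
# Prints a one-line verdict. Returns 0 if ARP requests crossed the pair,
# 1 if they were sent but never seen on the far end, 2 if inconclusive.
epair_forwarding_verdict() {
    for f in "$1" "$2"; do
        [ -r "$f" ] || { echo "cannot read capture file: $f" >&2; return 2; }
    done
    # grep -c prints 0 (and exits 1) when nothing matches, hence "|| true".
    sent=$(grep -c 'ARP, Request who-has' "$1" || true)
    seen=$(grep -c 'ARP, Request who-has' "$2" || true)
    if [ "$sent" -eq 0 ]; then
        echo "inconclusive: no ARP requests captured on the sending end"
        return 2
    elif [ "$seen" -eq 0 ]; then
        echo "broken: $sent ARP request(s) sent, none seen on the far end"
        return 1
    fi
    echo "ok: $sent ARP request(s) sent, $seen seen on the far end"
    return 0
}
```

For example, run `tcpdump -n -l -i epair19a arp > a.txt` and `tcpdump -n -l -i epair19b arp > b.txt` for a few seconds while the ping is running, stop both with ^C, then run `epair_forwarding_verdict a.txt b.txt`. With the two captures from s5 above it reports `broken: 2 ARP request(s) sent, none seen on the far end` and returns 1.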
We currently have:

    net.link.epair.netisr_maxqlen: 2100

    root@s5:/usr/home/systems # netstat -Q
    Configuration:
    Setting                  Current      Limit
    Thread count                   1          1
    Default queue limit          256      10240
    Dispatch policy           direct        n/a
    Threads bound to CPUs   disabled        n/a

    Protocols:
    Name   Proto QLimit Policy Dispatch Flags
    ip         1    256   flow  default   ---
    igmp       2    256 source  default   ---
    rtsock     3    256 source  default   ---
    arp        4    256 source  default   ---
    ether      5    256 source   direct   ---
    ip6        6    256   flow  default   ---
    epair      8   2100    cpu  default   CD-

    Workstreams:
    WSID CPU Name    Len WMark     Disp'd  HDisp'd   QDrops     Queued    Handled
       0   0 ip        0   253  385468689        0        0   49360754  434829441
       0   0 igmp      0     0          0        0        0          0          0
       0   0 rtsock    0     5          0        0        0       1144       1144
       0   0 arp       0     0    5573045        0        0          0    5573045
       0   0 ether     0     0 1125223166        0        0          0 1125223166
       0   0 ip6       0     4         90        0        0    1220274    1220364
       0   0 epair     0  2100          0        0      214 4994675481 4994675481

However, we can’t see how much of the queue is currently in use, or what size we should set it to. And why has hitting the queue limit broken epair entirely? Help!

Cheers,
Joe

-- 
Dr Josef Karthauser
Chief Technical Officer
(01225) 300371 / (07703) 596893
www.truespeed.com <http://www.truespeed.com/> / theTRUESPEED <http://www.facebook.com/theTRUESPEED> @theTRUESPEED <https://twitter.com/thetruespeed>

_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
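On how full the queue gets: as far as we understand `netstat -Q`, in the Workstreams table Len is the current queue length, WMark is the highest length the queue has reached, and QDrops counts packets dropped because the queue was already at its limit. In the output above, epair's WMark (2100) equals its QLimit and QDrops is 214, which is consistent with the epair queue having filled up at least once. Below is a minimal sketch that pulls those figures out of `netstat -Q` text on stdin. The function name `netisr_queue_report` and the `LIMIT-HIT` label are hypothetical, and it assumes the three-section layout shown above.

```shell
#!/bin/sh
# netisr_queue_report (hypothetical helper): read netstat -Q output on stdin
# and print one line per netisr protocol with its queue limit, current queue
# length, high-water mark and queue drops. A protocol is flagged LIMIT-HIT if
# its high-water mark reached the limit or if it has any queue drops.
netisr_queue_report() {
    awk '
        # Track which section of the netstat -Q output we are in.
        /^Protocols:/   { sect = "proto"; next }
        /^Workstreams:/ { sect = "ws"; next }
        /^[A-Za-z]+:$/  { sect = ""; next }
        NF == 0         { next }

        # Protocols table: Name Proto QLimit Policy Dispatch Flags
        sect == "proto" && $1 != "Name" {
            qlimit[$1] = $3 + 0
            order[++n] = $1
        }

        # Workstreams table: WSID CPU Name Len WMark ... QDrops Queued Handled.
        # With several netisr threads a protocol has one row per workstream,
        # so lengths and drops are summed and the largest watermark is kept.
        sect == "ws" && $1 != "WSID" {
            name = $3
            len[name] += $4
            if (!(name in wmark) || $5 + 0 > wmark[name])
                wmark[name] = $5 + 0
            drops[name] += $8
        }

        END {
            for (i = 1; i <= n; i++) {
                p = order[i]
                hit = (wmark[p] + 0 >= qlimit[p] || drops[p] + 0 > 0)
                printf "%s qlimit=%d len=%d wmark=%d qdrops=%d %s\n",
                    p, qlimit[p], len[p], wmark[p], drops[p],
                    (hit ? "LIMIT-HIT" : "ok")
            }
        }'
}
```

On the host it would be run as `netstat -Q | netisr_queue_report`. For the output quoted above, the last line should be `epair qlimit=2100 len=0 wmark=2100 qdrops=214 LIMIT-HIT`, and every other protocol should report `ok`. The sketch keeps the largest WMark rather than summing, because QLimit applies to each workstream's queue separately. That distinction only matters once netisr runs more than one thread.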