Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

2011-11-09 Thread Hooman Fazaeli

On 11/8/2011 11:00 PM, Adrian Chadd wrote:

On 8 November 2011 09:21, Hooman Fazaeli  wrote:


With MSIX enabled, the link task (em_handle_link) does _not_ trigger
_start when the link changes state from inactive to active (which it
should).
If if_snd quickly fills up during a temporary link loss, transmission is
stopped forever and the driver never recovers from that state.

The last patch should have reduced the frequency of the problem,
but it assumes every IFQ_ENQUEUE is followed by an if_start, which
is not a valid assumption.


FWIW, I saw something very similar with the if_arge code port from
Linux. If the TX queue filled up and wasn't serviced before it hit
completely full, it was never drained.

It may be worthwhile auditing some of the other NIC drivers to ensure
this kind of situation isn't occurring. Especially if they came from
Linux. :-)

That's a great catch, I hope it finally fixes the if_em issues with MSIX. :-)


Adrian

Just for the record, I should inform you that igb, ixgb and ixgbe have the
same issue. I have not checked other drivers.

And there is another subtle problem with all these drivers: if transmit
(xxx_xmit) fails because of a temporary memory shortage (i.e., a DMA failure
with ENOMEM), the driver may enter the OACTIVE state and _never_ recover!
The scenario is much the same as before:

- if_start is executed.
- xxx_xmit fails with ENOMEM.
- xxx_start_locked sets OACTIVE. Note that this is different from a low
  TX descriptor condition, which also sets OACTIVE.
- The stack enqueues packets in if_snd but does not call if_start, since
  the driver is OACTIVE.
- The stack enqueues more packets until if_snd fills up and packets start
  to drop.
- Since there is nowhere in the driver's code to retry transmission when
  memory becomes available again (xxx_local_timer is a candidate), the
  driver remains OACTIVE forever, until it is re-initialized.
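
The sequence above can be reproduced in a toy userland model. This is only a
sketch under stated assumptions: toy_if, toy_start, toy_enqueue and
toy_local_timer are hypothetical names, not the real em(4) symbols, and the
"DMA failure" is just a boolean. It shows the queue wedging once OACTIVE is
set on a transient failure, and a periodic timer retry getting it moving again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userland model; none of these names are real driver symbols. */
#define TOY_IFQ_MAXLEN 4

struct toy_if {
    bool   oactive;   /* stands in for IFF_DRV_OACTIVE */
    bool   dma_ok;    /* false models a transient ENOMEM from DMA mapping */
    size_t qlen;      /* packets waiting in the if_snd stand-in */
    size_t sent;      /* packets handed to the "hardware" */
    size_t dropped;   /* packets dropped because the queue was full */
};

/* Drain the queue; stop and mark OACTIVE if the "DMA" step fails. */
static void toy_start(struct toy_if *ifp)
{
    if (ifp->oactive)
        return;
    while (ifp->qlen > 0) {
        if (!ifp->dma_ok) {
            ifp->oactive = true;   /* the step that can wedge the driver */
            return;
        }
        ifp->qlen--;
        ifp->sent++;
    }
}

/* Stack-side enqueue: only kicks toy_start() when not OACTIVE. */
static void toy_enqueue(struct toy_if *ifp)
{
    if (ifp->qlen == TOY_IFQ_MAXLEN) {
        ifp->dropped++;
        return;
    }
    ifp->qlen++;
    if (!ifp->oactive)
        toy_start(ifp);
}

/* Periodic timer: if we are stalled with work queued, clear OACTIVE and retry. */
static void toy_local_timer(struct toy_if *ifp)
{
    if (ifp->oactive && ifp->qlen > 0) {
        ifp->oactive = false;
        toy_start(ifp);   /* sets OACTIVE again if the failure persists */
    }
}
```

Starting with dma_ok = false, a single toy_enqueue() sets oactive. Later
enqueues only fill the queue and then drop, even after dma_ok becomes true,
until toy_local_timer() runs.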

I am working on patches for em/igb/ixgb/ixgbe to fix these issues and would be
happy to share them with anyone who is interested.

Since these are really severe problems, I hope the gurus apply official fixes ASAP.

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

2011-11-09 Thread Hooman Fazaeli

On 11/8/2011 10:23 PM, Jason Wolfe wrote:

On Tue, Nov 8, 2011 at 10:21 AM, Hooman Fazaeli <hoomanfaza...@gmail.com> wrote:

I have allocated more time to the problem and guess I can explain what
your problem is.

With MSIX disabled, the driver uses the fast interrupt handler (em_irq_fast),
which calls the rx/tx task and then checks for a link status change. This
implies that the rx/tx task is executed with every link state change. This is
not efficient, as it is a waste of time to start transmission when the link
is down. However, it has the effect that after a temporary link loss
(active->inactive->active), _start is executed and transmission continues
normally. The value of link_toggles (3) clearly indicates that you had such
a transition when the problem occurred.

With MSIX enabled, the link task (em_handle_link) does _not_ trigger
_start when the link changes state from inactive to active (which it
should).
If if_snd quickly fills up during a temporary link loss, transmission is
stopped forever and the driver never recovers from that state.
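
A small userland sketch of the difference, for illustration only. The names
link_nic, link_tx_start and link_handle are hypothetical and do not correspond
to the real em(4) code; the kick_tx argument simply selects between a link
handler that restarts transmission on an inactive->active transition and one
that does not.

```c
#include <stdbool.h>

/* Toy model; these are not real driver structures. */
struct link_nic {
    bool link_up;
    int  queued;   /* packets waiting in the software send queue */
    int  sent;     /* packets handed to the "hardware" */
};

/* Stand-in for _start: drains the queue, but only while the link is up. */
static void link_tx_start(struct link_nic *nic)
{
    if (!nic->link_up)
        return;
    nic->sent += nic->queued;
    nic->queued = 0;
}

/*
 * Stand-in for the link task. With kick_tx set, an inactive->active
 * transition restarts transmission; without it, anything queued while
 * the link was down waits for some other event to call link_tx_start().
 */
static void link_handle(struct link_nic *nic, bool now_up, bool kick_tx)
{
    bool was_up = nic->link_up;

    nic->link_up = now_up;
    if (kick_tx && !was_up && now_up)
        link_tx_start(nic);
}
```

With kick_tx false, packets queued during the outage stay queued after the
link returns; with kick_tx true they are sent as soon as the link comes back.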

The last patch should have reduced the frequency of the problem,
but it assumes every IFQ_ENQUEUE is followed by an if_start, which
is not a valid assumption.

If you are willing to test, I can prepare another patch for you to fix
the issue in a different and more reliable way.


Hooman,

Thanks again for the assist. It sounds like this may also be why we see a bit
higher latency with MSI-X disabled on this chipset.

I'm happy to test any patches, as I have a handful of boxes set aside to
'research' this issue. Hopefully the testing here also helps move patches
into the tree for others' benefit.

Jason

Latency may or may not be related. I am doing more tests and will post
my findings soon.





Re: Possible MROUTING regression in 9.0 RC1

2011-11-09 Thread Pavel Timofeev
Crashes are frequent, even on different hardware (another server).
[root@timp ~]# /usr/local/sbin/igmpproxy -dvv /usr/local/etc/igmpproxy.conf
Searching for config file at '/usr/local/etc/igmpproxy.conf'
Config: Quick leave mode enabled.
Config: Got a phyint token.
Config: IF: Config for interface re0.
Config: IF: Got upstream token.
Config: IF: Got ratelimit token '0'.
Config: IF: Got threshold token '1'.
Config: IF: Got altnet token 172.31.242.0/24.
Config: IF: Altnet: Parsed altnet to 172.31.242/24.
IF name : re0
Next ptr : 0
Ratelimit : 0
Threshold : 1
State : 1
Allowednet ptr : c71040
Config: Got a phyint token.
Config: IF: Config for interface bridge0.
Config: IF: Got downstream token.
Config: IF: Got ratelimit token '0'.
Config: IF: Got threshold token '1'.
IF name : bridge0
Next ptr : 0
Ratelimit : 0
Threshold : 1
State : 2
Allowednet ptr : 0
Config: Got a phyint token.
Config: IF: Config for interface lo0.
Config: IF: Got disabled token.
IF name : lo0
Next ptr : 0
Ratelimit : 0
Threshold : 1
State : 0
Allowednet ptr : 0
Config: Got a phyint token.
Config: IF: Config for interface plip0.
Config: IF: Got disabled token.
IF name : plip0
Next ptr : 0
Ratelimit : 0
Threshold : 1
State : 0
Allowednet ptr : 0
buildIfVc: Interface re0 Addr: 10.85.13.39, Flags: 0x8843,
Network: 10.85.13/24
buildIfVc: Interface lo0 Addr: 127.0.0.1, Flags: 0x8049, Network: 127/8
buildIfVc: Interface bridge0 Addr: 172.16.254.1, Flags: 0x8843,
Network: 172.16.254/24
Found config for re0
Found config for bridge0
adding VIF, Ix 0 Fl 0x0 IP 0x270d550a re0, Threshold: 1, Ratelimit: 0
Network for [re0] : 10.85.13/24
Network for [re0] : 172.31.242/24
adding VIF, Ix 1 Fl 0x0 IP 0x01fe10ac bridge0, Threshold: 1, Ratelimit: 0
Network for [bridge0] : 172.16.254/24
Got 262144 byte buffer size in 0 iterations
Joining all-routers group 224.0.0.2 on vif 172.16.254.1
joinMcGroup: 224.0.0.2 on bridge0
SENT Membership query   from 172.16.254.1to 224.0.0.1
Sent membership query from 172.16.254.1 to 224.0.0.1. Delay: 10
Created timeout 1 (#0) - delay 10 secs
(Id:1, Time:10)
Created timeout 2 (#1) - delay 21 secs
(Id:1, Time:10)
(Id:2, Time:21)
received packet from 172.16.254.1 shorter (28 bytes) than hdr+data
length (20+28)
received packet from 172.16.254.1 shorter (32 bytes) than hdr+data
length (24+32)
About to call timeout 1 (#0)
Aging routes in table.

Current routing table (Age active routes):
-
No routes in table...
-
received packet from 10.85.13.5 shorter (28 bytes) than hdr+data length (20+28)
^Cselect() failure; Errno(4): Interrupted system call
Got a interupt signal. Exiting.
clean handler called
All routes removed. Routing table is empty.
Shutdown complete


2011/11/8 Pavel Timofeev :
> And sometimes igmpproxy's shutdown leads to a crash of my system.
> Without any panic, it just reboots. oO
>
> 2011/11/7 Pavel Timofeev :
>> Hello! I have problems with ip_mroute (loaded as a module), the kernel
>> multicast packet forwarder.
>> I have 2 disks: FreeBSD 8.2-RELEASE amd64 on the first and FreeBSD 9.0-RC1
>> on the second.
>> I use net/igmpproxy to watch IPTV on my home Atom-based router.
>>
>> On FreeBSD 8.2 it works well.
>>
>> But when I try to use FreeBSD 9.0-RC1 in the same role (with the same
>> configs, of course) I get messages like:
>> Nov  7 16:16:46 timp igmpproxy[35495]: received packet from
>> 172.16.254.1 shorter (28 bytes) than hdr+data length (20+28)
>> Nov  7 16:16:47 timp igmpproxy[35495]: received packet from
>> 172.16.254.1 shorter (32 bytes) than hdr+data length (24+32)
>> Nov  7 16:17:28 timp igmpproxy[35495]: received packet from 10.85.13.5
>> shorter (28 bytes) than hdr+data length (20+28)
>> And IPTV doesn't work =(
>>
>> Any ideas?
>> Do you need configs?
>>
>


Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

2011-11-09 Thread Adrian Chadd
There's no locking around the OACTIVE flag set/clear, right?
Is it possible that multiple TX threads are fiddling with OACTIVE and
then it's not being properly cleared and tx kicked?
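
To make the concern concrete, here is a userland sketch of flag updates that
are serialised under one lock. It is an illustration only and makes no claim
about what the em(4) driver does today: toy_txq and its functions are
hypothetical, and a pthread mutex stands in for whatever TX lock a driver
would use. The point is that the "ring full, set OACTIVE" decision and the
"descriptors freed, clear OACTIVE and kick TX" decision are made under the
same lock, so one cannot slip in between the check and the update of the other.

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy TX ring state; not a real driver structure. */
struct toy_txq {
    pthread_mutex_t lock;
    bool oactive;     /* stands in for IFF_DRV_OACTIVE */
    int  free_desc;   /* free TX descriptors */
};

static void toy_txq_init(struct toy_txq *q, int ndesc)
{
    pthread_mutex_init(&q->lock, NULL);
    q->oactive = false;
    q->free_desc = ndesc;
}

/* Transmit path: take one descriptor, or mark the queue OACTIVE if none are left. */
static bool toy_txq_reserve(struct toy_txq *q)
{
    bool ok;

    pthread_mutex_lock(&q->lock);
    if (q->free_desc > 0) {
        q->free_desc--;
        ok = true;
    } else {
        q->oactive = true;
        ok = false;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/*
 * Completion path: return n descriptors and clear OACTIVE under the same
 * lock. Returns true when the caller should kick transmission again.
 */
static bool toy_txq_complete(struct toy_txq *q, int n)
{
    bool kick;

    pthread_mutex_lock(&q->lock);
    q->free_desc += n;
    kick = q->oactive && q->free_desc > 0;
    if (kick)
        q->oactive = false;
    pthread_mutex_unlock(&q->lock);
    return kick;
}
```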


Adrian


ipf(8) for TCP rate limiting

2011-11-09 Thread Vijay Singh
Hi. My machine has some ipf(8) rules, and I see that when there is a
TCP connection storm to the http port the filter sends out TCP resets.
I wanted to know if it's possible to configure, using ipf, the pps limit
for TCP connections before the RSTs kick in.

regards,
vijay


Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

2011-11-09 Thread Jack Vogel
Hmmm, that's an interesting point Adrian, I'll look at that more closely.

Jack


On Wed, Nov 9, 2011 at 4:09 PM, Adrian Chadd  wrote:

> There's no locking around the OACTIVE flag set/clear, right?
> Is it possible that multiple TX threads are fiddling with OACTIVE and
> then it's not being properly cleared and tx kicked?
>
>
> Adrian
>


Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

2011-11-09 Thread Jack Vogel
BTW, the new delta on the driver is coming, I just ran into some issues
with the validation testing done in house and I've had to iron a few things
out.

I am going to implement Hooman's idea of a TX clean from local_timer,
that seems like a good idea.

The other thing I'm doing right now is re-enabling the MULTIQUEUE define
and looking at 82574 performance. Once I did that, I found certain pieces
that needed tweaking. The jury is still out on whether or not this is worth
doing, but I'm making it possible for people to try for themselves.

Anyone that really wants to try this driver early might want to send me
some directed email.

Jack


On Wed, Nov 9, 2011 at 9:00 PM, Jack Vogel  wrote:

> Hmmm, that's an interesting point Adrian, I'll look at that more closely.
>
> Jack
>
>
>
> On Wed, Nov 9, 2011 at 4:09 PM, Adrian Chadd  wrote:
>
>> There's no locking around the OACTIVE flag set/clear, right?
>> Is it possible that multiple TX threads are fiddling with OACTIVE and
>> then it's not being properly cleared and tx kicked?
>>
>>
>> Adrian
>>
>
>


Re: FreeBSD 9 and ARP multicast source address error messages

2011-11-09 Thread Gleb Smirnoff
  Alexander,

On Tue, Nov 08, 2011 at 05:14:45PM -0500, Alexander Wittig wrote:
A> I upgraded one of my machines from FreeBSD 8 to 9.0-RC1 (FreeBSD
A> bt.pa.msu.edu 9.0-RC1 FreeBSD 9.0-RC1 #3: Fri Oct 28 16:45:28 EDT 2011
A> r...@bt.pa.msu.edu:/usr/obj/usr/src/sys/ALEX  i386), and ever since that
A> upgrade the kernel keeps flooding my log files with messages like these:
A> Nov  7 16:40:01 bt kernel: in_arp: source hardware address is multicast.in_arp: source hardware address is multicast.
A> Nov  7 16:42:02 bt kernel: in_arp: source hardware address is multicast.in_arp: source hardware address is multicast.
A> 
A> A Google search for these didn't reveal any useful results as to why this
A> happens or how to fix it. So I did a tcpdump and matched the time stamps
A> with packets, and I found the ones causing problems (the only ones with a
A> multicast bit set) to be like this:
A> 16:40:01.099823 02:02:23:09:44:3c > 03:bf:23:09:44:87, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 35.9.68.228 is-at 03:bf:23:09:44:e4, length 46
A> 0x0000:  03bf 2309 4487 0202 2309 443c 0806 0001
A> 0x0010:  0800 0604 0002 03bf 2309 44e4 2309 44e4
A> 0x0020:  02bf 2309 443c 2309 4487 0000 0000 0000
A> 0x0030:  0000 0000 0000 0000 0000 0000
A> 
A> It appears the multicast MAC 03:bf:23:09:44:e4 is part of Microsoft's
A> network load balancing mechanism, with the 03:bf prefix indicating as much
A> and the trailing 23:09:44:e4 encoding the IP address of the load balance
A> cluster (35.9.68.228). These types of MACs seem to be commonly used in
A> their load balancing implementation.
A> 
A> To prevent these messages from producing thousands of lines of logs each
A> day, I added the following two IPFW rules and enabled Ethernet packet
A> filtering (sysctl net.link.ether.ipfw=1):
A> deny ip from any to any MAC 03:bf:00:00:00:00/16 any layer2
A> allow ip from any to any layer2
A> 
A> This effectively blocks those packets and the resulting error messages.
A> But I'm wondering if the newly added(?) ARP code in FBSD 9 is a bit too
A> fussy about these, or if MS is abusing the ARP protocol here. Either way,
A> this was never a problem with FBSD 7 or 8.
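
For reference, the multicast test and the address layout described in the
quoted report can be checked with a few lines of userland C. This is a sketch:
mac_is_multicast and mac_tail_to_ipv4 are hypothetical helper names. The first
mirrors the test the ETHER_IS_MULTICAST() macro performs (low-order bit of the
first octet); the second just prints the last four octets as a dotted quad.
Applied to 03:bf:23:09:44:e4, the sender hardware address in the ARP reply
from the capture, it gives the 35.9.68.228 that the reply advertises.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Group (multicast) bit: the low-order bit of the first octet.
 * The same bit is tested by ETHER_IS_MULTICAST() in the kernel.
 */
static int mac_is_multicast(const uint8_t mac[6])
{
    return (mac[0] & 0x01) != 0;
}

/* Print octets 2..5 of the MAC as a dotted-quad IPv4 address into buf. */
static void mac_tail_to_ipv4(const uint8_t mac[6], char *buf, size_t len)
{
    snprintf(buf, len, "%u.%u.%u.%u",
        (unsigned)mac[2], (unsigned)mac[3],
        (unsigned)mac[4], (unsigned)mac[5]);
}
```

The Ethernet source of the frame, 02:02:23:09:44:3c, does not have the group
bit set, which is why only the sender hardware address inside the ARP payload
trips the check.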

Can you try the attached patch? It reduces the severity level of all ARP
messages that can be triggered by a packet on the network, with the
exception of "using my IP address".

With the default syslog.conf, ARP errors will no longer reach the console.
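
A short sketch of why lowering the level has that effect. It assumes the
console selector in the default syslog.conf is of the "kern.warning" kind,
i.e. it matches the named level and anything more severe; that selector and
the helper name passes_selector are assumptions for illustration. In
<syslog.h>, more severe levels have smaller numbers, so LOG_ERR passes a
warning-level selector while LOG_NOTICE does not.

```c
#include <syslog.h>

/*
 * Returns nonzero if a message logged at priority level `pri` matches a
 * selector that means "selector_level or more severe". Severity grows as
 * the numeric value shrinks (LOG_EMERG is 0, LOG_DEBUG is 7).
 */
static int passes_selector(int pri, int selector_level)
{
    return pri <= selector_level;
}
```

For example, passes_selector(LOG_ERR, LOG_WARNING) is true, while
passes_selector(LOG_NOTICE, LOG_WARNING) is false.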

Also, the multicast message lacked the trailing "\n" newline character; that
is why, I suppose, syslogd failed to coalesce a number of messages into a
single entry "last message repeated X times".

-- 
Totus tuus, Glebius.
Index: if_ether.c
===
--- if_ether.c	(revision 227416)
+++ if_ether.c	(working copy)
@@ -433,7 +433,7 @@
 
 	if (m->m_len < sizeof(struct arphdr) &&
 	((m = m_pullup(m, sizeof(struct arphdr))) == NULL)) {
-		log(LOG_ERR, "arp: runt packet -- m_pullup failed\n");
+		log(LOG_NOTICE, "arp: runt packet -- m_pullup failed\n");
 		return;
 	}
 	ar = mtod(m, struct arphdr *);
@@ -443,7 +443,7 @@
 	ntohs(ar->ar_hrd) != ARPHRD_ARCNET &&
 	ntohs(ar->ar_hrd) != ARPHRD_IEEE1394 &&
 	ntohs(ar->ar_hrd) != ARPHRD_INFINIBAND) {
-		log(LOG_ERR, "arp: unknown hardware address format (0x%2D)\n",
+		log(LOG_NOTICE, "arp: unknown hardware address format (0x%2D)\n",
 		(unsigned char *)&ar->ar_hrd, "");
 		m_freem(m);
 		return;
@@ -451,7 +451,7 @@
 
 	if (m->m_len < arphdr_len(ar)) {
 		if ((m = m_pullup(m, arphdr_len(ar))) == NULL) {
-			log(LOG_ERR, "arp: runt packet\n");
+			log(LOG_NOTICE, "arp: runt packet\n");
 			m_freem(m);
 			return;
 		}
@@ -527,7 +527,7 @@
 
 	req_len = arphdr_len2(ifp->if_addrlen, sizeof(struct in_addr));
 	if (m->m_len < req_len && (m = m_pullup(m, req_len)) == NULL) {
-		log(LOG_ERR, "in_arp: runt packet -- m_pullup failed\n");
+		log(LOG_NOTICE, "in_arp: runt packet -- m_pullup failed\n");
 		return;
 	}
 
@@ -537,13 +537,14 @@
 	 * a protocol length not equal to an IPv4 address.
 	 */
 	if (ah->ar_pln != sizeof(struct in_addr)) {
-		log(LOG_ERR, "in_arp: requested protocol length != %zu\n",
+		log(LOG_NOTICE, "in_arp: requested protocol length != %zu\n",
 		sizeof(struct in_addr));
 		return;
 	}
 
 	if (ETHER_IS_MULTICAST(ar_sha(ah))) {
-		log(LOG_ERR, "in_arp: source hardware address is multicast.");
+		log(LOG_NOTICE, "in_arp: %*D is multicast\n",
+		ifp->if_addrlen, (u_char *)ar_sha(ah), ":");
 		return;
 	}
 
@@ -645,7 +646,7 @@
 	if (!bcmp(ar_sha(ah), enaddr, ifp->if_addrlen))
 		goto drop;	/* it's from me, ignore it. */
 	if (!bcmp(ar_sha(ah), ifp->if_broadcastaddr, ifp->if_addrlen)) {
-		log(LOG_ERR,
+		log(LOG_NOTICE,
 		"arp: link address is broadcast for IP address %s!\n",
 		inet_ntoa(isaddr));
 		goto drop;
@@ -681,7 +682,7 @@
 		/* the following is not an error when doing bridging */
 		if (!bridged && la->lle_tbl->llt_ifp != ifp && !carp_match) {
 			if (log_arp_wrong_iface)
-log(LOG_ERR, "arp: %s is on %s "
+