Boris B. Zhmurov wrote:
Hello, Jesse Brandeburg.
On 06.04.2006 04:42 you said the following:
I built and tested the driver with patches on 2.6.16, with PCI-X
adapters. I removed some workarounds for PCIe adapters, but I don't
think anyone having this problem has a PCIe adapter anyway. I saw no
TX hangs and ran some bi-directional tests, so I think the driver
should work okay. Just warning you, I did minimal testing.
*********************
e1000: transmit the old fashioned way
It seems back in the day of 2.6.11, there were no sk_forward_alloc
assertions. Forward port that transmit code to see if it fixes the
issues in today's kernel. Unfortunately it doesn't have all the bug
fixes that the current code has, but if we get transmit timeouts we
can add in workarounds appropriately.
This changes only the e1000_tso function.
With this one I'm still getting:
TCP: Treason uncloaked! Peer 80.72.16.78:11460/80 shrinks window 2223569515:2223569516. Repaired.
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (283)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (150)
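For anyone comparing runs, here is a minimal sketch of a tally for these assertion messages, grouped by source location. It assumes the log text has the shape shown above; the function name count_assertions is hypothetical and not part of any kernel tooling.

```python
import re
from collections import Counter

# Matches messages such as:
#   KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (283)
ASSERT_RE = re.compile(
    r"assertion \(!sk->sk_forward_alloc\) failed at (\S+)\s*\((\d+)\)"
)

def count_assertions(log_text):
    """Return a Counter keyed by (source file, line number)."""
    # Mail clients often wrap log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return Counter((path, int(line)) for path, line in ASSERT_RE.findall(flat))
```

Feeding it the output of dmesg from the machine under test should show whether the failures cluster at net/core/stream.c or net/ipv4/af_inet.c, which may help when comparing patched and unpatched drivers.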
This is a very important result. It shows that the changes to the
driver to call pskb_expand_head for TSO operations are not the cause of
this problem.
We also have some new data from the last couple of days. First, I think
that this problem is likely not just E1000's fault. We have multiple
reports both in bugzilla.kernel.org and from a distro that show this
problem has occurred on (at least) tg3-driven adapters as well as e1000.
I've been able to reliably reproduce this issue in-house (finally)
thanks to one of our testers. The test is using the tbench application
from the dbench package at samba.org.
On the server, start tbench_srv.
On the machine you're trying to repro the issue on, start
tbench 500 <server ip>; on another client, start tbench 50 <server ip>.
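The client side of that setup could be scripted roughly as follows. This is only a sketch: it assumes tbench is on PATH and takes the thread count and server address as positional arguments exactly as written above; the helper names and the constants are hypothetical.

```python
import subprocess

HEAVY_THREADS = 500  # machine on which the issue is being reproduced
LIGHT_THREADS = 50   # second client

def tbench_cmd(threads, server_ip):
    """Build the argv for 'tbench <threads> <server ip>'."""
    return ["tbench", str(threads), server_ip]

def run_client(threads, server_ip):
    """Run tbench in the foreground and return its exit status."""
    # Blocks until tbench exits; tbench_srv must already be running
    # on server_ip.
    return subprocess.run(tbench_cmd(threads, server_ip), check=False).returncode
```

On the machine under test this would be run_client(HEAVY_THREADS, server_ip), and on the other client run_client(LIGHT_THREADS, server_ip), with server_ip filled in for the local setup.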
I've seen sk_forward_alloc assertions on both server and client, both
running 2.6.16. We're trying to figure out where there might be a stale
pointer to an sk that accesses the data after free. Something seems to
write ff ff ff ff 00 00 00 00 to memory after it is freed, maybe?
It does seem that the load (the 500 threads) is important to this
failure. I've also seen a report that a memory poisoning kernel caught
the failure.
Any investigation hints for me?
e1000: implement old xmit_frame
It seems back in the day of 2.6.11, there were no sk_forward_alloc
assertions. Forward port that transmit code to see if it fixes the
issues in today's kernel. Unfortunately it doesn't have all the bug
fixes that the current code has, but if we get transmit timeouts we
can add in workarounds appropriately.
This changes the e1000_xmit_frame function, and some ancillaries.
Signed-off-by: Jesse Brandeburg <[EMAIL PROTECTED]>
Can't apply this one:
[EMAIL PROTECTED] linux-2.6.16]$ patch -p1 < ../../../SOURCES/linux-2.6.16-e1000-implement_old_xmit_frame.patch
patching file drivers/net/e1000/e1000_main.c
Hunk #1 succeeded at 2620 (offset -105 lines).
Hunk #2 FAILED at 2695.
Hunk #4 FAILED at 2837.
Hunk #5 FAILED at 2868.
Hunk #6 FAILED at 2899.
4 out of 6 hunks FAILED -- saving rejects to file drivers/net/e1000/e1000_main.c.rej
Well, that seems kind of lame, but I think we got the data that we
needed from the first patch.