Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...

Jesse Brandeburg Fri, 14 Apr 2006 13:31:20 -0700

Boris B. Zhmurov wrote:

Hello, Jesse Brandeburg.
On 06.04.2006 04:42 you said the following:
I built and tested the driver with patches on 2.6.16, with PCI-X adapters. I removed some workarounds for PCIe adapters, but I don't think anyone having this problem has a PCIe adapter anyway. I saw no TX hangs and ran some bi-directional tests, so I think the driver should work okay. Just warning you I did minimal testing.
*********************
e1000: transmit the old fashioned way

It seems back in the day of 2.6.11, there were no sk_forward_alloc
assertions. Forward port that transmit code to see if it fixes the issues
in today's kernel.  Unfortunately it doesn't have all the bug fixes that
the current code has, but if we get transmit timeouts we can add in
workarounds appropriately.

this changes only the e1000_tso function
With this one I am still getting:
TCP: Treason uncloaked! Peer 80.72.16.78:11460/80 shrinks window 2223569515:2223569516. Repaired.
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c(283)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c(150)

This is a very important result. It shows that the changes to the driver to call pskb_expand_head for TSO operations are not the cause of this problem.

We also have some new data from the last couple of days. First, I think that this problem is likely not just E1000's fault. We have multiple reports both in bugzilla.kernel.org and from a distro that show this problem has occurred on (at least) tg3-driven adapters as well as e1000.

I've been able to reliably reproduce this issue in house (finally) thanks to one of our testers. The test is using the tbench application from the dbench package at samba.org.


on the server, start tbench_srv

on the machine you're trying to repro the issue on, start tbench 500 <server ip>

on another client, start tbench 50 <server ip>

I've seen sk_forward_alloc assertions on both server and client, both running 2.6.16. We're trying to figure out where there might be a stale pointer to an sk that accesses the data after free. Something seems to write ff ff ff ff 00 00 00 00 to memory after it is freed, maybe?
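For anyone trying to repeat the tbench setup described above, here is a rough shell sketch of the three roles. It only prints the command for each machine so it can be checked before being run by hand. The helper name repro_cmd, the role names, and the address 192.0.2.1 are placeholders of mine, not part of dbench; the tbench and tbench_srv invocations are the ones given above.

```shell
#!/bin/sh
# Dry-run sketch of the tbench reproduction steps from this mail.
# repro_cmd is a hypothetical helper: it prints the command for a role,
# it does not start tbench itself.
repro_cmd() {
    role="$1"
    server_ip="${2:-}"   # address of the machine running tbench_srv
    case "$role" in
        server) echo "tbench_srv" ;;                 # machine acting as server
        target) echo "tbench 500 $server_ip" ;;      # machine under test, heavy load
        client) echo "tbench 50 $server_ip" ;;       # second client, lighter load
        *) echo "usage: repro_cmd server|target|client [server_ip]" >&2
           return 1 ;;
    esac
}

# Example: 192.0.2.1 is a placeholder address, substitute the real server.
repro_cmd server
repro_cmd target 192.0.2.1
repro_cmd client 192.0.2.1
```

The 500-process load on the machine under test is kept as is, since the load appears to matter for hitting the assertion.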

It does seem that the load (the 500 threads) is important to this failure. I've also seen a report that a memory poisoning kernel caught the failure.


Any investigation hints for me?

e1000: implement old xmit_frame

It seems back in the day of 2.6.11, there were no sk_forward_alloc
assertions. Forward port that transmit code to see if it fixes the issues
in today's kernel.  Unfortunately it doesn't have all the bug fixes that
the current code has, but if we get transmit timeouts we can add in
workarounds appropriately.

this changes the e1000_xmit_frame function, and some ancillaries

Signed-off-by: Jesse Brandeburg <[EMAIL PROTECTED]>




Can't apply this one:

[EMAIL PROTECTED] linux-2.6.16]$ patch -p1 <../../../SOURCES/linux-2.6.16-e1000-implement_old_xmit_frame.patch

patching file drivers/net/e1000/e1000_main.c
Hunk #1 succeeded at 2620 (offset -105 lines).
Hunk #2 FAILED at 2695.
Hunk #4 FAILED at 2837.
Hunk #5 FAILED at 2868.
Hunk #6 FAILED at 2899.

4 out of 6 hunks FAILED -- saving rejects to file drivers/net/e1000/e1000_main.c.rej

Well, that seems kind of lame, but I think we got the data that we needed from the first patch.

