From: Jesse Brandeburg <[EMAIL PROTECTED]>
Date: Fri, 14 Apr 2006 15:55:10 -0700 (Pacific Daylight Time)
> I'm trying to isolate more of a reproduction case, I'll be sure to
> post if I can find anything with more detail.
I think I see the bug.
If tbench with large numbers of clients is part of
On Fri, 14 Apr 2006, David S. Miller wrote:
> From: Jesse Brandeburg <[EMAIL PROTECTED]>
> Date: Fri, 14 Apr 2006 15:46:31 -0700 (Pacific Daylight Time)
>
> > sure, that's fine, but we just reproduced it in two separate systems
> > without the e1000 driver loaded, using the instructions as mentio
From: Jesse Brandeburg <[EMAIL PROTECTED]>
Date: Fri, 14 Apr 2006 15:46:31 -0700 (Pacific Daylight Time)
> sure, that's fine, but we just reproduced it in two separate systems
> without the e1000 driver loaded, using the instructions as mentioned in a
> previous email. We used a 5704 with TSO en
On Fri, 14 Apr 2006, David S. Miller wrote:
> From: Jesse Brandeburg <[EMAIL PROTECTED]>
> Date: Fri, 14 Apr 2006 15:32:55 -0700 (Pacific Daylight Time)
>
> > well there was one of them here, but the tg3 bit may actually be due to
> > the 2.6.14 problems.
> >
> > http://bugzilla.kernel.org/show
From: Jesse Brandeburg <[EMAIL PROTECTED]>
Date: Fri, 14 Apr 2006 15:32:55 -0700 (Pacific Daylight Time)
> well there was one of them here, but the tg3 bit may actually be due to
> the 2.6.14 problems.
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6279
There are 2 e1000 gigabit devices in that
On Fri, 14 Apr 2006, David S. Miller wrote:
> From: Jesse Brandeburg <[EMAIL PROTECTED]>
> Date: Fri, 14 Apr 2006 13:28:10 -0700
>
> > We also have some new data from the last couple of days. First, I think
> > that this problem is likely not just E1000's fault. We have multiple
> > reports
From: Jesse Brandeburg <[EMAIL PROTECTED]>
Date: Fri, 14 Apr 2006 13:28:10 -0700
> We also have some new data from the last couple of days. First, I think
> that this problem is likely not just E1000's fault. We have multiple
> reports both in bugzilla.kernel.org and from a distro that show th
Boris B. Zhmurov wrote:
Hello, Jesse Brandeburg.
On 06.04.2006 04:42 you said the following:
I built and tested the driver with patches on 2.6.16, with pci-x
adapters. I removed some workarounds for PCIe adapters, but I don't
think anyone having this problem has a PCIe adapter anyway. I saw
Maybe it's unrelated to this problem, but it is interesting observation,
at least for me.
All boxes running for two weeks now and spitting these assert messages
have about 1,5GB of slab size allocated, with skbuff_head_cache entry
being the largest entry. After rebooting, it is all nice and sm
Hello, Jesse Brandeburg.
On 06.04.2006 04:42 you said the following:
I built and tested the driver with patches on 2.6.16, with pci-x adapters.
I removed some workarounds for PCIe adapters, but I don't think anyone
having this problem has a PCIe adapter anyway. I saw no TX hangs and ran
some
On Wed, 5 Apr 2006, Jesse Brandeburg wrote:
> I'll also send a patch today to back-rev the xmit routine to the 5.6.10.1
> state.
I'm in a bit of a hurry, but I wanted to send these debug patches out.
Forgive me if my mailer decides to munge them.
I'd suggest trying the first one and then both
On 4/5/06, Herbert Xu <[EMAIL PROTECTED]> wrote:
> Michal Feix <[EMAIL PROTECTED]> wrote:
> >
> > All boxes running for two weeks now and spitting these assert messages
> > have about 1,5GB of slab size allocated, with skbuff_head_cache entry
> > being the largest entry. After rebooting, it is all
On Thu, Apr 06, 2006 at 04:11:58AM +1000, Herbert Xu wrote:
> That's a very interesting observation. Can others please check if they
> have an abnormally large skbuff slab cache?
Mine seem to be reasonably sized:
e1000 with assertion failures (up 14 days, 2:51):
OBJS ACTIVE USE OBJ SIZE SLA
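For anyone who wants to run the same check, here is a minimal sketch. It assumes the slabinfo 2.x column layout (name, active_objs, num_objs, objsize, ...); the helper name skbuff_slab_usage is made up for illustration, and reading /proc/slabinfo may need root on some kernels.

```shell
# Hypothetical helper: reads slabinfo-format text on stdin and prints an
# approximate footprint (num_objs * objsize) for every skbuff* cache.
# Assumes column 3 is num_objs and column 4 is objsize in bytes.
skbuff_slab_usage() {
    awk '$1 ~ /^skbuff/ { printf "%s %d KiB\n", $1, ($3 * $4) / 1024 }'
}

# On a live system (may need root):
#   skbuff_slab_usage < /proc/slabinfo
```

The figure ignores per-slab overhead, so treat it as a lower bound when comparing a freshly booted box against one that has been up for two weeks.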
On Mon, 3 Apr 2006, Boris B. Zhmurov wrote:
>
> Hello, Phil Oester.
>
> On 04.04.2006 01:39 you said the following:
>
> > On Mon, Apr 03, 2006 at 04:01:23PM -0500, Mark Nipper wrote:
> >
> >>After three days and some hours, I finally saw another
> >>event:
> >
> >
> > Ack, same here.
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Thu, 06 Apr 2006 04:11:58 +1000
> Michal Feix <[EMAIL PROTECTED]> wrote:
> >
> > All boxes running for two weeks now and spitting these assert messages
> > have about 1,5GB of slab size allocated, with skbuff_head_cache entry
> > being the largest entr
Michal Feix <[EMAIL PROTECTED]> wrote:
>
> All boxes running for two weeks now and spitting these assert messages
> have about 1,5GB of slab size allocated, with skbuff_head_cache entry
> being the largest entry. After rebooting, it is all nice and small, but
> it is noticeable, that this entry
Maybe it's unrelated to this problem, but it is interesting observation,
at least for me.
All boxes running for two weeks now and spitting these assert messages
have about 1,5GB of slab size allocated, with skbuff_head_cache entry
being the largest entry. After rebooting, it is all nice and sm
Hello, Phil Oester.
On 04.04.2006 01:39 you said the following:
On Mon, Apr 03, 2006 at 04:01:23PM -0500, Mark Nipper wrote:
After three days and some hours, I finally saw another
event:
Ack, same here. Looked hopeful, but finally saw the error today.
Phil
[EMAIL PROTECTED] ~]#
On Mon, Apr 03, 2006 at 04:01:23PM -0500, Mark Nipper wrote:
> After three days and some hours, I finally saw another
> event:
Ack, same here. Looked hopeful, but finally saw the error today.
Phil
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a messag
On 31 Mar 2006, Herbert Xu wrote:
> If it still fails, here is a debugging patch which should tell us
> whether we need to look elsewhere.
After three days and some hours, I finally saw another
event:
---
Apr 3 13:40:53 king kernel: KERNEL: assertion (!sk->sk_forward_alloc) failed
at net
On Fri, 31 Mar 2006, Ingo Oeser wrote:
Hi,
Herbert Xu wrote:
On Fri, Mar 31, 2006 at 01:35:40AM -0800, David S. Miller wrote:
He does not have TSO enabled, e1000 disables TSO when on a link speed
slower than gigabit.
dmesg|grep eth0
[4294671.426000] e1000: eth0: e1000_probe: Intel(R) PRO/
Hello, Mark Nipper.
On 31.03.2006 20:01 you said the following:
On 31 Mar 2006, Boris B. Zhmurov wrote:
stream.c (279) -> stream.c (283)
af_inet.c (148) -> af_inet.c (150)
That will be because the patches changed the line numbers
in the source I believe. Nothing helpful unfortunat
On 31 Mar 2006, Boris B. Zhmurov wrote:
> stream.c (279) -> stream.c (283)
> af_inet.c (148) -> af_inet.c (150)
That will be because the patches changed the line numbers
in the source I believe. Nothing helpful unfortunately.
--
Mark Nipper
Hello, Boris B. Zhmurov.
On 31.03.2006 19:08 you said the following:
Hmm... with the latest debug patch I can't see any of the debug info:
But wait a minute. Two days ago, without Herbert's patches, the assertion
errors looked like this:
Mar 29 20:03:23 msk4 kernel: KERNEL: assertion (!sk->sk_forward_
Hello, Boris B. Zhmurov.
On 31.03.2006 17:30 you said the following:
Herbert, with your second patch still no luck. After an hour of uptime I
have assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (283)
again...
Trying your debug patch.
Hmm... with the latest debug patch I can't
Hello, Christiaan den Besten.
On 31.03.2006 17:12 you said the following:
Hi !
P.S. I have another high-load server as gateway. Same distro, same
kernels, but less memory (512Mb lowmem). eth0 up - e100, eth1 up -
e1000. No errors at all! It kinda looks like assertions happen on
systems, wh
Hello, Boris B. Zhmurov.
On 31.03.2006 16:23 you said the following:
Hello, Mark Nipper.
On 31.03.2006 16:10 you said the following:
This unfortunately is not the case. I have two e1000
interfaces but only eth1 is up and in use. And I still had
assertions.
Can you switch to eth
Hello, Herbert Xu.
On 31.03.2006 16:35 you said the following:
On Fri, Mar 31, 2006 at 04:23:02PM +0400, Boris B. Zhmurov wrote:
I'm already using kernel with second Herbert's patch. We'll see...
If it still fails
Not yet. But give it a time :)
--
Boris B. Zhmurov
mailto: [EMAIL PROTEC
On Fri, Mar 31, 2006 at 04:23:02PM +0400, Boris B. Zhmurov wrote:
>
> I'm already using kernel with second Herbert's patch. We'll see...
If it still fails, here is a debugging patch which should tell us
whether we need to look elsewhere.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Ema
Hello, Mark Nipper.
On 31.03.2006 16:10 you said the following:
This unfortunately is not the case. I have two e1000
interfaces but only eth1 is up and in use. And I still had
assertions.
Can you switch to eth0? There is no problem with _eth0_, my friend says.
> And I still had
>
Hi,
Herbert Xu wrote:
> On Fri, Mar 31, 2006 at 01:35:40AM -0800, David S. Miller wrote:
> > He does not have TSO enabled, e1000 disables TSO when on a link speed
> > slower than gigabit.
dmesg|grep eth0
[4294671.426000] e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
[4294679.1250
On Friday 31 March 2006 14:07, Boris B. Zhmurov wrote:
> David, Herbert - FYI. One of my colleagues confirmed that the idea "bug
> reproducible only if there is more than one e1000 adapter onboard" is
> true. He has 3 servers with dual Intel PRO/1000 adapters, and that
> bug occurs. Also, he has 4
On 31 Mar 2006, Boris B. Zhmurov wrote:
> David, Herbert - FYI. One of my colleagues confirmed that the idea "bug
> reproducible only if there is more than one e1000 adapter onboard" is
> true. He has 3 servers with dual Intel PRO/1000 adapters, and that
> bug occurs. Also, he has 4 servers with
Hello, Herbert Xu.
On 31.03.2006 14:39 you said the following:
On Fri, Mar 31, 2006 at 02:16:38PM +0400, Boris B. Zhmurov wrote:
And xdelta tells, that e1000.ko was modified :)
Thanks for checking again.
Anyway, it didn't take long to find another bug in the same area.
I'm afraid this dri
Sent: Friday, March 31, 2006 11:42 AM
Subject: Re: [e1000 debug]
Hello, Herbert Xu.
On 31.03.2006 14:52 you said the following:
BTW, if you kept the built tree it is possible to apply the patch and
then do a make which should compile just the e1000 driver.
Cheers,
Thanks for the tip, actually I knew that :) First off, I've already
applied some other new
On 31 Mar 2006, David S. Miller wrote:
> He does not have TSO enabled, e1000 disables TSO when on a link speed
> slower than gigabit.
>
> You'll see something like the following in your logs:
>
> e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO
Um...
---
$ uname -a
Linux kin
Hello, David S. Miller.
On 31.03.2006 14:45 you said the following:
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Fri, 31 Mar 2006 21:39:56 +1100
Anyway, it didn't take long to find another bug in the same area.
I'm afraid this driver does seem to be full of them :)
Indeed.
Thanks for picki
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Fri, 31 Mar 2006 21:39:56 +1100
> Anyway, it didn't take long to find another bug in the same area.
> I'm afraid this driver does seem to be full of them :)
Indeed.
Thanks for picking through this some more Herbert. I hope we got it
this time.
-
To uns
On Fri, Mar 31, 2006 at 02:16:38PM +0400, Boris B. Zhmurov wrote:
>
> And xdelta tells, that e1000.ko was modified :)
Thanks for checking again.
Anyway, it didn't take long to find another bug in the same area.
I'm afraid this driver does seem to be full of them :)
It sets last_tx_tso in betwee
Hello, David S. Miller.
On 31.03.2006 13:12 you said the following:
From: "Boris B. Zhmurov" <[EMAIL PROTECTED]>
Date: Thu, 30 Mar 2006 17:29:09 +0400
Hello, Herbert Xu.
On 30.03.2006 14:12 you said the following:
On Thu, Mar 30, 2006 at 10:02:01AM +, Boris B. Zhmurov wrote:
[EMAIL
On Fri, Mar 31, 2006 at 01:35:40AM -0800, David S. Miller wrote:
>
> He does not have TSO enabled, e1000 disables TSO when on a link speed
> slower than gigabit.
Indeed. But I think that only happens on PCI Express and I don't think
Ingo is using PCI Express.
Cheers,
--
Visit Openswan at http:
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Fri, 31 Mar 2006 20:16:53 +1100
> Ingo Oeser <[EMAIL PROTECTED]> wrote:
> >
> > More datapoints.
> >
> > First of all, I don't see the problem, so this is an exclusion data point.
>
> Great. I think so far all the configurations that have this problem
Ingo Oeser <[EMAIL PROTECTED]> wrote:
>
> More datapoints.
>
> First of all, I don't see the problem, so this is an exclusion data point.
Great. I think so far all the configurations that have this problem
are
e1000 + SMP + TSO
Since your machine is not SMP but has the other two things it wou
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Thu, 30 Mar 2006 20:52:45 +1100
> Well I started from the beginning again, and found this. This may be
> the smoking gun that we're after :)
>
> The xmit routine is lockless but checks last_tx_tso outside the locked
> section. So if a TSO packet wins a
From: "Boris B. Zhmurov" <[EMAIL PROTECTED]>
Date: Thu, 30 Mar 2006 17:29:09 +0400
> Hello, Herbert Xu.
>
> On 30.03.2006 14:12 you said the following:
>
> > On Thu, Mar 30, 2006 at 10:02:01AM +, Boris B. Zhmurov wrote:
> >
> >>[EMAIL PROTECTED] linux-2.6.16]$ patch -p1 <
> >>../../../SOUR
From: Ingo Oeser <[EMAIL PROTECTED]>
Date: Fri, 31 Mar 2006 10:57:06 +0200
> Hi Jesse,
>
> More datapoints.
>
> First of all, I don't see the problem, so this is an exclusion data point.
>
> Machine is up 1 day, 19:02
>
> I use 2.6.16 and I'm NOT running at Gigabit speed.
If you're not runni
Hi Jesse,
More datapoints.
First of all, I don't see the problem, so this is an exclusion data point.
Machine is up 1 day, 19:02
I use 2.6.16 and I'm NOT running at Gigabit speed.
(just couldn't get e100 cards anymore, they are not sold anymore here)
Version: vendor 00:aa:00, model 56 rev 0
On Thu, 30 Mar 2006, Phil Oester wrote:
On 29 Mar 2006, Brandeburg, Jesse wrote:
What I need from you is a reproducible test, and some information. I
From all the reports which have come in thus far, it seems everyone
has > 1 e1000. One person even reported that removing one of the two
nic
> On 29 Mar 2006, Brandeburg, Jesse wrote:
> What I need from you is a reproducible test, and some information. I
From all the reports which have come in thus far, it seems everyone
has > 1 e1000. One person even reported that removing one of the two
nics solved the problem for him. Does this
Hello, Herbert Xu.
On 30.03.2006 14:12 you said the following:
On Thu, Mar 30, 2006 at 10:02:01AM +, Boris B. Zhmurov wrote:
[EMAIL PROTECTED] linux-2.6.16]$ patch -p1 <
../../../SOURCES/linux-2.6.16-e1000-try-to-fix-assertion_sk_forward_alloc_failed_by_Herbert_Xu.patch
patching file d
Sent: Thur
On Thu, 30 Mar 2006, Mark Nipper wrote:
On 29 Mar 2006, Brandeburg, Jesse wrote:
What I need from you is a reproducible test, and some information. I
have never been able to reproduce this, and I'm trying to isolate the
problem a bit. What motherboards are you using? What seems to cause
th
On Wed, 29 Mar 2006, Brandeburg, Jesse wrote:
Hi all, I've identified you as people who have at some point in the past
emailed one of the Linux lists with problems with e1000 and
sk_forward_alloc. It seems to be fairly widespread, but only seems to
have appeared with recent kernel changes (af
On Thu, Mar 30, 2006 at 10:02:01AM +, Boris B. Zhmurov wrote:
>
> [EMAIL PROTECTED] linux-2.6.16]$ patch -p1 <
> ../../../SOURCES/linux-2.6.16-e1000-try-to-fix-assertion_sk_forward_alloc_failed_by_Herbert_Xu.patch
>
>
> patching file drivers/net/e1000/e1000_main.c
> Reversed (or previously
Hello, Herbert Xu.
On 30.03.2006 13:52 you said the following:
On Wed, Mar 29, 2006 at 08:44:09PM -0800, David S. Miller wrote:
Herbert do you see any holes here?
Well I started from the beginning again, and found this. This may be
the smoking gun that we're after :)
The xmit routine is
On Wed, Mar 29, 2006 at 08:44:09PM -0800, David S. Miller wrote:
>
> Herbert do you see any holes here?
Well I started from the beginning again, and found this. This may be
the smoking gun that we're after :)
The xmit routine is lockless but checks last_tx_tso outside the locked
section. So if
Hi,
>What seems to cause this problem?
That I cannot say but the problem was fixed by removing one e1000 card
from the server (I initially had two e1000 cards installed in addition
to the two tg3 cards on the board).
Another fix was to disable TSO with ethtool.
>What motherboards are you using?
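The ethtool workaround mentioned above can be sketched as follows. The interface name eth0 is only an assumption, so substitute the affected e1000 port; the commands are echoed here so they can be reviewed before being run as root.

```shell
# IFACE is an assumption; set it to the e1000 interface in question.
IFACE="${IFACE:-eth0}"

# Lowercase -k shows the current offload settings, uppercase -K changes them.
show_tso_cmd="ethtool -k $IFACE"
disable_tso_cmd="ethtool -K $IFACE tso off"

# Print the commands; run them by hand (as root) once they look right.
echo "$show_tso_cmd"
echo "$disable_tso_cmd"
```

The setting does not survive a reboot or driver reload, so it has to be reapplied if the assertion is being tracked over several days.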
Hello, Brandeburg, Jesse.
On 30.03.2006 06:53 you said the following:
Hi all, I've identified you as people who have at some point in the past
emailed one of the Linux lists with problems with e1000 and
sk_forward_alloc. It seems to be fairly widespread, but only seems to
have appeared with re
On 29 Mar 2006, Brandeburg, Jesse wrote:
> What I need from you is a reproducible test, and some information. I
> have never been able to reproduce this, and I'm trying to isolate the
> problem a bit. What motherboards are you using? What seems to cause
> this problem? Are you all using iptable
Sent: Thursday, March 30, 2006 4:53 AM
Subject: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
Hi all, I
From: "Brandeburg, Jesse" <[EMAIL PROTECTED]>
Date: Wed, 29 Mar 2006 18:53:57 -0800
> To do this we have code like so in e1000_tso:
>         if (skb_shinfo(skb)->tso_size) {
>                 if (skb_header_cloned(skb)) {
>                         err = pskb_expand_head(skb, 0, 0,
On Wed, Mar 29, 2006 at 06:53:57PM -0800, Brandeburg, Jesse wrote:
> Hi all, I've identified you as people who have at some point in the past
> emailed one of the Linux lists with problems with e1000 and
> sk_forward_alloc. It seems to be fairly widespread, but only seems to
> have appeared with r
Hi Jesse,
Thanks for your concern,
My server still sends warning messages about this KERNEL: assertion
(!sk_forward_alloc) after upgrading the kernel to 2.6.12 or 2.6.15.
This is from dmesg server:
Linux version 2.6.15.4 ([EMAIL PROTECTED]) (gcc version 3.3.4 (Debian
1:3.3.4-13)) #1 SMP Tue Feb 21 17
Hi all, I've identified you as people who have at some point in the past
emailed one of the Linux lists with problems with e1000 and
sk_forward_alloc. It seems to be fairly widespread, but only seems to
have appeared with recent kernel changes (after 2.6.12...)
What I need from you is a reproduci