new em-driver still broken (was: Re: em network issues)

2006-10-27 Thread Mikhail Teterin
On Saturday 21 October 2006 13:33, Gleb Smirnoff wrote: = We aren't currently speaking about performance, we need to know whether = kernel with DEVICE_POLLING option makes NIC work stable. Having noticed today's em-driver update, I rebuilt world/kernel and tried the dump-test again. The kernel ha

Re: em network issues

2006-10-25 Thread Scott Long
Jack Vogel wrote: On 10/25/06, Scott Long <[EMAIL PROTECTED]> wrote: Jack Vogel wrote: > On 10/25/06, Doug Ambrisko <[EMAIL PROTECTED]> wrote: > >> 3) In em_process_receive_interrupts/em_rxeof always decrement >> the count on every run through the loop. If you notice >> cou

Re: em network issues

2006-10-25 Thread Bruce Evans
On Wed, 25 Oct 2006, Doug Ambrisko wrote: John Polstra writes: | On 19-Oct-2006 Scott Long wrote: | > The performance measurements that Andre and I did early this year showed | > that the INTR_FAST handler provided a very large benefit. | | I'm trying to understand why that's the case. Is it bec

Re: em network issues

2006-10-25 Thread Jack Vogel
On 10/25/06, Scott Long <[EMAIL PROTECTED]> wrote: Jack Vogel wrote: > On 10/25/06, Doug Ambrisko <[EMAIL PROTECTED]> wrote: > >> 3) In em_process_receive_interrupts/em_rxeof always decrement >> the count on every run through the loop. If you notice >> count is an int

Re: em network issues

2006-10-25 Thread Scott Long
Jack Vogel wrote: On 10/25/06, Doug Ambrisko <[EMAIL PROTECTED]> wrote: 3) In em_process_receive_interrupts/em_rxeof always decrement the count on every run through the loop. If you notice count is an int that starts at the passed in value of -1. It then cou

Re: em network issues

2006-10-25 Thread Jack Vogel
On 10/25/06, Doug Ambrisko <[EMAIL PROTECTED]> wrote: 3) In em_process_receive_interrupts/em_rxeof always decrement the count on every run through the loop. If you notice count is an int that starts at the passed in value of -1. It then count-- until count==0
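
To illustrate the point quoted above, here is a minimal sketch of how a loop budget behaves when it starts at -1 and is decremented on every pass. This is not the em driver's code: `rx_loop`, `avail`, and `done` are hypothetical names chosen for the illustration, and the real em_rxeof logic may differ.

```c
/*
 * Hypothetical stand-in for a receive loop; NOT the em driver's code.
 * 'avail' is the number of descriptors that are ready.
 * 'count' is the loop budget; the loop stops when it reaches 0.
 * Returns how many descriptors were processed.
 */
static int
rx_loop(int avail, int count)
{
	int done = 0;

	while (avail > 0 && count != 0) {
		avail--;	/* consume one descriptor */
		done++;
		count--;	/* decremented on every pass through the loop */
	}
	return (done);
}
```

With a positive budget the loop stops after that many passes. With a budget of -1, count-- moves away from zero, so for any realistic amount of pending work the count==0 test never fires and only the available work ends the loop.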

Re: em network issues

2006-10-25 Thread Scott Long
Doug Ambrisko wrote: John Polstra writes: | On 19-Oct-2006 Scott Long wrote: | > The performance measurements that Andre and I did early this year showed | > that the INTR_FAST handler provided a very large benefit. | | I'm trying to understand why that's the case. Is it because an | INTR_FAST

Re: em network issues

2006-10-25 Thread Jack Vogel
On 10/25/06, Doug Ambrisko <[EMAIL PROTECTED]> wrote: John Polstra writes: | On 19-Oct-2006 Scott Long wrote: | > The performance measurements that Andre and I did early this year showed | > that the INTR_FAST handler provided a very large benefit. | | I'm trying to understand why that's the case

Re: em network issues

2006-10-25 Thread Doug Ambrisko
John Polstra writes: | On 19-Oct-2006 Scott Long wrote: | > The performance measurements that Andre and I did early this year showed | > that the INTR_FAST handler provided a very large benefit. | | I'm trying to understand why that's the case. Is it because an | INTR_FAST interrupt doesn't have

Re: em network issues

2006-10-23 Thread Mikhail Teterin
On Monday 23 October 2006 13:37, Mikhail Teterin wrote: > > We aren't currently speaking about performance, we need to know whether > > kernel with DEVICE_POLLING option makes NIC work stable. > > Yes, that seems to be the case... I spoke too soon :-( It took a lot longer this time (without poll

Re: em network issues

2006-10-23 Thread Mikhail Teterin
On Saturday 21 October 2006 13:33, Gleb Smirnoff wrote: > We aren't currently speaking about performance, we need to know whether > kernel with DEVICE_POLLING option makes NIC work stable. Yes, that seems to be the case... After I got to the machine's console (there was no network access) and turning

Re: em network issues

2006-10-21 Thread Mikhail Teterin
= I'd appreciate if people who are observing the problem will report = whether adding DEVICE_POLLING option to kernel config helps them = or not. This will help to tell whether the problem is in the above = quote or in the import of new versions from vendor. I tried this yesterday -- before writin

Re: em network issues

2006-10-21 Thread Gleb Smirnoff
On Sat, Oct 21, 2006 at 01:00:08PM -0400, Mikhail Teterin wrote: M> = I'd appreciate if people who are observing the problem will report M> = whether adding DEVICE_POLLING option to kernel config helps them M> = or not. This will help to tell whether the problem is in the above M> = quote or in the

Re: em network issues

2006-10-19 Thread Scott Long
Bruce Evans wrote: On Thu, 19 Oct 2006, John Polstra wrote: On 19-Oct-2006 Scott Long wrote: The performance measurements that Andre and I did early this year showed that the INTR_FAST handler provided a very large benefit. I'm trying to understand why that's the case. Is it because an INTR

Re: em network issues

2006-10-19 Thread Scott Long
John Polstra wrote: On 19-Oct-2006 Scott Long wrote: The performance measurements that Andre and I did early this year showed that the INTR_FAST handler provided a very large benefit. I'm trying to understand why that's the case. Is it because an INTR_FAST interrupt doesn't have to be masked

Re: em network issues

2006-10-19 Thread Bruce Evans
On Thu, 19 Oct 2006, John Polstra wrote: On 19-Oct-2006 Scott Long wrote: The performance measurements that Andre and I did early this year showed that the INTR_FAST handler provided a very large benefit. I'm trying to understand why that's the case. Is it because an INTR_FAST interrupt does

Re: em network issues

2006-10-19 Thread John Polstra
On 19-Oct-2006 Scott Long wrote: > The performance measurements that Andre and I did early this year showed > that the INTR_FAST handler provided a very large benefit. I'm trying to understand why that's the case. Is it because an INTR_FAST interrupt doesn't have to be masked and unmasked in the

Re: em network issues

2006-10-19 Thread Remko Lodder
Jack Vogel wrote: On 10/19/06, Remko Lodder <[EMAIL PROTECTED]> wrote: Kip Macy wrote: > > On Wed, 18 Oct 2006, Jack Vogel wrote: >> I'm a bit confused from the way you worded this, do you have watchdogs >> with em, or you use em to avoid them? > > I have watchdogs with the current (post vendor

Re: em network issues

2006-10-19 Thread Jack Vogel
On 10/19/06, Remko Lodder <[EMAIL PROTECTED]> wrote: Kip Macy wrote: > > On Wed, 18 Oct 2006, Jack Vogel wrote: >> I'm a bit confused from the way you worded this, do you have watchdogs >> with em, or you use em to avoid them? > > I have watchdogs with the current (post vendor update) em driver,

Re: em network issues

2006-10-19 Thread Remko Lodder
Kip Macy wrote: On Wed, 18 Oct 2006, Jack Vogel wrote: I'm a bit confused from the way you worded this, do you have watchdogs with em, or you use em to avoid them? I have watchdogs with the current (post vendor update) em driver, but not with an older (pre vendor update) version of it. Sam

Re: em network issues

2006-10-19 Thread Bruce Evans
On Thu, 19 Oct 2006, Scott Long wrote: Bruce Evans wrote: On Thu, 19 Oct 2006, Scott Long wrote: Can you be more specific as to the 'bad things'? Not very. Maybe interrupts don't get reenabled as intended. Then the symptoms get mutated by watchdog timeouts. Then yes, I'm already thinking

Re: em network issues

2006-10-19 Thread Scott Long
Bruce Evans wrote: On Thu, 19 Oct 2006, Scott Long wrote: Bruce Evans wrote: On Wed, 18 Oct 2006, Kris Kennaway wrote: I have been working with someone's system that has em shared with fxp, and a simple fetch over the em (e.g. of a 10 GB file of zeroes) is enough to produce watchdog timeou

Re: em network issues

2006-10-19 Thread Bruce Evans
On Thu, 19 Oct 2006, Scott Long wrote: Bruce Evans wrote: On Wed, 18 Oct 2006, Kris Kennaway wrote: I have been working with someone's system that has em shared with fxp, and a simple fetch over the em (e.g. of a 10 GB file of zeroes) is enough to produce watchdog timeouts after a few second

Re: em network issues

2006-10-18 Thread Scott Long
Bruce Evans wrote: On Wed, 18 Oct 2006, Scott Long wrote: [too much quoted; much deleted] Bruce Evans wrote: On Wed, 18 Oct 2006, Kris Kennaway wrote: I have been working with someone's system that has em shared with fxp, and a simple fetch over the em (e.g. of a 10 GB file of zeroes) is en

Re: em network issues

2006-10-18 Thread Bruce Evans
On Wed, 18 Oct 2006, Scott Long wrote: [too much quoted; much deleted] Bruce Evans wrote: On Wed, 18 Oct 2006, Kris Kennaway wrote: I have been working with someone's system that has em shared with fxp, and a simple fetch over the em (e.g. of a 10 GB file of zeroes) is enough to produce watc

Re: em network issues

2006-10-18 Thread Scott Long
Bruce Evans wrote: On Wed, 18 Oct 2006, Kris Kennaway wrote: I have been working with someone's system that has em shared with fxp, and a simple fetch over the em (e.g. of a 10 GB file of zeroes) is enough to produce watchdog timeouts after a few seconds. As previously mentioned, changing the

Re: em network issues

2006-10-18 Thread Bruce Evans
On Wed, 18 Oct 2006, Kris Kennaway wrote: I have been working with someone's system that has em shared with fxp, and a simple fetch over the em (e.g. of a 10 GB file of zeroes) is enough to produce watchdog timeouts after a few seconds. As previously mentioned, changing the INTR_FAST to INTR_MP

Re: em network issues

2006-10-18 Thread Kip Macy
On Wed, 18 Oct 2006, Jack Vogel wrote: > On 10/18/06, Kip Macy <[EMAIL PROTECTED]> wrote: > > I have a Sun T2000 that I generally run with the em driver from as of > > July in order to avoid watchdog timeouts. One trivial scenario that > > reproduces the problem with 100% consistency is running

Re: em network issues

2006-10-18 Thread Jack Vogel
On 10/18/06, Albert Shih <[EMAIL PROTECTED]> wrote: > > There is also a hardware eeprom issue on systems with an 82573 > type NIC on SOME systems. There is a utility to fix that, if you and on HP ? your system does not have 573 NICs, (what you show are 546) do you have others that are? Jack _

Re: em network issues

2006-10-18 Thread Jack Vogel
Awesome, this is the kind of data that will help. I'll see what I can do to get something repro'd. Jack On 10/18/06, Albert Shih <[EMAIL PROTECTED]> wrote: On 19/10/2006 01:03:40+0200, Albert Shih wrote: > On 18/10/2006 10:46:30-0700, Jack Vogel wrote: > > I think there may be a few diff

Re: em network issues

2006-10-18 Thread Albert Shih
On 19/10/2006 01:03:40+0200, Albert Shih wrote: > On 18/10/2006 10:46:30-0700, Jack Vogel wrote: > > I think there may be a few different problems going on with the em driver > > on 6.2 that are being lumped under the general description of network > > hangs. In order to solve these I need a

Re: em network issues

2006-10-18 Thread Albert Shih
On 18/10/2006 10:46:30-0700, Jack Vogel wrote: > I think there may be a few different problems going on with the em driver > on 6.2 that are being lumped under the general description of network > hangs. In order to solve these I need a reproducible failure, either on a > system here at Intel, o

Re: em network issues

2006-10-18 Thread Kris Kennaway
On Wed, Oct 18, 2006 at 03:31:53PM -0700, Jack Vogel wrote: > On 10/18/06, Kip Macy <[EMAIL PROTECTED]> wrote: > >I have a Sun T2000 that I generally run with the em driver from as of > >July in order to avoid watchdog timeouts. One trivial scenario that > >reproduces the problem with 100% consiste

Re: em network issues

2006-10-18 Thread Jack Vogel
On 10/18/06, Kip Macy <[EMAIL PROTECTED]> wrote: I have a Sun T2000 that I generally run with the em driver from as of July in order to avoid watchdog timeouts. One trivial scenario that reproduces the problem with 100% consistency is running the ghc configure script (a 20kloc shell script) over

Re: em network issues

2006-10-18 Thread Kip Macy
I have a Sun T2000 that I generally run with the em driver from as of July in order to avoid watchdog timeouts. One trivial scenario that reproduces the problem with 100% consistency is running the ghc configure script (a 20kloc shell script) over NFS. As the T2000 doesn't exactly represent "typic

em network issues

2006-10-18 Thread Jack Vogel
I think there may be a few different problems going on with the em driver on 6.2 that are being lumped under the general description of network hangs. In order to solve these I need a reproducible failure, either on a system here at Intel, or someone who is willing to be a remote guinea pig :) I