Re: 82574L hangs (with r233708 e1000 driver).

2012-08-09 Thread Barney Cordoba


--- On Fri, 5/11/12, Barney Cordoba  wrote:

> From: Barney Cordoba 
> Subject: Re: 82574L hangs (with r233708 e1000 driver).
> To: "John Baldwin" , "Konstantin Belousov" 
> Cc: j...@freebsd.org, "Jack Vogel" , n...@freebsd.org
> Date: Friday, May 11, 2012, 6:24 PM
> 
> 
> --- On Tue, 5/8/12, Konstantin Belousov wrote:
> 
> > From: Konstantin Belousov 
> > Subject: Re: 82574L hangs (with r233708 e1000 driver).
> > To: "John Baldwin" 
> > Cc: j...@freebsd.org, "Jack Vogel" , n...@freebsd.org
> > Date: Tuesday, May 8, 2012, 4:24 AM
> > On Mon, May 07, 2012 at 01:44:57PM -0400, John Baldwin wrote:
> > > On Friday, May 04, 2012 6:18:19 pm Konstantin Belousov wrote:
> > > > On Fri, May 04, 2012 at 11:30:22AM -0400, John Baldwin wrote:
> > > > > On Tuesday, May 01, 2012 12:21:21 pm Konstantin Belousov wrote:
> > > > > > On Thu, Apr 12, 2012 at 09:38:49PM +0300, Konstantin Belousov wrote:
> > > > > > > On Mon, Apr 09, 2012 at 12:19:39PM -0400, John Baldwin wrote:
> > > > > > > > On Sunday, April 08, 2012 1:11:25 am Konstantin Belousov wrote:
> > > > > > > > > On Sat, Apr 07, 2012 at 04:22:07PM -0700, Jack Vogel wrote:
> > > > > > > > > > Make sure you have any firmware up to the latest available, if
> > > > > > > > > > that doesn't help let me know and I'll check internally to see
> > > > > > > > > > if there are any outstanding issues in shared code,  that will
> > > > > > > > > > be after the weekend.
> > > > > > > > > 
> > > > > > > > > I had BIOS rev. 151, after your hint I found rev. 154 on the site.
> > > > > > > > > Now BIOS reports itself as MTCDT10N.86A.0154.2012.0323.1601,
> > > > > > > > > March 23.
> > > > > > > > > 
> > > > > > > > > Unfortunately, the upgrade did not change anything in regard to
> > > > > > > > > the hanging interface.
> > > > > > > > 
> > > > > > > > Does reverting 233708 make any difference?  Have you tried futzing
> > > > > > > > around with kgdb when it is hung to see what state the device is in
> > > > > > > > (software state at least)?
> > > > > > > It does, in the sense that without r233708 the interface becomes stuck
> > > > > > > almost immediately. I just upgraded to e1000@r234154, which does not
> > > > > > > change much.
> > > > > > > 
> > > > > > > I fiddled with the adapter state after the hang in kgdb more, and I
> > > > > > > noted something interesting. Apparently, tx works. When I ping the remote
> > > > > > > host from my suffering atom machine, the remote host sees the packet. Also
> > > > > > > the remote machine sees some udp traffic originating from the atom, like
> > > > > > > ntp queries.
> > > > > > > 
> > > > > > > And, on receive, the atom board does receive interrupts, em0:rx 0 counter
> > > > > > > in vmstat -i increases. Even more fun, the sysctl dev.em.0.debug
> > > > > > > shows increasing hw rdh (as I understand, this is the hardware 'last
> > > > > > > received' packet pointer for the rx ring). So I looked at the packet
> > > > > > > descriptor at hw rdt index, and there I see
> > > > > > > (kgdb) p/x ((struct adapter *)0xff80010e4000)->rx_rings->rx_base[78]
> > > > > > > $11 = {buffer_addr = 0x12a128800, length = 0x5ea, csum = 0x3c2b, status = 0x0,
> > > > > > >   errors = 0x0, special = 0x0}
> > > > > > > 
> > > > > > > Apparently, the Descriptor Done bit is clear, so the em_rxeof() function
> > > > > > > breaks from the loop, not consuming the current packet. Also, it returns
> > > > > > > false because the DD bit is clear. This prevents em_msix_rx() from
> > > > > > > scheduling the taskqueue for processing. So the apparent cause of the hang
> > > > > > > is the missing DD bit in the descriptor.
> > > > > > > 
> > > > > > > I am not sure whether all this is obvious to anybody who knows em
> > > > > > > internals, and where to go from there.
> > > > > > Ok, nobody cares.
> > > > > > 
> > > > > > Below is the workaround I use to prevent the interface wedging.
> > > > > > It seems that the sole PCI register read (namely, the rx ring head read)
> > > > > > and consequent recheck of the descriptor status greatly reduce the
> > > > > > likelihood of the issue. Unfortunately, the read does not eliminate
> > > > > > the hang completely. So it is not some PCIe coherency problem.
> > > > > > 
> > > > > > With the patch applied, I am able to copy around blu-ray images, while
> > > > > > previously the interface hung within 20-30 seconds of 100Mbit/s traffic.
> > > > > > Sometimes the messages are printed:
> > > > > > em0: Workaround: head 1018 tail 1002 cur 1010
> > > > > > em0: Workaround: head 976 tail 973 cur 974
> > > > > > em0: Workaround: head 950 tail 939 cur 946
> > > > > > em0: Workaround: head 435 tail 419 cur 426
> > > > > > 
> > > > >
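
[Illustration, not part of the thread] The quoted analysis describes a receive loop that stops at the first descriptor whose Descriptor Done (DD) bit is clear, and a workaround that reads the rx ring head register and then rechecks the descriptor. The patch itself is not reproduced in this archive, so what follows is only a minimal user-space sketch of that idea, not the actual em(4) code. Every name in it (rx_desc, rx_ring, rx_poll, read_hw_head, RX_STAT_DD, the late[] array) is hypothetical. The late[] array is purely a simulation device that lets a status update become visible at the moment of the register read; it is not a claim about the real cause of the hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_STAT_DD 0x01u  /* "Descriptor Done" status bit (hypothetical name) */
#define RING_SIZE  8u

/* Simplified rx descriptor with the fields shown in the kgdb dump. */
struct rx_desc {
    uint64_t buffer_addr;
    uint16_t length;
    uint16_t csum;
    volatile uint8_t status;
    uint8_t  errors;
    uint16_t special;
};

struct rx_ring {
    struct rx_desc base[RING_SIZE];
    uint8_t  late[RING_SIZE];  /* simulation only: status not yet visible */
    uint32_t cur;              /* next descriptor software will check */
    uint32_t hw_head;          /* stand-in for the rx ring head register */
    unsigned workaround_hits;
};

/* Stand-in for the register read. In this simulation the read is also the
 * point where any "late" status becomes visible to software. */
static uint32_t read_hw_head(struct rx_ring *r)
{
    for (size_t i = 0; i < RING_SIZE; i++) {
        r->base[i].status |= r->late[i];
        r->late[i] = 0;
    }
    return r->hw_head;
}

/* Consume completed descriptors and return how many were consumed.
 * A return of 0 corresponds to "nothing to do", which in the described
 * hang means no further processing gets scheduled. */
static unsigned rx_poll(struct rx_ring *r, bool use_workaround)
{
    unsigned done = 0;

    assert(r != NULL);
    for (;;) {
        struct rx_desc *d = &r->base[r->cur];

        if ((d->status & RX_STAT_DD) == 0) {
            if (!use_workaround)
                break;
            /* Workaround idea: read the head, then recheck the status. */
            if (read_hw_head(r) == r->cur)
                break;              /* hardware has not moved past cur */
            if ((d->status & RX_STAT_DD) == 0)
                break;              /* still not done after the recheck */
            r->workaround_hits++;
        }
        d->status = 0;              /* hand the slot back */
        r->cur = (r->cur + 1) % RING_SIZE;
        done++;
    }
    return done;
}
```

Calling rx_poll() with use_workaround set to false on a ring where the head has advanced past a descriptor whose DD bit is still clear returns 0 every time, which mirrors the stall described above. With the flag set to true, the same call picks the descriptor up once its status is visible after the head read.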

Re: 82574L hangs (with r233708 e1000 driver).

2012-08-09 Thread Jason Wolfe
On Thu, Aug 9, 2012 at 8:25 AM, Barney Cordoba  wrote:
>> --- On Fri, 5/11/12, Barney Cordoba  wrote:
>>
>> FWIW, I've got an X7SPE-HF-D525 MB with 82574L running on a 7.0 driver
>> that seems to work pretty well. It panics once in a blue moon when we
>> overload it (like 200Mb/s of traffic) but it generally works ok.
>>
>> BC
>
> Has anything been done or patched regarding this problem?
>
> BC

Ever since r235553 the 82574L has been stable for me, collectively
passing ~1.2Tb/s for the past 4 months without issue.  We did have
some issues with switches not liking the fallout of the bug that r236162
fixed, so we updated to that revision, but the cards themselves were fine.
If you pull the current e1000 from 8-STABLE you'll get up to r236162.

Jason
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"