--- On Tue, 5/8/12, Konstantin Belousov <kostik...@gmail.com> wrote:
> From: Konstantin Belousov <kostik...@gmail.com>
> Subject: Re: 82574L hangs (with r233708 e1000 driver).
> To: "John Baldwin" <j...@freebsd.org>
> Cc: j...@freebsd.org, "Jack Vogel" <jfvo...@gmail.com>, n...@freebsd.org
> Date: Tuesday, May 8, 2012, 4:24 AM
>
> On Mon, May 07, 2012 at 01:44:57PM -0400, John Baldwin wrote:
> > On Friday, May 04, 2012 6:18:19 pm Konstantin Belousov wrote:
> > > On Fri, May 04, 2012 at 11:30:22AM -0400, John Baldwin wrote:
> > > > On Tuesday, May 01, 2012 12:21:21 pm Konstantin Belousov wrote:
> > > > > On Thu, Apr 12, 2012 at 09:38:49PM +0300, Konstantin Belousov wrote:
> > > > > > On Mon, Apr 09, 2012 at 12:19:39PM -0400, John Baldwin wrote:
> > > > > > > On Sunday, April 08, 2012 1:11:25 am Konstantin Belousov wrote:
> > > > > > > > On Sat, Apr 07, 2012 at 04:22:07PM -0700, Jack Vogel wrote:
> > > > > > > > > Make sure you have any firmware up to the latest
> > > > > > > > > available, if that doesn't help let me know and I'll
> > > > > > > > > check internally to see if there are any outstanding
> > > > > > > > > issues in shared code, that will be after the weekend.
> > > > > > > >
> > > > > > > > I had BIOS rev. 151, after your hint I found rev. 154 on
> > > > > > > > the site. Now BIOS reports itself as
> > > > > > > > MTCDT10N.86A.0154.2012.0323.1601, March 23.
> > > > > > > >
> > > > > > > > Unfortunately, the upgrade did not change anything in
> > > > > > > > regard to the hanging interface.
> > > > > > >
> > > > > > > Does reverting 233708 make any difference? Have you tried
> > > > > > > futzing around with kgdb when it is hung to see what state
> > > > > > > the device is in (software state at least)?
> > > > > > It does, in the sense that without r233708 the interface
> > > > > > becomes stuck almost immediately. I just upgraded to the
> > > > > > e1000@r234154, which does not change much.
> > > > > >
> > > > > > I fiddled with the adapter state after the hang in kgdb more,
> > > > > > and I noted something interesting. Apparently, tx works. When
> > > > > > I ping the remote host from my suffering atom machine, remote
> > > > > > host sees the packet. Also remote machine sees some udp
> > > > > > traffic originating from the atom, like ntp queries.
> > > > > >
> > > > > > And, on receive, the atom board does receive interrupts, the
> > > > > > em0:rx 0 counter in vmstat -i increases. Even more fun, the
> > > > > > sysctl dev.em.0.debug shows increasing hw rdh (as I
> > > > > > understand, this is hardware 'last received' packet pointer
> > > > > > for rx ring). So I looked at the packet descriptor at hw rdt
> > > > > > index, and there I see
> > > > > > (kgdb) p/x ((struct adapter *)0xffffff80010e4000)->rx_rings->rx_base[78]
> > > > > > $11 = {buffer_addr = 0x12a128800, length = 0x5ea, csum = 0x3c2b, status = 0x0,
> > > > > >   errors = 0x0, special = 0x0}
> > > > > >
> > > > > > Apparently, the Descriptor Done bit is clear, so the
> > > > > > em_rxeof() function breaks from the loop, not consuming the
> > > > > > current packet. Also, it returns false due to DD bit clear.
> > > > > > This prevents em_msix_rx() from scheduling taskqueue for
> > > > > > processing. So apparent cause for the hang is missing DD bit
> > > > > > in descriptor.
> > > > > >
> > > > > > I am not sure whether all this is obvious to anybody who
> > > > > > knows em internals, and where to go from there.
> > > > >
> > > > > Ok, nobody cares.
> > > > >
> > > > > Below is the workaround I use to prevent the interface wedging.
> > > > > It seems that the sole PCI register read (namely, the rx ring
> > > > > head read) and consequent recheck of the descriptor status
> > > > > greatly reduce the likelihood of the issue. Unfortunately, the
> > > > > read does not eliminate the hang completely. So it is not some
> > > > > PCIe coherency problem.
> > > > >
> > > > > With the patch applied, I am able to copy around blu-ray images,
> > > > > while previously the interface hung within 20-30 seconds of
> > > > > 100Mbit/s traffic. Sometimes the messages are printed:
> > > > > em0: Workaround: head 1018 tail 1002 cur 1010
> > > > > em0: Workaround: head 976 tail 973 cur 974
> > > > > em0: Workaround: head 950 tail 939 cur 946
> > > > > em0: Workaround: head 435 tail 419 cur 426
> > > > >
> > > > > Machine is still dead due to random memory corruption which I
> > > > > see, in particular, pmap sometimes reads garbage from PTEs. I
> > > > > have no idea whether it is related to em0 rx descriptor missed
> > > > > writes, or is a different issue.
> > > >
> > > > Humm, so if I'm reading this correctly, the card "skips" a receive
> > > > descriptor and stores a packet at the next descriptor? That's just
> > > > bizarre.
> > > Either this, or it does store the packet but 'forgets' to update the
> > > rx descriptor. I think that your interpretation is closer to reality,
> > > since I get sustained 20MB/s over ssh with the patch even when
> > > workaround activates. The lost packets probably should cause
> > > retransmit and speed drop.
> >
> > This is just weird. I wonder if there is a known errata for this?
> > This really seems to be broken hardware and not a driver issue.
> I was not able to find anything even remotely resembling the described
> behaviour in the publicly available 82574L specification update. I
> looked at rev. 3.5, dated January 2012.
>
> I may indeed give up and relocate the hardware into trash, but it would
> be a pity, since this is a new shiny Intel Atom 2800 m/b. I am not sure
> I can give convincing arguments to the supplier for warranty
> replacement.
>
> And, while I booted Debian to apply the f/w fix Jack recommended, I did
> a quick test and the interface looked stable.
>

FWIW, I've got an X7SPE-HF-D525 MB with 82574L running on a 7.0 driver that seems to work pretty well.
It panics once in a blue moon when we overload it (like 200Mb/s of traffic) but it generally works ok.

BC
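
For readers following the quoted analysis, here is a minimal C sketch of the condition being described: a receive descriptor whose Descriptor Done (DD) bit is clear even though the hardware head has already moved past it. This is not Konstantin's patch (which is not included in the quoted text) and not the em(4) driver code. The struct layouts, `rx_classify()`, `hw_passed()` and the `rx_state` enum are hypothetical names made up for illustration; only the descriptor field names (taken from the kgdb dump) and the DD bit value come from the thread or the e1000 headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RXD_STAT_DD 0x01u /* Descriptor Done bit, value as in the e1000 headers */

/* Simplified rx descriptor, using the field names shown in the kgdb dump. */
struct rx_desc {
    uint64_t buffer_addr;
    uint16_t length;
    uint16_t csum;
    uint8_t  status;
    uint8_t  errors;
    uint16_t special;
};

/* Hypothetical ring state; hw_head/hw_tail stand in for register reads. */
struct rx_ring {
    struct rx_desc *base;
    uint32_t size;
    uint32_t hw_head;
    uint32_t hw_tail;
};

enum rx_state { RX_EMPTY, RX_READY, RX_STALLED };

/* Read the status byte through a volatile pointer so a re-read is not elided. */
static uint8_t read_status(const struct rx_ring *r, uint32_t idx)
{
    return *(const volatile uint8_t *)&r->base[idx].status;
}

/* True if idx lies strictly between tail and head, allowing for ring wrap. */
static bool hw_passed(const struct rx_ring *r, uint32_t idx)
{
    uint32_t span = (r->hw_head + r->size - r->hw_tail) % r->size;
    uint32_t off  = (idx + r->size - r->hw_tail) % r->size;
    return off > 0 && off < span;
}

/*
 * Classify the descriptor at cur. A real driver would read the head
 * register over PCI between the two status checks; here the head is
 * just a field, so the second check only illustrates the "recheck".
 */
static enum rx_state rx_classify(const struct rx_ring *r, uint32_t cur)
{
    assert(cur < r->size);
    if (read_status(r, cur) & RXD_STAT_DD)
        return RX_READY;
    /* (assumed) head register read happens here, then recheck */
    if (read_status(r, cur) & RXD_STAT_DD)
        return RX_READY;
    return hw_passed(r, cur) ? RX_STALLED : RX_EMPTY;
}
```

With the numbers from the first logged message (head 1018, tail 1002, cur 1010, and an assumed ring size of 1024), a descriptor with status 0x0 would be classified as stalled by this sketch, which is the case where a plain DD check alone would stop consuming packets.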