First, I think the spin_locks in the IRQ handlers should be spin_lock_irqsave(), because the same lock is used in multiple IRQ handlers. If we get an RX interrupt while the TX interrupt handler holds the spin lock, that would seem to be a problem. In this case maybe not, because it is a single-processor system and the spin_locks should compile to nothing (I haven't verified this), and the RX and TX handlers don't really touch any common data elements. I haven't tested changing this, because I'm currently running a long test.
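For concreteness, here is a sketch of the change I have in mind for the TX handler (untested; function and field names are assumed to match the mainline mpc52xx_fec driver, and the RX handler would get the same treatment):

```c
/* Sketch only, not a tested patch: take the lock with local interrupts
 * disabled, so the other FEC handler can't run while we hold it.
 * mpc52xx_fec_priv and the handler shape are assumed from mpc52xx_fec.c. */
static irqreturn_t mpc52xx_fec_tx_interrupt(int irq, void *dev_id)
{
	struct net_device *dev = dev_id;
	struct mpc52xx_fec_priv *priv = netdev_priv(dev);
	unsigned long flags;

	spin_lock_irqsave(&priv->lock, flags);
	/* ... existing TX completion / dev_kfree_skb_irq() loop ... */
	spin_unlock_irqrestore(&priv->lock, flags);

	netif_wake_queue(dev);
	return IRQ_HANDLED;
}
```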
On another front, I put some timestamp tracing into mpc52xx_fec_start_xmit() and verified that the delay is happening after the packet is added to the BestComm ring buffer. There will be 3 quick calls to the xmit function, but I'll only see 2 packets at the PC until 200-400 ms later, when I'll get another xmit call (for the retransmit) and then get two duplicate packets at the PC.

Attempting to add timestamping to the TX IRQ handler has revealed this to be a Heisenbug of sorts. After the following changes, I haven't seen any delays in two hours of running; previously it was every minute or so. I'll let it run overnight and see if I see any additional delays.

Next I'll remove the timestamp code and attempt to capture the state of the ring buffer and BestComm at the point the retransmit packet is handed off to the driver. The delayed packet has to be somewhere at that point. It could be in the FEC queue, as I don't think I've seen a delayed packet larger than 1k.

@@ -382,6 +414,8 @@
 			dev_kfree_skb_irq(skb);
 		}
 	spin_unlock(&priv->lock);
+	js_irq_timestamps[js_irq_idx] = get_tbl();
+	js_irq_idx = (js_irq_idx+1 == TS_COUNT)? 0 : js_irq_idx+1;

 	netif_wake_queue(dev);

@@ -409,6 +443,7 @@

Joey Nelson

On Fri, Jan 27, 2012 at 12:14 PM, Joey Nelson <j...@joescan.com> wrote:
>
> In my application, I have a PC connected through TCP to a MPC5200B based
> system. The PC sends a short request, the MPC5200B receives the request and
> sends the data back. It is doing this about 300 times per second. Normally an
> exchange happens in just a handful of milliseconds. But randomly, every 2 to 15
> minutes, the MPC5200B sends all but the last packet of the response; about
> 200 ms later the PC sends a delayed ACK, and the MPC5200B TCP stack figures
> the packet was lost. It then sends two nearly identical packets (the IP
> header Identification and Checksum fields are incremented). I can also see
> that RetransSegs in /proc/net/snmp increments by one for each of these delays.
>
> My theory is that the packet is getting stuck somewhere in the network stack
> (most likely toward the bottom). Then when another packet is sent, the stuck
> one gets pushed out.
>
> I've done a test where I have another task on the MPC5200B sending UDP
> packets to a different PC every 10 ms. This eliminated the long delays and
> seems to support my stuck-packet theory.
>
> I'm seeing the same issue with 2.6.23 and 3.1.6.
>
> I'm getting ready to dive into the hairy world of BestComm and the FEC, but I
> figured I'd see if anyone else has any suggestions before I make my descent.
> Has anyone seen this behavior before? Any likely candidates for where the
> packet is getting stuck? Any general advice on reference materials? (I've
> started on Linux Device Drivers 3rd Ed, BestComm AN2604, and the datasheets.)
>
> Thanks in advance.
>
> Joey Nelson
> j...@joescan.com
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev