On Tue, Dec 19, 2017 at 12:52 PM, Andrew Lunn <and...@lunn.ch> wrote:
> On Mon, Dec 18, 2017 at 01:53:47PM -0800, Tim Harvey wrote:
>> On Wed, Dec 13, 2017 at 11:43 AM, Andrew Lunn <and...@lunn.ch> wrote:
>> >> The nic appears to work fine (pings, TCP etc) up until a performance
>> >> test is attempted.
>> >> When an iperf bandwidth test is attempted the nic ends up in a state
>> >> where truncated-ip packets are being sent out (per a tcpdump from
>> >> another board):
>> >
>> > Hi Tim
>> >
>> > Are pause frames supported? Have you tried turning them off?
>> >
>> > Can you reproduce the issue with UDP? Or is it TCP only?
>> >
>>
>> Andrew,
>>
>> Pause frames don't appear to be supported yet and the issue occurs
>> when using UDP as well as TCP. I'm not clear what the best way to
>> troubleshoot this is.
>
> Hi Tim
>
> Is pause being negotiated? In theory, it should not be. The PHY should
> not offer it, if the MAC has not enabled it. But some PHY drivers are
> probably broken and offer pause when they should not.
>
> Also, can you trigger the issue using UDP at say 75% the maximum
> bandwidth. That should be low enough that the peer never even tries to
> use pause.
>
> All this pause stuff is just a stab in the dark. Something else to try
> is to turn off various forms of acceleration, ethtool -K, and see if
> that makes a difference.
>

Andrew,

Currently I'm not using the DP83867_PHY driver (after verifying the
issue occurs with or without that driver). The issue does not occur if
I limit UDP (i.e. 950mbps). I disabled all offloads and the issue still
occurs.

I have found that once the issue occurs I can recover to a working
state by clearing/setting BGX_CMRX_CFG[BGX_EN], and once I encounter
the issue and recover that way, I can never trigger the issue again.
If I toggle that register bit upon power-up, before the issue occurs,
it will still occur.

The CN80XX reference manual describes BGX_CMRX_CFG[BGX_EN] as:
- when cleared all dedicated BGX context state for LMAC (state
  machine, FIFOs, counters etc) are reset and LMAC access to shared BGX
  resources (data path, serdes lanes) is disabled
- when set LMAC operation is enabled (link bring-up, sync, and tx/rx
  of idles and fault sequences)

I'm told that the particular Cavium reference board with an SGMII phy
doesn't show this issue (I don't have that specific board to do my own
testing or comparisons against our board) so I'm inclined to think it
has something to do with an interaction with the DP83867 PHY.

I would like to start poking at PHY registers to see if I can find
anything unusual. The best way to do that from userspace is via
SIOCGMIIREG/SIOCSMIIREG, right? The thunderx nic doesn't currently
support ioctls, so I guess I'll have to add that support unless
there's a way to get at phy registers from userspace through a phy
driver?

Regards,

Tim