> -----Original Message-----
> From: Joel Stanley [mailto:j...@jms.id.au]
> Sent: Thursday, October 15, 2020 6:31 AM
> To: Dylan Hung <dylan_h...@aspeedtech.com>
> Cc: David S . Miller <da...@davemloft.net>; Jakub Kicinski
> <k...@kernel.org>; netdev@vger.kernel.org; Linux Kernel Mailing List
> <linux-ker...@vger.kernel.org>; Po-Yu Chuang <ratb...@faraday-tech.com>;
> linux-aspeed <linux-asp...@lists.ozlabs.org>; OpenBMC Maillist
> <open...@lists.ozlabs.org>; BMC-SW <bmc...@aspeedtech.com>
> Subject: Re: [PATCH 1/1] net: ftgmac100: Fix Aspeed ast2600 TX hang issue
>
> On Wed, 14 Oct 2020 at 13:32, Dylan Hung <dylan_h...@aspeedtech.com>
> wrote:
> > > > The new HW arbitration feature on Aspeed ast2600 will cause MAC TX
> > > > to hang when handling scatter-gather DMA. Disable the problematic
> > > > feature by setting MAC register 0x58 bit28 and bit27.
> > >
> > > Hi Dylan,
> > >
> > > What are the symptoms of this issue? We are seeing this on our systems:
> > >
> > > [29376.090637] WARNING: CPU: 0 PID: 9 at net/sched/sch_generic.c:442
> > > dev_watchdog+0x2f0/0x2f4
> > > [29376.099898] NETDEV WATCHDOG: eth0 (ftgmac100): transmit queue 0
> > > timed out
> > >
> >
> > May I know your soc version? This issue happens on ast2600 version A1.
> > The registers to fix this issue are meaningless/reserved on A0 chip, so it is
> > okay to set them on either A0 or A1.
>
> We are running the A1. All of our A0 parts have been replaced with A1.
>
> > I was encountering this issue when I was running the iperf TX test. The
> > symptom is the TX descriptors are consumed, but no complete packet is sent
> > out.
>
> What parameters are you using for iperf? I did a lot of testing with
> iperf3 (and stress-ng running at the same time) and couldn't reproduce the
> error.
>

I simply use "iperf -c <server ip>" on the ast2600, and it is very easy to
reproduce. Note that this issue only happens when HW scatter-gather
(NETIF_F_SG) is on. The log is appended below:

[AST /]$ iperf3 -c 192.168.100.89
Connecting to host 192.168.100.89, port 5201
[  4] local 192.168.100.45 port 45346 connected to 192.168.100.89 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  44.8 MBytes   375 Mbits/sec    2   1.43 KBytes
[  4]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    2   1.43 KBytes
[  4]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
[  4]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   1.43 KBytes
[  4]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
^C[  4]   5.00-5.88   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-5.88   sec  44.8 MBytes  64.0 Mbits/sec    5             sender
[  4]   0.00-5.88   sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

> We could only reproduce it when performing other functions, such as
> debugging/booting the host processor.
>

Could it be another issue?

> > > > +/*
> > > > + * test mode control register
> > > > + */
> > > > +#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
> > > > +#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
> > > > +#define FTGMAC100_TM_DEFAULT \
> > > > +	(FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)
> > >
> > > Will aspeed issue an updated datasheet with this register documented?
>
> Did you see this question?
>

Sorry, I missed this question. Aspeed will update the datasheet accordingly.

> Cheers,
>
> Joel
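
For readers following the thread, here is a rough user-space sketch of what
the quoted defines work out to. The three FTGMAC100_TM_* macros are copied
from the patch; everything else (TM_REG_OFFSET, struct fake_mac and the
fake_mac_* helpers) is hypothetical and only stands in for the real MMIO
access, which a driver would normally do with iowrite32() on its mapped
register base. It is not the patch itself, just an illustration that bit28
and bit27 of register 0x58 combine to 0x18000000.

```c
#include <assert.h>
#include <stdint.h>

/* Bit definitions quoted from the patch in this thread. */
#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
#define FTGMAC100_TM_DEFAULT \
	(FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)

/* Assumption: illustrative name; the commit message only says "register 0x58". */
#define TM_REG_OFFSET 0x58

/* Compile-time check: bit28 | bit27 == 0x18000000. */
static_assert(FTGMAC100_TM_DEFAULT == 0x18000000, "bit28 and bit27 set");

/* Hypothetical stand-in for the MAC register window (not real hardware). */
struct fake_mac {
	uint32_t regs[0x100 / 4];
};

/* Hypothetical helper; a real driver would use iowrite32() here. */
static void fake_mac_write(struct fake_mac *mac, uint32_t offset, uint32_t val)
{
	mac->regs[offset / 4] = val;
}

/* Hypothetical helper; a real driver would use ioread32() here. */
static uint32_t fake_mac_read(const struct fake_mac *mac, uint32_t offset)
{
	return mac->regs[offset / 4];
}

/* Write the default test-mode value, i.e. set bit28 and bit27 of 0x58. */
static void fake_mac_disable_hw_arbitration(struct fake_mac *mac)
{
	fake_mac_write(mac, TM_REG_OFFSET, FTGMAC100_TM_DEFAULT);
}
```

Calling fake_mac_disable_hw_arbitration() on a zeroed struct fake_mac and
reading back offset 0x58 gives 0x18000000. Since these bits are reserved on
A0 (as noted above), the same write should be harmless on either revision.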