On 2020/5/13 9:59, Andrew Lunn wrote:
> On Wed, May 13, 2020 at 09:34:13AM +0800, Yonglong Liu wrote:
>> Hi, Andrew:
>> Thanks for your reply!
>>
>> On 2020/5/12 22:00, Andrew Lunn wrote:
>>> On Tue, May 12, 2020 at 08:48:21PM +0800, Yonglong Liu wrote:
>>>> I use two devices that both support 1000M speed, directly connected
>>>> with a network cable. Both devices enable autoneg, and then I run the
>>>> following test repeatedly:
>>>>     ifconfig eth5 down
>>>>     ifconfig eth5 up
>>>>     sleep $((RANDOM%6))
>>>>     ifconfig eth5 down
>>>>     ifconfig eth5 up
>>>>     sleep 10
>>>>
>>>> With low probability, device A links up at 100Mb/s while device B links
>>>> up at 1000Mb/s (the actual link speed read from the PHY is 100Mb/s), and
>>>> the network does not work.
>>>>
>>>> device A:
>>>> Settings for eth5:
>>>>     Supported ports: [ TP ]
>>>>     Supported link modes:   10baseT/Half 10baseT/Full
>>>>                             100baseT/Half 100baseT/Full
>>>>                             1000baseT/Full
>>>>     Supported pause frame use: Symmetric Receive-only
>>>>     Supports auto-negotiation: Yes
>>>>     Supported FEC modes: Not reported
>>>>     Advertised link modes:  10baseT/Half 10baseT/Full
>>>>                             100baseT/Half 100baseT/Full
>>>>                             1000baseT/Full
>>>>     Advertised pause frame use: Symmetric
>>>>     Advertised auto-negotiation: Yes
>>>>     Advertised FEC modes: Not reported
>>>>     Link partner advertised link modes:  10baseT/Half 10baseT/Full
>>>>                                          100baseT/Half 100baseT/Full
>>>>     Link partner advertised pause frame use: Symmetric
>>>>     Link partner advertised auto-negotiation: Yes
>>>>     Link partner advertised FEC modes: Not reported
>>>>     Speed: 100Mb/s
>>>>     Duplex: Full
>>>>     Port: MII
>>>>     PHYAD: 3
>>>>     Transceiver: internal
>>>>     Auto-negotiation: on
>>>>     Current message level: 0x00000036 (54)
>>>>                            probe link ifdown ifup
>>>>     Link detected: yes
>>>>
>>>> The register values read over MDIO are:
>>>> reg 9 = 0x200
>>>> reg a = 0
>>>>
>>>> device B:
>>>> Settings for eth5:
>>>>     Supported ports: [ TP ]
>>>>     Supported link modes:   10baseT/Half 10baseT/Full
>>>>                             100baseT/Half 100baseT/Full
>>>>                             1000baseT/Full
>>>>     Supported pause frame use: Symmetric Receive-only
>>>>     Supports auto-negotiation: Yes
>>>>     Supported FEC modes: Not reported
>>>>     Advertised link modes:  10baseT/Half 10baseT/Full
>>>>                             100baseT/Half 100baseT/Full
>>>>                             1000baseT/Full
>>>>     Advertised pause frame use: Symmetric
>>>>     Advertised auto-negotiation: Yes
>>>>     Advertised FEC modes: Not reported
>>>>     Link partner advertised link modes:  10baseT/Half 10baseT/Full
>>>>                                          100baseT/Half 100baseT/Full
>>>>                                          1000baseT/Full
>>>>     Link partner advertised pause frame use: Symmetric
>>>>     Link partner advertised auto-negotiation: Yes
>>>>     Link partner advertised FEC modes: Not reported
>>>>     Speed: 1000Mb/s
>>>>     Duplex: Full
>>>>     Port: MII
>>>>     PHYAD: 3
>>>>     Transceiver: internal
>>>>     Auto-negotiation: on
>>>>     Current message level: 0x00000036 (54)
>>>>                            probe link ifdown ifup
>>>>     Link detected: yes
>>>>
>>>> The register values read over MDIO are:
>>>> reg 9 = 0
>>>> reg a = 0x800
>>>>
>>>> I talked to the FAE for the rtl8211f; they said that if negotiation at
>>>> 1000Mb/s fails, the rtl8211f changes reg 9 to 0 and then tries to
>>>> negotiate at 100Mb/s.
>>>>
>>>> The problem happened as follows:
>>>> ifconfig eth5 up -> phy_start -> phy_start_aneg ->
>>>> phy_modify_changed(MII_CTRL1000) (at this point both A and B have
>>>> reg 9 = 0x200) -> wait for link up -> (B: reg 9 changed to 0)
>>>> -> link up.
>>>
>>> This sounds like downshift, but not correctly working. 1Gbps requires
>>> that 4 pairs in the cable work. If a 1Gbps link is negotiated, but
>>> then does not establish because one of the pairs is broken, some PHYs
>>> will try to 'downshift'. They drop down to 100Mbps, which only
>>> requires two pairs of the cable to work. To do this, the PHY should
>>> change what it is advertising, to no longer advertise 1G, just 100M
>>> and 10M. The link partner should then try to use 100Mbps and
>>> hopefully, a link is established.
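
For reference, the register values quoted above can be decoded with a few lines of Python. This is only a sketch: the register numbers and bit masks are the ones defined in the Linux kernel's include/uapi/linux/mii.h (reg 9 is MII_CTRL1000, reg a is MII_STAT1000), while the helper names decode_ctrl1000(), decode_stat1000() and gigabit_possible() are made up for illustration and do not exist in the kernel.

```python
# Register numbers and bit masks as defined in include/uapi/linux/mii.h.
MII_CTRL1000 = 0x09  # 1000BASE-T control: what the local PHY advertises
MII_STAT1000 = 0x0A  # 1000BASE-T status: what the link partner advertises

ADVERTISE_1000HALF = 0x0100
ADVERTISE_1000FULL = 0x0200
LPA_1000HALF = 0x0400
LPA_1000FULL = 0x0800


def decode_ctrl1000(value):
    """Hypothetical helper: 1000BASE-T modes advertised locally (reg 9)."""
    modes = []
    if value & ADVERTISE_1000FULL:
        modes.append("1000baseT/Full")
    if value & ADVERTISE_1000HALF:
        modes.append("1000baseT/Half")
    return modes


def decode_stat1000(value):
    """Hypothetical helper: 1000BASE-T modes the partner advertises (reg a)."""
    modes = []
    if value & LPA_1000FULL:
        modes.append("1000baseT/Full")
    if value & LPA_1000HALF:
        modes.append("1000baseT/Half")
    return modes


def gigabit_possible(ctrl1000, stat1000):
    """True only if both ends advertise the same 1000BASE-T mode."""
    full = bool(ctrl1000 & ADVERTISE_1000FULL) and bool(stat1000 & LPA_1000FULL)
    half = bool(ctrl1000 & ADVERTISE_1000HALF) and bool(stat1000 & LPA_1000HALF)
    return full or half


# Values taken from the report: (device, reg 9, reg a).
for name, reg9, rega in (("A", 0x200, 0x0), ("B", 0x0, 0x800)):
    print(name, decode_ctrl1000(reg9), decode_stat1000(rega),
          gigabit_possible(reg9, rega))
```

Read this way, the PHY on device A still advertises 1000baseT/Full but sees no gigabit advertisement from its partner, and the PHY on device B has stopped advertising gigabit while still seeing it from A. Neither PHY has a common 1000BASE-T mode in its registers, which matches the 100Mb/s actually read from the PHYs.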
>>>
>>> Looking at the ethtool, you can see device A is reporting device B is
>>> only advertising up to 100Mbps. Yet it is locally using 1G. That is
>>> broken. So I would say device A has the problem. Are both PHYs
>>> rtl8211f?
>>
>> Both PHYs are rtl8211f. I think Device B is broken. Device B advertises
>> that it supports 1G, but the PHY has actually downshifted to 100M, so
>> Device B links up at 1G on the driver side but at 100M in the PHY.
>
> You have to be careful with the output of ethtool. Downshift is not
> part of 802.3. There is no standard register to indicate it has
> happened. Sometimes there is a vendor register. You should check the
> datasheet, and look at what other PHY drivers do for this, and
> phy_check_downshift().
>
>>> Are you 100% sure your cable and board layout is good? Is it
>>> trying downshift because something is broken? Fix the
>>> cable/connector and the
>
>> Will check the layout with the hardware engineer. This happens with a
>> low probability. When it happens, another down/up operation or
>> restarting autoneg solves it.
>
>>> reason to downshift goes away. But it does not solve the problem if a
>>> customer has a broken cable. So you might want to deliberately cut a
>>> pair in the cable so it becomes 100% reproducible and try to debug it
>>> further. See if you can find out why auto-neg is not working
>>> correctly.
>>
>> So, your opinion is that maybe we should check whether the hardware
>> layout or the cable has a problem?
>
> Well, there are a couple of issues here.
>
> It could be a hardware problem. Best case, it is the cable. But if you
> can reproduce it with other boards, it is a board design issue, which
> you might want to get fixed. If it happens for you in the lab, it will
> probably happen out in the field.
>
> You should also consider what you want to happen with a cable that
> really is broken. It would be nice if downshift worked. Slower
> networking is better than no networking.
> Unless you have a requirement that 100Mbps is too slow for your use
> case. So you might want to debug what is going wrong when downshift
> happens.
>
>> By the way, do we have some mechanism to handle this downshift on the
>> software side? If the PHY downshifts its advertising to 100M but
>> software is still advertising 1G (just like Device B), the network will
>> always be broken.
>
> You might get some ideas from phy_check_downshift(). A lot will
> depend on what information you can get from the PHY.
>
>     Andrew
>
Hi, Andrew:

Thanks very much! That's so helpful!
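
The kind of software-side check discussed in this thread could look roughly like the sketch below. It is not the kernel's phy_check_downshift() code. It only illustrates one comparison that could flag a downshift: take the highest speed that both the software-side advertisement and the link partner advertisement have in common, and compare it with the speed the PHY actually reports. How the actual speed is obtained is PHY-specific (a vendor register, if the datasheet documents one), so here it is simply passed in. The names SPEED_OF, highest_common_speed() and downshift_detected() are hypothetical.

```python
# Link mode names as printed by ethtool, mapped to their speed in Mb/s.
SPEED_OF = {
    "10baseT/Half": 10,
    "10baseT/Full": 10,
    "100baseT/Half": 100,
    "100baseT/Full": 100,
    "1000baseT/Full": 1000,
}


def highest_common_speed(advertised, lp_advertised):
    """Highest speed present in both advertisement sets, or None."""
    common = set(advertised) & set(lp_advertised)
    speeds = [SPEED_OF[mode] for mode in common]
    return max(speeds) if speeds else None


def downshift_detected(advertised, lp_advertised, actual_speed):
    """True if the PHY runs slower than autoneg should have resolved to."""
    negotiated = highest_common_speed(advertised, lp_advertised)
    if negotiated is None or actual_speed is None:
        return False  # nothing to compare against
    return actual_speed < negotiated


# Advertisement sets copied from the ethtool output in the report.
UP_TO_100 = {"10baseT/Half", "10baseT/Full", "100baseT/Half", "100baseT/Full"}
UP_TO_1000 = UP_TO_100 | {"1000baseT/Full"}

# Device B: software and partner both advertise 1G, PHY really runs at 100M.
print("B downshifted:", downshift_detected(UP_TO_1000, UP_TO_1000, 100))
# Device A: partner only advertises up to 100M, PHY runs at 100M.
print("A downshifted:", downshift_detected(UP_TO_1000, UP_TO_100, 100))
```

With the values from the report, only device B is flagged, since it is the side where the software view (1G in common) disagrees with what the PHY is doing. A driver that detects this could then report the real speed instead of the resolved one.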