On 01/04/2017 04:13 PM, Florian Fainelli wrote:
>
>
> On 01/04/2017 07:04 AM, Zefir Kurtisi wrote:
>> While in RUNNING state, phy_state_machine() checks for link changes by
>> comparing phydev->link before and after calling phy_read_status().
>> This works as long as it is guaranteed that phydev->link is never
>> changed outside the phy_state_machine().
>>
>> If in some setups this happens, it causes the state machine to miss
>> a link loss and remain RUNNING despite phydev->link being 0.
>>
>> This has been observed running a dsa setup with a process continuously
>> polling the link states over ethtool each second (SNMPD RFC-1213
>> agent). Disconnecting the link on a phy followed by a ETHTOOL_GSET
>> causes dsa_slave_get_settings() / dsa_slave_get_link_ksettings() to
>> call phy_read_status() and with that modify the link status - and
>> with that bricking the phy state machine.
>
> That's the interesting part of the analysis, how does this brick the PHY
> state machine? Is the PHY driver changing the link status in the
> read_status callback that it implements?
>
phydev->read_status points to genphy_read_status(), where the first call goes
to genphy_update_link() which updates the link status.
Thereafter phy_state_machine():RUNNING won't be able to detect the link loss
anymore unless the link state changes again: phydev->link is already 0 when
the state machine takes its "before" value, so the before/after comparison
sees no change.

I was trying to figure out if there is a rule that forbids changing
phydev->link from outside the state machine, but found several places where
it happens (either directly, or via genphy_read_status(), or via
genphy_update_link()).

Curious how this did not show up before, since within the dsa setup it is
very easy to trigger:
a) physically disconnect the link
b) within one second, run
     ethtool ethX


Cheers,
Zefir