Hi Andrew, > On Sep 23, 2020, at 23:37, Andrew Lunn <and...@lunn.ch> wrote: > > On Wed, Sep 23, 2020 at 10:44:10PM +0800, Kai-Heng Feng wrote: >> Hi Andrew, >> >>> On Sep 23, 2020, at 20:17, Andrew Lunn <and...@lunn.ch> wrote: >>> >>> On Wed, Sep 23, 2020 at 03:47:51PM +0800, Kai-Heng Feng wrote: >>>> We are seeing the following error after S3 resume: >>>> [ 704.746874] e1000e 0000:00:1f.6 eno1: Setting page 0x6020 >>>> [ 704.844232] e1000e 0000:00:1f.6 eno1: MDI Write did not complete >>>> [ 704.902817] e1000e 0000:00:1f.6 eno1: Setting page 0x6020 >>>> [ 704.903075] e1000e 0000:00:1f.6 eno1: reading PHY page 769 (or 0x6020 >>>> shifted) reg 0x17 >>>> [ 704.903281] e1000e 0000:00:1f.6 eno1: Setting page 0x6020 >>>> [ 704.903486] e1000e 0000:00:1f.6 eno1: writing PHY page 769 (or 0x6020 >>>> shifted) reg 0x17 >>>> [ 704.943155] e1000e 0000:00:1f.6 eno1: MDI Error >>>> ... >>>> [ 705.108161] e1000e 0000:00:1f.6 eno1: Hardware Error >>>> >>>> Since we don't know what platform firmware may do to the phy, so let's >>>> power cycle the phy upon system resume to resolve the issue. >>>> >>>> Signed-off-by: Kai-Heng Feng <kai.heng.f...@canonical.com> >>>> --- >>>> drivers/net/ethernet/intel/e1000e/netdev.c | 2 ++ >>>> 1 file changed, 2 insertions(+) >>>> >>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c >>>> b/drivers/net/ethernet/intel/e1000e/netdev.c >>>> index 664e8ccc88d2..c2a87a408102 100644 >>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c >>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c >>>> @@ -6968,6 +6968,8 @@ static __maybe_unused int e1000e_pm_resume(struct >>>> device *dev) >>>> !e1000e_check_me(hw->adapter->pdev->device)) >>>> e1000e_s0ix_exit_flow(adapter); >>>> >>>> + e1000_power_down_phy(adapter); >>>> + >>> >>> static void e1000_power_down_phy(struct e1000_adapter *adapter) >>> { >>> struct e1000_hw *hw = &adapter->hw; >>> >>> /* Power down the PHY so no link is implied when interface is down * >>> * The PHY cannot be powered down if any of the following is true * >>> * (a) WoL is enabled >>> * (b) AMT is active >>> * (c) SoL/IDER session is active >>> */ >>> if (!adapter->wol && hw->mac_type >= e1000_82540 && >>> hw->media_type == e1000_media_type_copper) { >> >> Looks like the the function comes from e1000, >> drivers/net/ethernet/intel/e1000/e1000_main.c. >> However, this patch is for e1000e, so the function with same name is >> different. > > Ah! Sorry. Missed that. Also it is not nice there are two functions in > the kernel with the same name. > >>> Could it be coming out of S3 because it just received a WoL? >> >> No, the issue can be reproduced by pressing keyboard or rtcwake. > > Not relevant now, since i was looking at the wrong function. But i was > meaning the call is a NOP in the case WoL caused the wake up. So if > the issues can also happen after WoL, your fix is not going to fix it. > >>> It seems unlikely that it is the MII_CR_POWER_DOWN which is helping, >>> since that is an MDIO write itself. Do you actually know how this call >>> to e1000_power_down_phy() fixes the issues? >> > >> I don't know from hardware's perspective, but I think the comment on >> e1000_power_down_phy_copper() can give us some insight: > > And there is only one function called e1000_power_down_phy_copper() > :-) > >> >> /** >> * e1000_power_down_phy_copper - Restore copper link in case of PHY power down >> * @hw: pointer to the HW structure >> * >> * In the case of a PHY power down to save power, or to turn off link during a >> * driver unload, or wake on lan is not enabled, restore the link to previous >> * settings. >> **/ >> void e1000_power_down_phy_copper(struct e1000_hw *hw) >> { >> u16 mii_reg = 0; >> >> /* The PHY will retain its settings across a power down/up cycle */ >> e1e_rphy(hw, MII_BMCR, &mii_reg); >> mii_reg |= BMCR_PDOWN; >> e1e_wphy(hw, MII_BMCR, mii_reg); >> usleep_range(1000, 2000); >> } > > I don't really see how this explains this: > >>>> [ 704.746874] e1000e 0000:00:1f.6 eno1: Setting page 0x6020 >>>> [ 704.844232] e1000e 0000:00:1f.6 eno1: MDI Write did not complete > > https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/e1000e/phy.c#L181 > > So first off, the comments are all cut/paste from > e1000e_read_phy_reg_mdic(). It would be nice to s/read/write/g in that > function.
Ah yes... > > So it sets up the transaction and starts it. MDIO is a serial bus with > no acknowledgements. You clock out around 64 bits, and hope the PHY > receives it. The time it takes to send those 64 bits is fixed by the > bus speed, typically 2.5MHz. Thanks for the info. > > So the driver polls waiting for the hardware to say the bits have been > sent. And this is timing out. How long that takes has nothing to do > with the PHY, or what state it is in. Powering down the PHY has no > effect on the MDIO bus master, and how long it takes to shift those > bits out. Which is why i don't think this patch is correct. This is > probably an MDIO bus issue, not a PHY issue. Thanks for pointing out the possible root cause. Indeed this looks like an MDIO issue so this patch is completely wrong. I'll send a V2, thanks. Kai-Heng > > Try dumping the value of MDIC in the good/bad case before the > transaction starts. > > Andrew