Hi Heiner, > -----Original Message----- > From: Heiner Kallweit <hkallwe...@gmail.com> > Sent: 2021年4月7日 15:12 > To: Joakim Zhang <qiangqing.zh...@nxp.com>; christian.me...@t2data.com; > and...@lunn.ch; li...@armlinux.org.uk; da...@davemloft.net; > k...@kernel.org > Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org; dl-linux-imx > <linux-...@nxp.com>; Florian Fainelli <f.faine...@gmail.com> > Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume > back > > On 07.04.2021 03:43, Joakim Zhang wrote: > > > > Hi Heiner, > > > >> -----Original Message----- > >> From: Heiner Kallweit <hkallwe...@gmail.com> > >> Sent: 2021年4月7日 2:22 > >> To: Joakim Zhang <qiangqing.zh...@nxp.com>; > >> christian.me...@t2data.com; and...@lunn.ch; li...@armlinux.org.uk; > >> da...@davemloft.net; k...@kernel.org; Russell King - ARM Linux > >> <li...@armlinux.org.uk> > >> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org; > >> dl-linux-imx <linux-...@nxp.com>; Florian Fainelli > >> <f.faine...@gmail.com> > >> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus > >> resume back > >> > >> On 06.04.2021 13:42, Heiner Kallweit wrote: > >>> On 06.04.2021 12:07, Joakim Zhang wrote: > >>>> > >>>>> -----Original Message----- > >>>>> From: Heiner Kallweit <hkallwe...@gmail.com> > >>>>> Sent: 2021年4月6日 14:29 > >>>>> To: Joakim Zhang <qiangqing.zh...@nxp.com>; > >>>>> christian.me...@t2data.com; and...@lunn.ch; li...@armlinux.org.uk; > >>>>> da...@davemloft.net; k...@kernel.org > >>>>> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org; > >>>>> dl-linux-imx <linux-...@nxp.com> > >>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO > >>>>> bus resume back > >>>>> > >>>>> On 06.04.2021 04:07, Joakim Zhang wrote: > >>>>>> > >>>>>> Hi Heiner, > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Heiner Kallweit <hkallwe...@gmail.com> > >>>>>>> Sent: 2021年4月5日 20:10 > >>>>>>> To: christian.me...@t2data.com; Joakim Zhang > >>>>>>> <qiangqing.zh...@nxp.com>; and...@lunn.ch; > >>>>>>> li...@armlinux.org.uk; da...@davemloft.net; k...@kernel.org > >>>>>>> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org; > >>>>>>> dl-linux-imx <linux-...@nxp.com> > >>>>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after > >>>>>>> MDIO bus resume back > >>>>>>> > >>>>>>> On 05.04.2021 10:43, Christian Melki wrote: > >>>>>>>> On 4/5/21 12:48 AM, Heiner Kallweit wrote: > >>>>>>>>> On 04.04.2021 16:09, Heiner Kallweit wrote: > >>>>>>>>>> On 04.04.2021 12:07, Joakim Zhang wrote: > >>>>>>>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram > >> may > >>>>>>>>>>> cut off PHY power") invokes phy_init_hw() when MDIO bus > >>>>>>>>>>> resume, it will soft reset PHY if PHY driver implements > >>>>>>>>>>> soft_reset > >> callback. > >>>>>>>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset > >>>>>>>>>>> callback to genphy_soft_reset for KSZ8081") adds soft_reset > >>>>>>>>>>> for > >> KSZ8081. > >>>>>>>>>>> After these two patches, I found i.MX6UL 14x14 EVK which > >>>>>>>>>>> connected to KSZ8081RNB doesn't work any more when system > >>>>> resume > >>>>>>>>>>> back, MAC > >>>>>>> driver is fec_main.c. > >>>>>>>>>>> > >>>>>>>>>>> It's obvious that initializing PHY hardware when MDIO bus > >>>>>>>>>>> resume back would introduce some regression when PHY > >>>>>>>>>>> implements soft_reset. When I > >>>>>>>>>> > >>>>>>>>>> Why is this obvious? Please elaborate on why a soft reset > >>>>>>>>>> should break something. > >>>>>>>>>> > >>>>>>>>>>> am debugging, I found PHY works fine if MAC doesn't support > >>>>>>>>>>> suspend/resume or phy_stop()/phy_start() doesn't been called > >>>>>>>>>>> during suspend/resume. This let me realize, PHY state > >>>>>>>>>>> machine > >>>>>>>>>>> phy_state_machine() could do something breaks the PHY. > >>>>>>>>>>> > >>>>>>>>>>> As we known, MAC resume first and then MDIO bus resume > when > >>>>>>>>>>> system resume back from suspend. When MAC resume, usually > it > >>>>>>>>>>> will invoke > >>>>>>>>>>> phy_start() where to change PHY state to PHY_UP, then > >>>>>>>>>>> trigger the > >>>>>>>>>>> stat> machine to run now. In phy_state_machine(), it will > >>>>>>>>>>> start/config auto-nego, then change PHY state to PHY_NOLINK, > >>>>>>>>>>> what to next is periodically check PHY link status. When > >>>>>>>>>>> MDIO bus resume, it will initialize PHY hardware, including > >>>>>>>>>>> soft_reset, what would soft_reset affect seems various from > >> different PHYs. > >>>>>>>>>>> For KSZ8081RNB, when it in PHY_NOLINK state and then perform > >>>>>>>>>>> a soft reset, > >>>>>>> it will never complete auto-nego. > >>>>>>>>>> > >>>>>>>>>> Why? That would need to be checked in detail. Maybe chip > >>>>>>>>>> errata documentation provides a hint. > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> The KSZ8081 spec says the following about bit BMCR_PDOWN: > >>>>>>>>> > >>>>>>>>> If software reset (Register 0.15) is used to exit power-down > >>>>>>>>> mode (Register 0.11 = 1), two software reset writes (Register > >>>>>>>>> 0.15 = 1) are required. The first write clears power-down > >>>>>>>>> mode; the second write resets the chip and re-latches the pin > >>>>>>>>> strapping pin > >> values. > >>>>>>>>> > >>>>>>>>> Maybe this causes the issue you see and genphy_soft_reset() > >>>>>>>>> isn't appropriate for this PHY. Please re-test with the > >>>>>>>>> KSZ8081 soft reset following the spec comment. > >>>>>>>>> > >>>>>>>> > >>>>>>>> Interesting. Never expected that behavior. > >>>>>>>> Thanks for catching it. Skimmed through the datasheets/erratas. > >>>>>>>> This is what I found (micrel.c): > >>>>>>>> > >>>>>>>> 10/100: > >>>>>>>> 8001 - Unaffected? > >>>>>>>> 8021/8031 - Double reset after PDOWN. > >>>>>>>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if > >>>>>>>> reset solves the issue since errata says no error after reset > >>>>>>>> but is also claiming that only toggling PDOWN (may) or power will > help. > >>>>>>>> 8051 - Double reset after PDOWN. > >>>>>>>> 8061 - Double reset after PDOWN. > >>>>>>>> 8081 - Double reset after PDOWN. > >>>>>>>> 8091 - Double reset after PDOWN. > >>>>>>>> > >>>>>>>> 10/100/1000: > >>>>>>>> Nothing in gigabit afaics. > >>>>>>>> > >>>>>>>> Switches: > >>>>>>>> 8862 - Not affected? > >>>>>>>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround > exists. > >>>>>>>> 8864 - Not affected? > >>>>>>>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround > exists. > >>>>>>>> 9477 - Errata. PDOWN broken. Will randomly cause link failure > >>>>>>>> on adjacent links. Do not use. > >>>>>>>> > >>>>>>>> This certainly explains a lot. > >>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> This patch changes PHY state to PHY_UP when MDIO bus > resume > >>>>>>>>>>> back, it should be reasonable after PHY hardware > >>>>>>>>>>> re-initialized. Also give state machine a chance to > >>>>>>>>>>> start/config > >> auto-nego again. > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> If the MAC driver calls phy_stop() on suspend, then > >>>>>>>>>> phydev->suspended is true and mdio_bus_phy_may_suspend() > >>>>>>>>>> phydev->returns > >>>>>>>>>> false. As a consequence > >>>>>>>>>> phydev->suspended_by_mdio_bus is false and > >>>>>>>>>> phydev->mdio_bus_phy_resume() > >>>>>>>>>> skips the PHY hw initialization. > >>>>>>>>>> Please also note that mdio_bus_phy_suspend() calls > >>>>>>>>>> phy_stop_machine() that sets the state to PHY_UP. > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> Forgot that MDIO bus suspend is done before MAC driver suspend. > >>>>>>>>> Therefore disregard this part for now. > >>>>>>>>> > >>>>>>>>>> Having said that the current argumentation isn't convincing. > >>>>>>>>>> I'm not aware of such issues on other systems, therefore it's > >>>>>>>>>> likely that something is system-dependent. > >>>>>>>>>> > >>>>>>>>>> Please check the exact call sequence on your system, maybe it > >>>>>>>>>> provides a hint. > >>>>>>>>>> > >>>>>>>>>>> Signed-off-by: Joakim Zhang <qiangqing.zh...@nxp.com> > >>>>>>>>>>> --- > >>>>>>>>>>> drivers/net/phy/phy_device.c | 7 +++++++ > >>>>>>>>>>> 1 file changed, 7 insertions(+) > >>>>>>>>>>> > >>>>>>>>>>> diff --git a/drivers/net/phy/phy_device.c > >>>>>>>>>>> b/drivers/net/phy/phy_device.c index > >>>>>>>>>>> cc38e326405a..312a6f662481 > >>>>>>>>>>> 100644 > >>>>>>>>>>> --- a/drivers/net/phy/phy_device.c > >>>>>>>>>>> +++ b/drivers/net/phy/phy_device.c > >>>>>>>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int > >>>>>>> mdio_bus_phy_resume(struct device *dev) > >>>>>>>>>>> ret = phy_resume(phydev); > >>>>>>>>>>> if (ret < 0) > >>>>>>>>>>> return ret; > >>>>>>>>>>> + > >>>>>>>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC > >>>>>>>>>>> +controller > >>>>>>> resume > >>>>>>>>>>> + * rounte with phy_start(), here change to PHY_UP after > >>>>>>> re-initializing > >>>>>>>>>>> + * PHY hardware, let PHY state machine to start/config > >>>>>>>>>>> +auto-nego > >>>>>>> again. > >>>>>>>>>>> + */ > >>>>>>>>>>> + phydev->state = PHY_UP; > >>>>>>>>>>> + > >>>>>>>>>>> no_resume: > >>>>>>>>>>> if (phydev->attached_dev && phydev->adjust_link) > >>>>>>>>>>> phy_start_machine(phydev); > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> This is a quick draft of the modified soft reset for KSZ8081. > >>>>>>> Some tests would be appreciated. > >>>>>>> > >>>>>> > >>>>>> I test this patch at my side, unfortunately, it still can't work. > >>>>>> > >>>>>> I have not found what soft reset would affect in 8081 spec, is > >>>>>> there ang common Description for different PHYs? > >>>>>> > >>>>> > >>>>> You can check the clause 22 spec: 802.3 22.2.4.1.1 > >>>>> > >>>>> Apart from that you can check BMCR value and which modes your PHY > >>>>> advertises after a soft reset. If the PHY indeed wouldn't restart > >>>>> aneg after a soft reset, then it would be the only one with this > >>>>> behavior I know. And I'd wonder why there aren't more such reports. > >>>> > >>>> Hi Heiner, > >>>> > >>>> We have reasons to believe it is a weird behavior of KSZ8081. I > >>>> have another two PHYs at my side, AR8031 and RTL8211FD, they can > >>>> work fine if > >> soft_reset implemented. > >>>> > >>>> As commit message described, phy_start() would change PHY state to > >> PHY_UP finally to PHY_NOLINK when MAC resume. > >>>> After MDIO bus resume back, it will periodically check link status. > >>>> I did below > >> code change to dump all PHY registers. > >>>> > >>>> --- a/drivers/net/phy/phy.c > >>>> +++ b/drivers/net/phy/phy.c > >>>> @@ -35,7 +35,7 @@ > >>>> #include <net/genetlink.h> > >>>> #include <net/sock.h> > >>>> > >>>> -#define PHY_STATE_TIME HZ > >>>> +#define PHY_STATE_TIME (10 * HZ) > >>>> > >>>> #define PHY_STATE_STR(_state) \ > >>>> case PHY_##_state: \ > >>>> @@ -738,6 +738,28 @@ static int phy_check_link_status(struct > >>>> phy_device > >> *phydev) > >>>> if (err) > >>>> return err; > >>>> > >>>> + printk("offset 0x00 data = %0x\n", phy_read(phydev, 0x00)); > >>>> + printk("offset 0x01 data = %0x\n", phy_read(phydev, 0x01)); > >>>> + printk("offset 0x02 data = %0x\n", phy_read(phydev, 0x02)); > >>>> + printk("offset 0x03 data = %0x\n", phy_read(phydev, 0x03)); > >>>> + printk("offset 0x04 data = %0x\n", phy_read(phydev, 0x04)); > >>>> + printk("offset 0x05 data = %0x\n", phy_read(phydev, 0x05)); > >>>> + printk("offset 0x06 data = %0x\n", phy_read(phydev, 0x06)); > >>>> + printk("offset 0x07 data = %0x\n", phy_read(phydev, 0x07)); > >>>> + printk("offset 0x08 data = %0x\n", phy_read(phydev, 0x08)); > >>>> + printk("offset 0x09 data = %0x\n", phy_read(phydev, 0x09)); > >>>> + printk("offset 0x10 data = %0x\n", phy_read(phydev, 0x10)); > >>>> + printk("offset 0x11 data = %0x\n", phy_read(phydev, 0x11)); > >>>> + printk("offset 0x15 data = %0x\n", phy_read(phydev, 0x15)); > >>>> + printk("offset 0x16 data = %0x\n", phy_read(phydev, 0x16)); > >>>> + printk("offset 0x17 data = %0x\n", phy_read(phydev, 0x17)); > >>>> + printk("offset 0x18 data = %0x\n", phy_read(phydev, 0x18)); > >>>> + printk("offset 0x1b data = %0x\n", phy_read(phydev, 0x1b)); > >>>> + printk("offset 0x1c data = %0x\n", phy_read(phydev, 0x1c)); > >>>> + printk("offset 0x1d data = %0x\n", phy_read(phydev, 0x1d)); > >>>> + printk("offset 0x1e data = %0x\n", phy_read(phydev, 0x1e)); > >>>> + printk("offset 0x1f data = %0x\n", phy_read(phydev, 0x1f)); > >>>> + printk("\n\n"); > >>>> if (phydev->link && phydev->state != PHY_RUNNING) { > >>>> phy_check_downshift(phydev); > >>>> phydev->state = PHY_RUNNING; > >>>> > >>>> After MDIO bus resume back, results as below: > >>>> > >>>> [ 119.545383] offset 0x00 data = 1100 [ 119.549237] offset 0x01 > >>>> data = 7849 [ 119.563125] offset 0x02 data = 22 [ 119.566810] > >>>> offset 0x03 data = 1560 [ 119.570597] offset 0x04 data = 85e1 [ > >>>> 119.588016] offset 0x05 data = 0 [ 119.592891] offset 0x06 data = > >>>> 4 [ 119.596452] offset 0x07 data = 2001 [ 119.600233] offset 0x08 > >>>> data = 0 [ 119.617864] offset 0x09 data = 0 [ 119.625990] offset > >>>> 0x10 data = 0 [ 119.629576] offset 0x11 data = 0 [ 119.642735] > >>>> offset 0x15 data = 0 [ 119.646332] offset 0x16 data = 202 [ > >>>> 119.650030] offset 0x17 data = 5c02 [ 119.668054] offset 0x18 data > >>>> = > >>>> 801 [ 119.672997] offset 0x1b data = 0 [ 119.676565] offset 0x1c > >>>> data = 0 [ 119.680084] offset 0x1d data = 0 [ 119.698031] offset > >>>> 0x1e data = 20 [ 119.706246] offset 0x1f data = 8190 [ > >>>> 119.709984] [ 119.709984] [ 120.182120] offset 0x00 data = 100 [ > >>>> 120.185894] offset 0x01 data = 784d [ 120.189681] offset 0x02 data > >>>> = 22 [ 120.206319] offset 0x03 data = 1560 [ 120.210171] offset > >>>> 0x04 data = > >>>> 8061 [ 120.225353] offset 0x05 data = 0 [ 120.228948] offset 0x06 > >>>> data = 4 [ 120.242936] offset 0x07 data = 2001 [ 120.246792] > >>>> offset > >>>> 0x08 data = 0 [ 120.250313] offset 0x09 data = 0 [ 120.266748] > >>>> offset 0x10 data = 0 [ 120.270335] offset 0x11 data = 0 [ > >>>> 120.284167] offset 0x15 data = 0 [ 120.287760] offset 0x16 data = > >>>> 202 [ 120.301031] offset 0x17 data = 3c02 [ 120.313209] offset > >>>> 0x18 data = 801 [ 120.316983] offset 0x1b data = 0 [ 120.320513] > >>>> offset 0x1c data = 1 [ 120.336589] offset 0x1d data = 0 [ > >>>> 120.340184] offset 0x1e data = 115 [ 120.355357] offset 0x1f data > >>>> = 8190 [ 120.359112] [ 120.359112] [ 129.785396] offset 0x00 data > >>>> = 1100 [ 129.789252] offset 0x01 data = 7849 [ 129.809951] offset > >>>> 0x02 data = > >>>> 22 [ 129.815018] offset 0x03 data = 1560 [ 129.818845] offset > >>>> 0x04 data = 85e1 [ 129.835808] offset 0x05 data = 0 [ 129.839398] > >>>> offset > >>>> 0x06 data = 4 [ 129.854514] offset 0x07 data = 2001 [ 129.858371] > >>>> offset 0x08 data = 0 [ 129.873357] offset 0x09 data = 0 [ > >>>> 129.876954] offset 0x10 data = 0 [ 129.880472] offset 0x11 data = > >>>> 0 [ 129.896450] offset 0x15 data = 0 [ 129.900038] offset 0x16 > >>>> data = > >>>> 202 [ 129.915041] offset 0x17 data = 5c02 [ 129.918889] offset > >>>> 0x18 data = 801 [ 129.932735] offset 0x1b data = 0 [ 129.946830] > >>>> offset 0x1c data = 0 [ 129.950424] offset 0x1d data = 0 [ > >>>> 129.964585] offset 0x1e data = 0 [ 129.968192] offset 0x1f data = > >>>> 8190 [ 129.972938] [ 129.972938] [ 130.425125] offset 0x00 data = > >>>> 100 [ 130.428889] offset 0x01 data = 784d [ 130.442671] offset > >>>> 0x02 data = > >>>> 22 [ 130.446360] offset 0x03 data = 1560 [ 130.450142] offset > >>>> 0x04 data = 8061 [ 130.467207] offset 0x05 data = 0 [ 130.470789] > >>>> offset > >>>> 0x06 data = 4 [ 130.485071] offset 0x07 data = 2001 [ 130.488934] > >>>> offset 0x08 data = 0 [ 130.504320] offset 0x09 data = 0 [ > >>>> 130.507911] offset 0x10 data = 0 [ 130.520950] offset 0x11 data = > >>>> 0 [ 130.532865] offset 0x15 data = 0 [ 130.536461] offset 0x16 > >>>> data = > >>>> 202 [ 130.540156] offset 0x17 data = 3c02 [ 130.557218] offset > >>>> 0x18 data = 801 [ 130.560987] offset 0x1b data = 0 [ 130.575158] > >>>> offset 0x1c data = 1 [ 130.578754] offset 0x1d data = 0 [ > >>>> 130.593543] offset 0x1e data = 115 [ 130.597312] offset 0x1f data > >>>> = 8190 > >>>> > >>>> We can see that BMCR and BMSR changes from 0x1100,0x7849 to > >> 0x100,0x784d (BMCR[12] bit and BMSR[2]), and loop it. > >>>> Above process is auto-nego enabled, link is down, auto-nego is > >>>> disabled, link > >> is up. And auto-nego complete bit is always 0. > >>>> > >>>> PHY seems become unstable if soft reset after start/config > >>>> auto-nego. I also > >> want to fix it in micrel driver, but failed. > >>>> > >>> > >>> Waiting for ANEG_COMPLETE to be set wouldn't be a good option. Aneg > >>> may never complete for different reasons, e.g. no physical link. And > >>> even if we use a timeout this may add unwanted delays. > >>> > >>>> Do you have any other insights that can help me further locate the issue? > >> Thanks. > >>>> > >>> > >>> I think current MAC/PHY PM handling isn't perfect. Often we have the > >>> following > >>> scenario: > >>> > >>> *suspend* > >>> 1. PHY is suspended (mdio_bus_phy_suspend) 2. MAC suspend callback > >>> (typically involving phy_stop()) > >>> > >>> *resume* > >>> 1. MAC resume callback (typically involving phy_start()) 2. PHY is > >>> resumed (mdio_bus_phy_resume), incl. calling phy_init_hw() > >>> > >>> Calling phy_init_hw() after phy_start() doesn't look right. > >>> It seems to work in most cases, but there's a certain risk that > >>> phy_init_hw() overwrites something, e.g. the advertised modes. > >>> I think we have two valid scenarios: > >>> > >>> 1. phylib PM callbacks are used, then the MAC driver shouldn't > >>> touch the PHY in its PM callbacks, especially not call > >>> phy_stop/phy_start. > >>> > >>> 2. MAC PM callbacks take care also of the PHY. Then I think we would > >>> need a flag at the phy_device telling it to make the PHY PM > >>> callbacks a no-op. > >>> > >>> Andrew, Russell: What do you think? > >>> > >>>> Best Regards, > >>>> Joakim Zhang > >>>> > >>>> [...] > >>>> > >>> Heiner > >>> > >> > >> Based on scenario 1 you can also try the following. > >> > >> diff --git a/drivers/net/ethernet/freescale/fec_main.c > >> b/drivers/net/ethernet/freescale/fec_main.c > >> index 3db882322..cdf9160fc 100644 > >> --- a/drivers/net/ethernet/freescale/fec_main.c > >> +++ b/drivers/net/ethernet/freescale/fec_main.c > >> @@ -3805,7 +3805,6 @@ static int __maybe_unused fec_suspend(struct > >> device *dev) > >> if (netif_running(ndev)) { > >> if (fep->wol_flag & FEC_WOL_FLAG_ENABLE) > >> fep->wol_flag |= FEC_WOL_FLAG_SLEEP_ON; > >> - phy_stop(ndev->phydev); > >> napi_disable(&fep->napi); > >> netif_tx_lock_bh(ndev); > >> netif_device_detach(ndev); > >> @@ -3864,7 +3863,6 @@ static int __maybe_unused fec_resume(struct > >> device *dev) > >> netif_device_attach(ndev); > >> netif_tx_unlock_bh(ndev); > >> napi_enable(&fep->napi); > >> - phy_start(ndev->phydev); > >> } > >> rtnl_unlock(); > > > > As I described in commit message: > > > > "When I am debugging, I found PHY works fine if MAC doesn't support > suspend/resume or phy_stop()/phy_start() doesn't been called during > suspend/resume. This let me realize, PHY state machine phy_state_machine() > could do something breaks the PHY." > > > > So I have tried your above code change before, and it works. But it is a > > very > common phenomenon that MAC suspend/resume invokes phy_stop/start or > phylink_stop/start, and it's been around for a long time. > > > > As the scenarios you list, it is indeed unreasonable to soft reset or > > config PHY > after calling phy_start_aneg() in state machine. PHY state should be PHY_UP > after calling phy_init_hw(), It seems more reasonable. > > > > Best Regards, > > Joakim Zhang > >> -- > >> 2.31.1 > >> > > > > Following is a draft patch for scenario 1: > Make PHY PM a no-op if MAC manages PHY PM. > Can you give it a try? >
It can work for my case. Thanks. Best Regards, Joakim Zhang > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index > a009d1769..73d29fd5e 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -273,6 +273,9 @@ static __maybe_unused int > mdio_bus_phy_suspend(struct device *dev) { > struct phy_device *phydev = to_phy_device(dev); > > + if (phydev->mac_managed_pm) > + return 0; > + > /* We must stop the state machine manually, otherwise it stops out of > * control, possibly with the phydev->lock held. Upon resume, netdev > * may call phy routines that try to grab the same lock, and that may @@ > -294,6 +297,9 @@ static __maybe_unused int mdio_bus_phy_resume(struct > device *dev) > struct phy_device *phydev = to_phy_device(dev); > int ret; > > + if (phydev->mac_managed_pm) > + return 0; > + > if (!phydev->suspended_by_mdio_bus) > goto no_resume; > > diff --git a/include/linux/phy.h b/include/linux/phy.h index > 8e2cf84b2..46702af18 100644 > --- a/include/linux/phy.h > +++ b/include/linux/phy.h > @@ -567,6 +567,7 @@ struct phy_device { > unsigned loopback_enabled:1; > unsigned downshifted_rate:1; > unsigned is_on_sfp_module:1; > + unsigned mac_managed_pm:1; > > unsigned autoneg:1; > /* The most recently read link state */ > -- > 2.31.1 > > > > diff --git a/drivers/net/ethernet/freescale/fec_main.c > b/drivers/net/ethernet/freescale/fec_main.c > index 3db882322..70aea9c27 100644 > --- a/drivers/net/ethernet/freescale/fec_main.c > +++ b/drivers/net/ethernet/freescale/fec_main.c > @@ -2048,6 +2048,8 @@ static int fec_enet_mii_probe(struct net_device > *ndev) > fep->link = 0; > fep->full_duplex = 0; > > + phy_dev->mac_managed_pm = 1; > + > phy_attached_info(phy_dev); > > return 0; > @@ -3864,6 +3866,7 @@ static int __maybe_unused fec_resume(struct > device *dev) > netif_device_attach(ndev); > netif_tx_unlock_bh(ndev); > napi_enable(&fep->napi); > + phy_init_hw(ndev->phydev); > phy_start(ndev->phydev); > } > rtnl_unlock(); > -- > 2.31.1