Re: [PATCH V2 net-next 07/11] net: hns3: adds debug messages to identify eth down cause
On 2019/7/27 6:18, Joe Perches wrote: > On Fri, 2019-07-26 at 22:00 +, Saeed Mahameed wrote: >> On Fri, 2019-07-26 at 11:24 +0800, Huazhong Tan wrote: >>> From: Yonglong Liu >>> >>> Some times just see the eth interface have been down/up via >>> dmesg, but can not know why the eth down. So adds some debug >>> messages to identify the cause for this. > [] >>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c >> [] >>> @@ -459,6 +459,10 @@ static int hns3_nic_net_open(struct net_device >>> *netdev) >>> h->ae_algo->ops->set_timer_task(priv->ae_handle, true); >>> >>> hns3_config_xps(priv); >>> + >>> + if (netif_msg_drv(h)) >>> + netdev_info(netdev, "net open\n"); >>> + >> >> to make sure this is only intended for debug, and to avoid repetition. >> #define hns3_dbg(__dev, format, args...) \ >> ({ \ >> if (netif_msg_drv(h)) \ >> netdev_info(h->netdev, format, ##args); \ >> }) > > netif_dbg(h, drv, h->netdev, "net open\n") > Hi, Saeed && Joe: For our cases, maybe netif_info() can be use for HNS3 drivers? netif_dbg need to open dynamic debug options additional.
Re: [RFC] net: phy: read link status twice when phy_check_link_status()
On 2019/7/27 2:14, Heiner Kallweit wrote: > On 26.07.2019 11:53, Yonglong Liu wrote: >> According to the datasheet of Marvell phy and Realtek phy, the >> copper link status should read twice, or it may get a fake link >> up status, and cause up->down->up at the first time when link up. >> This happens more oftem at Realtek phy. >> > This is not correct, there is no fake link up status. > Read the comment in genphy_update_link, only link-down events > are latched. Means if the first read returns link up, then there > is no need for a second read. And in polling mode we don't do a > second read because we want to detect also short link drops. > > It would be helpful if you could describe your actual problem > and whether you use polling or interrupt mode. > [ 44.498633] hns3 :bd:00.1 eth5: net open [ 44.504273] hns3 :bd:00.1: reg=0x1, data=0x79ad -> called from phy_start_aneg [ 44.532348] hns3 :bd:00.1: reg=0x1, data=0x798d -> called from phy_state_machine,update link. According to the datasheet: reg 1.5=0 now, means copper auto-negotiation not complete reg 1.2=1 now, means link is up We can see that, when we read the link up, the auto-negotiation is not complete yet, so the speed is invalid. I don't know why this happen, maybe this state is keep from bios? Or we may do something else in the phy initialize to fix it? And also confuse that why read twice can fix it? [ 44.554063] hns3 :bd:00.1: invalid speed (-1) [ 44.560412] hns3 :bd:00.1 eth5: failed to adjust link. [ 45.194870] hns3 :bd:00.1 eth5: link up [ 45.574095] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x7989 [ 46.150051] hns3 :bd:00.1 eth5: link down [ 46.598074] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x7989 [ 47.622075] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x79a9 [ 48.646077] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x79ad [ 48.934050] hns3 :bd:00.1 eth5: link up [ 49.702140] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x79ad >> I add a fake status read, and can solve this problem. >> >> I also see that in genphy_update_link(), had delete the fake >> read in polling mode, so I don't know whether my solution is >> correct. >> >> Or provide a phydev->drv->read_status functions for the phy I >> used is more acceptable? >> >> Signed-off-by: Yonglong Liu >> --- >> drivers/net/phy/phy.c | 8 >> 1 file changed, 8 insertions(+) >> >> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c >> index ef7aa73..0c03edc 100644 >> --- a/drivers/net/phy/phy.c >> +++ b/drivers/net/phy/phy.c >> @@ -1,4 +1,7 @@ >> // SPDX-License-Identifier: GPL-2.0+ >> +err = phy_read_status(phydev); >> +if (err) >> +return err; > > This seems to be completely wrong at that place. > Sorry, this can be ignore. >> /* Framework for configuring and reading PHY devices >> * Based on code in sungem_phy.c and gianfar_phy.c >> * >> @@ -525,6 +528,11 @@ static int phy_check_link_status(struct phy_device >> *phydev) >> >> WARN_ON(!mutex_is_locked(&phydev->lock)); >> >> +/* Do a fake read */ >> +err = phy_read(phydev, MII_BMSR); >> +if (err < 0) >> +return err; >> + >> err = phy_read_status(phydev); >> if (err) >> return err; >> > > > . >
Re: [PATCH net] net: hns: fix LED configuration for marvell phy
> Revert "net: hns: fix LED configuration for marvell phy" > This reverts commit f4e5f775db5a4631300dccd0de5eafb50a77c131. > > Andrew Lunn says this should be handled another way. > > Signed-off-by: David S. Miller Hi Andrew: I see this patch have been reverted, can you tell me the better way to do this? Thanks very much! On 2019/7/23 9:19, David Miller wrote: > From: Yonglong Liu > Date: Mon, 22 Jul 2019 13:59:12 +0800 > >> Since commit(net: phy: marvell: change default m88e1510 LED configuration), >> the active LED of Hip07 devices is always off, because Hip07 just >> use 2 LEDs. >> This patch adds a phy_register_fixup_for_uid() for m88e1510 to >> correct the LED configuration. >> >> Fixes: 02468ec1 ("net: phy: marvell: change default m88e1510 LED >> configuration") >> Signed-off-by: Yonglong Liu >> Reviewed-by: linyunsheng > > Applied and queued up for -stable. > > . >
Re: [PATCH net] net: hns: fix LED configuration for marvell phy
On 2019/7/25 12:28, Andrew Lunn wrote: > On Thu, Jul 25, 2019 at 11:00:08AM +0800, liuyonglong wrote: >>> Revert "net: hns: fix LED configuration for marvell phy" >>> This reverts commit f4e5f775db5a4631300dccd0de5eafb50a77c131. >>> >>> Andrew Lunn says this should be handled another way. >>> >>> Signed-off-by: David S. Miller >> >> >> Hi Andrew: >> >> I see this patch have been reverted, can you tell me the better way to do >> this? >> Thanks very much! > > Please take a look at the work Matthias Kaehlcke is doing. It has not > got too far yet, but when it is complete, it should define a generic > way to configure PHY LEDs. > > Andrew > Hi Andrew https://lore.kernel.org/patchwork/patch/1097185/ You are discussing about the DT configuration, is Matthias Kaehlcke's work also provide a generic way to configure PHY LEDS using ACPI?
Re: [PATCH net-next 07/11] net: hns3: adds debug messages to identify eth down cause
On 2019/7/25 3:12, Saeed Mahameed wrote: > On Wed, 2019-07-24 at 11:18 +0800, Huazhong Tan wrote: >> From: Yonglong Liu >> >> Some times just see the eth interface have been down/up via >> dmesg, but can not know why the eth down. So adds some debug >> messages to identify the cause for this. >> > > I really don't like this. your default msg lvl has NETIF_MSG_IFDOWN > turned on .. dumping every single operation that happens on your device > by default to kernel log is too much ! > > We should really consider using trace buffers with well defined > structures for vendor specific events. so we can use bpf filters and > state of the art tools for netdev debugging. > We do this because we can just see a link down message in dmesg, and had take a long time to found the cause of link down, just because another user changed the settings. We can change the net_open/net_stop/dcbnl_ops to msg_drv (not default turned on), and want to keep the others default print to kernel log, is it acceptable? >> @@ -1593,6 +1603,11 @@ static int hns3_ndo_set_vf_vlan(struct >> net_device *netdev, int vf, u16 vlan, >> struct hnae3_handle *h = hns3_get_handle(netdev); >> int ret = -EIO; >> >> +if (netif_msg_ifdown(h)) > > why msg_ifdown ? looks like netif_msg_drv is more appropriate, for many > of the cases in this patch. > This operation may cause link down, so we use msg_ifdown. >> +netdev_info(netdev, >> +"set vf vlan: vf=%d, vlan=%d, qos=%d, >> vlan_proto=%d\n", >> +vf, vlan, qos, vlan_proto); >> + >> if (h->ae_algo->ops->set_vf_vlan_filter) >> ret = h->ae_algo->ops->set_vf_vlan_filter(h, vf, vlan, >>qos, >> vlan_proto); >> @@ -1611,6 +1626,10 @@ static int hns3_nic_change_mtu(struct >> net_device *netdev, int new_mtu) >> if (!h->ae_algo->ops->set_mtu) >> return -EOPNOTSUPP; >> >> +if (netif_msg_ifdown(h)) >> +netdev_info(netdev, "change mtu from %d to %d\n", >> +netdev->mtu, new_mtu); >> + >> ret = h->ae_algo->ops->set_mtu(h, new_mtu); >> if (ret) >> netdev_err(netdev, "failed to change MTU in hardware >> %d\n", >> @@ -4395,6 +4414,11 @@ int hns3_set_channels(struct net_device >> *netdev, >> if (kinfo->rss_size == new_tqp_num) >> return 0; >> >> +if (netif_msg_ifdown(h)) >> +netdev_info(netdev, >> +"set channels: tqp_num=%d, rxfh=%d\n", >> +new_tqp_num, rxfh_configured); >> + >> ret = hns3_reset_notify(h, HNAE3_DOWN_CLIENT); >> if (ret) >> return ret; >> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c >> b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c >> index e71c92b..edb9845 100644 >> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c >> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c >> @@ -311,6 +311,9 @@ static void hns3_self_test(struct net_device >> *ndev, >> if (eth_test->flags != ETH_TEST_FL_OFFLINE) >> return; >> >> +if (netif_msg_ifdown(h)) >> +netdev_info(ndev, "self test start\n"); >> + >> st_param[HNAE3_LOOP_APP][0] = HNAE3_LOOP_APP; >> st_param[HNAE3_LOOP_APP][1] = >> h->flags & HNAE3_SUPPORT_APP_LOOPBACK; >> @@ -374,6 +377,9 @@ static void hns3_self_test(struct net_device >> *ndev, >> >> if (if_running) >> ndev->netdev_ops->ndo_open(ndev); >> + >> +if (netif_msg_ifdown(h)) >> +netdev_info(ndev, "self test end\n"); >> } >> >> static int hns3_get_sset_count(struct net_device *netdev, int >> stringset) >> @@ -604,6 +610,11 @@ static int hns3_set_pauseparam(struct net_device >> *netdev, >> { >> struct hnae3_handle *h = hns3_get_handle(netdev); >> >> +if (netif_msg_ifdown(h)) >> +netdev_info(netdev, >> +"set pauseparam: autoneg=%d, rx:%d, >> tx:%d\n", >> +param->autoneg, param->rx_pause, param- >>> tx_pause); >> + >> if (h->ae_algo->ops->set_pauseparam) >> return h->ae_algo->ops->set_pauseparam(h, param- >>> autoneg, >> param->rx_pause, >> @@ -743,6 +754,13 @@ static int hns3_set_link_ksettings(struct >> net_device *netdev, >> if (cmd->base.speed == SPEED_1000 && cmd->base.duplex == >> DUPLEX_HALF) >> return -EINVAL; >> >> +if (netif_msg_ifdown(handle)) >> +netdev_info(netdev, >> +"set link(%s): autoneg=%d, speed=%d, >> duplex=%d\n", >> +netdev->phydev ? "phy" : "mac", >> +cmd->base.autoneg, cmd->base.speed, >> +cmd->base.duplex); >> + >> /* Only support ksettings_set for netdev with phy attached for >> now */ >> if (netdev->phydev) >> r
Re: [PATCH net] net: hns: fix LED configuration for marvell phy
On 2019/7/25 21:08, Andrew Lunn wrote: >> You are discussing about the DT configuration, is Matthias Kaehlcke's work >> also provide a generic way to configure PHY LEDS using ACPI? > > In general, you should be able to use the same properties in ACPI as > DT. If the device_property_read_X() API is used, it will try both ACPI > and OF to get the property. > > Andrew > > . > OK, thanks very much!
Re: [PATCH net-next 07/11] net: hns3: adds debug messages to identify eth down cause
We will change all of them to netif_msg_drv() which is default off Thanks for your reply! On 2019/7/26 5:59, Saeed Mahameed wrote: > On Thu, 2019-07-25 at 20:28 +0800, liuyonglong wrote: >> >> On 2019/7/25 3:12, Saeed Mahameed wrote: >>> On Wed, 2019-07-24 at 11:18 +0800, Huazhong Tan wrote: >>>> From: Yonglong Liu >>>> >>>> Some times just see the eth interface have been down/up via >>>> dmesg, but can not know why the eth down. So adds some debug >>>> messages to identify the cause for this. >>>> >>> >>> I really don't like this. your default msg lvl has NETIF_MSG_IFDOWN >>> turned on .. dumping every single operation that happens on your >>> device >>> by default to kernel log is too much ! >>> >>> We should really consider using trace buffers with well defined >>> structures for vendor specific events. so we can use bpf filters >>> and >>> state of the art tools for netdev debugging. >>> >> >> We do this because we can just see a link down message in dmesg, and >> had >> take a long time to found the cause of link down, just because >> another >> user changed the settings. >> >> We can change the net_open/net_stop/dcbnl_ops to msg_drv (not default >> turned on), and want to keep the others default print to kernel log, >> is it acceptable? >> > > acceptable as long as debug information are kept off by default and > your driver doens't spam the kernel log. > > you should use dynamic debug [1] and/or "off by default" msg lvls for > debugging information.. > > I couldn't find any rules regarding what to put in kernel log, Maybe > someone can share ?. but i vaguely remember that the recommendation > for device drivers is to put nothing, only error/warning messages. > > [1] > https://www.kernel.org/doc/html/v4.15/admin-guide/dynamic-debug-howto.html > >>>> @@ -1593,6 +1603,11 @@ static int hns3_ndo_set_vf_vlan(struct >>>> net_device *netdev, int vf, u16 vlan, >>>>struct hnae3_handle *h = hns3_get_handle(netdev); >>>>int ret = -EIO; >>>> >>>> + if (netif_msg_ifdown(h)) >>> >>> why msg_ifdown ? looks like netif_msg_drv is more appropriate, for >>> many >>> of the cases in this patch. >>> >> >> This operation may cause link down, so we use msg_ifdown. >> > > ifdown isn't link down.. > > to be honest, I couldn't find any documentation explaining how/when to > use msg lvls, (i didn't look too deep though), by looking at other > drivers, my interpretations is: > > ifdup (open/boot up flow) > ifdwon (close/teardown flow) > drv (driver based or dynamic flows) > etc .. > > -Saeed. >
Re: [PATCH net-next 07/11] net: hns3: adds debug messages to identify eth down cause
As Saeed said, we will use netif_msg_drv() which is default off, this can be easily open with ethtool. Thanks for your reply! On 2019/7/26 9:28, Jakub Kicinski wrote: > On Thu, 25 Jul 2019 21:59:08 +, Saeed Mahameed wrote: >> I couldn't find any rules regarding what to put in kernel log, Maybe >> someone can share ?. but i vaguely remember that the recommendation >> for device drivers is to put nothing, only error/warning messages. > > FWIW my understanding is also that only error/warning messages should > be printed. IOW things which should "never happen". > > There are some historical exceptions. Probe logs for instance may be > useful, because its not trivial to get to the device if probe fails. > > Another one is ethtool flashing, if it takes time we used to print into > logs some message like "please wait patiently". But since Jiri added > the progress messages in devlink that's no longer necessary. > > For the messages which are basically printing the name of the function > or name of the function and their args - we have ftrace. > > That's my $0.02 :) > > . >
Re: [RFC] net: phy: read link status twice when phy_check_link_status()
On 2019/7/30 4:57, Heiner Kallweit wrote: > On 29.07.2019 05:59, liuyonglong wrote: >> >> >> On 2019/7/27 2:14, Heiner Kallweit wrote: >>> On 26.07.2019 11:53, Yonglong Liu wrote: >>>> According to the datasheet of Marvell phy and Realtek phy, the >>>> copper link status should read twice, or it may get a fake link >>>> up status, and cause up->down->up at the first time when link up. >>>> This happens more oftem at Realtek phy. >>>> >>> This is not correct, there is no fake link up status. >>> Read the comment in genphy_update_link, only link-down events >>> are latched. Means if the first read returns link up, then there >>> is no need for a second read. And in polling mode we don't do a >>> second read because we want to detect also short link drops. >>> >>> It would be helpful if you could describe your actual problem >>> and whether you use polling or interrupt mode. >>> >> >> [ 44.498633] hns3 :bd:00.1 eth5: net open >> [ 44.504273] hns3 :bd:00.1: reg=0x1, data=0x79ad -> called from >> phy_start_aneg >> [ 44.532348] hns3 :bd:00.1: reg=0x1, data=0x798d -> called from >> phy_state_machine,update link. > > This should not happen. The PHY indicates link up w/o having aneg finished. > >> >> According to the datasheet: >> reg 1.5=0 now, means copper auto-negotiation not complete >> reg 1.2=1 now, means link is up >> >> We can see that, when we read the link up, the auto-negotiation >> is not complete yet, so the speed is invalid. >> >> I don't know why this happen, maybe this state is keep from bios? >> Or we may do something else in the phy initialize to fix it? >> And also confuse that why read twice can fix it? >> > I suppose that basically any delay would do. > >> [ 44.554063] hns3 :bd:00.1: invalid speed (-1) >> [ 44.560412] hns3 :bd:00.1 eth5: failed to adjust link. >> [ 45.194870] hns3 :bd:00.1 eth5: link up >> [ 45.574095] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x7989 >> [ 46.150051] hns3 :bd:00.1 eth5: link down >> [ 46.598074] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x7989 >> [ 47.622075] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x79a9 >> [ 48.646077] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x79ad >> [ 48.934050] hns3 :bd:00.1 eth5: link up >> [ 49.702140] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x79ad >> >>>> I add a fake status read, and can solve this problem. >>>> >>>> I also see that in genphy_update_link(), had delete the fake >>>> read in polling mode, so I don't know whether my solution is >>>> correct. >>>> > > Can you test whether the following fixes the issue for you? > Also it would be interesting which exact PHY models you tested > and whether you built the respective PHY drivers or whether you > rely on the genphy driver. Best use the second patch to get the > needed info. It may make sense anyway to add the call to > phy_attached_info() to the hns3 driver. > [ 40.100716] RTL8211F Gigabit Ethernet mii-:bd:00.3:07: attached PHY driver [RTL8211F Gigabit Ethernet] (mii_bus:phy_addr=mii-:bd:00.3:07, irq=POLL) [ 40.932133] hns3 :bd:00.3 eth7: net open [ 40.932458] hns3 :bd:00.3: invalid speed (-1) [ 40.932541] 8021q: adding VLAN 0 to HW filter on device eth7 [ 40.937149] hns3 :bd:00.3 eth7: failed to adjust link. [ 40.662050] Generic PHY mii-:bd:00.2:05: attached PHY driver [Generic PHY] (mii_bus:phy_addr=mii-:bd:00.2:05, irq=POLL) [ 41.563511] hns3 :bd:00.2 eth6: net open [ 41.563853] hns3 :bd:00.2: invalid speed (-1) [ 41.563943] 8021q: adding VLAN 0 to HW filter on device eth6 [ 41.568578] IPv6: ADDRCONF(NETDEV_CHANGE): eth6: link becomes ready [ 41.568898] hns3 :bd:00.2 eth6: failed to adjust link. I am using RTL8211F, but you can see that, both genphy driver and RTL8211F driver have the same issue. > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index 6b5cb87f3..fbecfe210 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -1807,7 +1807,8 @@ int genphy_read_status(struct phy_device *phydev) > > linkmode_zero(phydev->lp_advertising); > > - if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) { > + if (phydev->autoneg == AUTONEG_ENABLE && > + (phydev->autoneg_complete || phydev->link)) { > if (phydev->is_gigabit_capable) { > lpagb = phy_read(phydev, MII_STAT1000); > if (lpagb < 0) > I have try this patch, have no effect. I suppose that at this time, the autoneg actually not complete yet. Maybe the wrong phy state is passed from bios? Invalid speed just happen at the first time when ethX up, after that, repeat the ifconfig down/ifconfig up command can not see that again. So I think the bios should power off the phy(writing reg 1.11 to 1) before it starts the OS? Or any other way to fix this in the OS?
Re: [RFC] net: phy: read link status twice when phy_check_link_status()
:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad kworker/u257:0-8 [000] 184.504478: mdio_access: mii-:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad kworker/u257:0-8 [000] 185.528486: mdio_access: mii-:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad kworker/u257:0-8 [000] 186.552475: mdio_access: mii-:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad ifconfig-1327 [011] 187.196036: mdio_access: mii-:bd:00.1 read phy:0x03 reg:0x00 val:0x1040 ifconfig-1327 [011] 187.196046: mdio_access: mii-:bd:00.1 write phy:0x03 reg:0x00 val:0x1840 On 2019/7/30 14:08, Heiner Kallweit wrote: > On 30.07.2019 06:03, liuyonglong wrote: >> >> >> On 2019/7/30 4:57, Heiner Kallweit wrote: >>> On 29.07.2019 05:59, liuyonglong wrote: >>>> >>>> >>>> On 2019/7/27 2:14, Heiner Kallweit wrote: >>>>> On 26.07.2019 11:53, Yonglong Liu wrote: >>>>>> According to the datasheet of Marvell phy and Realtek phy, the >>>>>> copper link status should read twice, or it may get a fake link >>>>>> up status, and cause up->down->up at the first time when link up. >>>>>> This happens more oftem at Realtek phy. >>>>>> >>>>> This is not correct, there is no fake link up status. >>>>> Read the comment in genphy_update_link, only link-down events >>>>> are latched. Means if the first read returns link up, then there >>>>> is no need for a second read. And in polling mode we don't do a >>>>> second read because we want to detect also short link drops. >>>>> >>>>> It would be helpful if you could describe your actual problem >>>>> and whether you use polling or interrupt mode. >>>>> >>>> >>>> [ 44.498633] hns3 :bd:00.1 eth5: net open >>>> [ 44.504273] hns3 :bd:00.1: reg=0x1, data=0x79ad -> called from >>>> phy_start_aneg >>>> [ 44.532348] hns3 :bd:00.1: reg=0x1, data=0x798d -> called from >>>> phy_state_machine,update link. >>> >>> This should not happen. The PHY indicates link up w/o having aneg finished. >>> >>>> >>>> According to the datasheet: >>>> reg 1.5=0 now, means copper auto-negotiation not complete >>>> reg 1.2=1 now, means link is up >>>> >>>> We can see that, when we read the link up, the auto-negotiation >>>> is not complete yet, so the speed is invalid. >>>> >>>> I don't know why this happen, maybe this state is keep from bios? >>>> Or we may do something else in the phy initialize to fix it? >>>> And also confuse that why read twice can fix it? >>>> >>> I suppose that basically any delay would do. >>> >>>> [ 44.554063] hns3 :bd:00.1: invalid speed (-1) >>>> [ 44.560412] hns3 :bd:00.1 eth5: failed to adjust link. >>>> [ 45.194870] hns3 :bd:00.1 eth5: link up >>>> [ 45.574095] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x7989 >>>> [ 46.150051] hns3 :bd:00.1 eth5: link down >>>> [ 46.598074] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x7989 >>>> [ 47.622075] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x79a9 >>>> [ 48.646077] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x79ad >>>> [ 48.934050] hns3 :bd:00.1 eth5: link up >>>> [ 49.702140] hns3 :bd:00.1: phyid=3, reg=0x1, data=0x79ad >>>> >>>>>> I add a fake status read, and can solve this problem. >>>>>> >>>>>> I also see that in genphy_update_link(), had delete the fake >>>>>> read in polling mode, so I don't know whether my solution is >>>>>> correct. >>>>>> >>> >>> Can you test whether the following fixes the issue for you? >>> Also it would be interesting which exact PHY models you tested >>> and whether you built the respective PHY drivers or whether you >>> rely on the genphy driver. Best use the second patch to get the >>> needed info. It may make sense anyway to add the call to >>> phy_attached_info() to the hns3 driver. >>> >> >> [ 40.100716] RTL8211F Gigabit Ethernet mii-:bd:00.3:07: attached PHY >> driver [RTL8211F Gigabit Ethernet] (mii_bus:phy_addr=mii-:bd:00.3:07, >> irq=POLL) >> [ 40.932133] hns3 :bd:00.3 eth7: net open >> [ 40.932458] hns3 :bd:00.3: invalid speed (-1) >> [ 40.93
Re: [RFC] net: phy: read link status twice when phy_check_link_status()
On 2019/7/30 14:35, liuyonglong wrote: > :/sys/kernel/debug/tracing$ cat trace > # tracer: nop > # > # entries-in-buffer/entries-written: 45/45 #P:128 > # > # _-=> irqs-off > # / _=> need-resched > #| / _---=> hardirq/softirq > #|| / _--=> preempt-depth > #||| / delay > # TASK-PID CPU# TIMESTAMP FUNCTION > # | | | | | > kworker/64:2-1028 [064] 172.295687: mdio_access: mii-:bd:00.0 > read phy:0x01 reg:0x02 val:0x001c > kworker/64:2-1028 [064] 172.295726: mdio_access: mii-:bd:00.0 > read phy:0x01 reg:0x03 val:0xc916 > kworker/64:2-1028 [064] 172.296902: mdio_access: mii-:bd:00.0 > read phy:0x01 reg:0x01 val:0x79ad > kworker/64:2-1028 [064] 172.296938: mdio_access: mii-:bd:00.0 > read phy:0x01 reg:0x0f val:0x2000 > kworker/64:2-1028 [064] 172.321213: mdio_access: mii-:bd:00.0 > read phy:0x01 reg:0x00 val:0x1040 > kworker/64:2-1028 [064] 172.343209: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x02 val:0x001c > kworker/64:2-1028 [064] 172.343245: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x03 val:0xc916 > kworker/64:2-1028 [064] 172.343882: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x01 val:0x79ad > kworker/64:2-1028 [064] 172.343918: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x0f val:0x2000 > kworker/64:2-1028 [064] 172.362658: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x00 val:0x1040 > kworker/64:2-1028 [064] 172.385961: mdio_access: mii-:bd:00.2 > read phy:0x05 reg:0x02 val:0x001c > kworker/64:2-1028 [064] 172.385996: mdio_access: mii-:bd:00.2 > read phy:0x05 reg:0x03 val:0xc916 > kworker/64:2-1028 [064] 172.386646: mdio_access: mii-:bd:00.2 > read phy:0x05 reg:0x01 val:0x79ad > kworker/64:2-1028 [064] 172.386681: mdio_access: mii-:bd:00.2 > read phy:0x05 reg:0x0f val:0x2000 > kworker/64:2-1028 [064] 172.411286: mdio_access: mii-:bd:00.2 > read phy:0x05 reg:0x00 val:0x1040 > kworker/64:2-1028 [064] 172.433225: mdio_access: mii-:bd:00.3 > read phy:0x07 reg:0x02 val:0x001c > kworker/64:2-1028 [064] 172.433260: mdio_access: mii-:bd:00.3 > read phy:0x07 reg:0x03 val:0xc916 > kworker/64:2-1028 [064] 172.433887: mdio_access: mii-:bd:00.3 > read phy:0x07 reg:0x01 val:0x79ad > kworker/64:2-1028 [064] 172.433922: mdio_access: mii-:bd:00.3 > read phy:0x07 reg:0x0f val:0x2000 > kworker/64:2-1028 [064] 172.452862: mdio_access: mii-:bd:00.3 > read phy:0x07 reg:0x00 val:0x1040 > ifconfig-1324 [011] 177.325585: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x00 val:0x1040 > kworker/u257:0-8 [012] 177.325642: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x04 val:0x01e1 > kworker/u257:0-8 [012] 177.325654: mdio_access: mii-:bd:00.1 > write phy:0x03 reg:0x04 val:0x05e1 > kworker/u257:0-8 [012] 177.325708: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x01 val:0x79ad > kworker/u257:0-8 [012] 177.325744: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x09 val:0x0200 > kworker/u257:0-8 [012] 177.325779: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x00 val:0x1040 > kworker/u257:0-8 [012] 177.325788: mdio_access: mii-:bd:00.1 > write phy:0x03 reg:0x00 val:0x1240 > kworker/u257:0-8 [012] 177.325843: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x01 val:0x798d > kworker/u257:0-8 [003] 178.360488: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x01 val:0x7989 > kworker/u257:0-8 [000] 179.384479: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x01 val:0x7989 > kworker/u257:0-8 [000] 180.408477: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x01 val:0x7989 > kworker/u257:0-8 [000] 181.432474: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x01 val:0x79a9 > kworker/u257:0-8 [000] 181.432510: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x0a val:0x7800 > kworker/u257:0-8 [000] 181.432546: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x09 val:0x0200 > kworker/u257:0-8 [000] 181.432582: mdio_access: mii-:bd:00.1 > read phy:0x03 reg:0x05 val:0xc1e1 > kworker/u257:0-8 [000] 182.456510: mdio_access: mii-
Re: [RFC] net: phy: read link status twice when phy_check_link_status()
On 2019/7/31 3:04, Heiner Kallweit wrote: > On 30.07.2019 08:35, liuyonglong wrote: >> :/sys/kernel/debug/tracing$ cat trace >> # tracer: nop >> # >> # entries-in-buffer/entries-written: 45/45 #P:128 >> # >> # _-=> irqs-off >> # / _=> need-resched >> #| / _---=> hardirq/softirq >> #|| / _--=> preempt-depth >> #||| / delay >> # TASK-PID CPU# TIMESTAMP FUNCTION >> # | | | | | >> kworker/64:2-1028 [064] 172.295687: mdio_access: >> mii-:bd:00.0 read phy:0x01 reg:0x02 val:0x001c >> kworker/64:2-1028 [064] 172.295726: mdio_access: >> mii-:bd:00.0 read phy:0x01 reg:0x03 val:0xc916 >> kworker/64:2-1028 [064] 172.296902: mdio_access: >> mii-:bd:00.0 read phy:0x01 reg:0x01 val:0x79ad >> kworker/64:2-1028 [064] 172.296938: mdio_access: >> mii-:bd:00.0 read phy:0x01 reg:0x0f val:0x2000 >> kworker/64:2-1028 [064] 172.321213: mdio_access: >> mii-:bd:00.0 read phy:0x01 reg:0x00 val:0x1040 >> kworker/64:2-1028 [064] 172.343209: mdio_access: >> mii-:bd:00.1 read phy:0x03 reg:0x02 val:0x001c >> kworker/64:2-1028 [064] 172.343245: mdio_access: >> mii-:bd:00.1 read phy:0x03 reg:0x03 val:0xc916 >> kworker/64:2-1028 [064] 172.343882: mdio_access: >> mii-:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad >> kworker/64:2-1028 [064] 172.343918: mdio_access: >> mii-:bd:00.1 read phy:0x03 reg:0x0f val:0x2000 >> kworker/64:2-1028 [064] 172.362658: mdio_access: >> mii-:bd:00.1 read phy:0x03 reg:0x00 val:0x1040 >> kworker/64:2-1028 [064] 172.385961: mdio_access: >> mii-:bd:00.2 read phy:0x05 reg:0x02 val:0x001c >> kworker/64:2-1028 [064] 172.385996: mdio_access: >> mii-:bd:00.2 read phy:0x05 reg:0x03 val:0xc916 >> kworker/64:2-1028 [064] 172.386646: mdio_access: >> mii-:bd:00.2 read phy:0x05 reg:0x01 val:0x79ad >> kworker/64:2-1028 [064] 172.386681: mdio_access: >> mii-:bd:00.2 read phy:0x05 reg:0x0f val:0x2000 >> kworker/64:2-1028 [064] 172.411286: mdio_access: >> mii-:bd:00.2 read phy:0x05 reg:0x00 val:0x1040 >> kworker/64:2-1028 [064] 172.433225: mdio_access: >> mii-:bd:00.3 read phy:0x07 reg:0x02 val:0x001c >> kworker/64:2-1028 [064] 172.433260: mdio_access: >> mii-:bd:00.3 read phy:0x07 reg:0x03 val:0xc916 >> kworker/64:2-1028 [064] 172.433887: mdio_access: >> mii-:bd:00.3 read phy:0x07 reg:0x01 val:0x79ad >> kworker/64:2-1028 [064] 172.433922: mdio_access: >> mii-:bd:00.3 read phy:0x07 reg:0x0f val:0x2000 >> kworker/64:2-1028 [064] 172.452862: mdio_access: >> mii-:bd:00.3 read phy:0x07 reg:0x00 val:0x1040 >> ifconfig-1324 [011] 177.325585: mdio_access: >> mii-:bd:00.1 read phy:0x03 reg:0x00 val:0x1040 >> kworker/u257:0-8 [012] 177.325642: mdio_access: >> mii-:bd:00.1 read phy:0x03 reg:0x04 val:0x01e1 >> kworker/u257:0-8 [012] 177.325654: mdio_access: >> mii-:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1 >> kworker/u257:0-8 [012] 177.325708: mdio_access: >> mii-:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad >> kworker/u257:0-8 [012] 177.325744: mdio_access: >> mii-:bd:00.1 read phy:0x03 reg:0x09 val:0x0200 >> kworker/u257:0-8 [012] 177.325779: mdio_access: >> mii-:bd:00.1 read phy:0x03 reg:0x00 val:0x1040 >> kworker/u257:0-8 [012] 177.325788: mdio_access: >> mii-:bd:00.1 write phy:0x03 reg:0x00 val:0x1240 >> kworker/u257:0-8 [012] 177.325843: mdio_access: >> mii-:bd:00.1 read phy:0x03 reg:0x01 val:0x798d > > What I think that happens here: > Writing 0x1240 to BMCR starts aneg. When reading BMSR immediately after that > then the PHY seems to have cleared > the "aneg complete" bit already, but not yet the "link up" bit. This results > in the false "link up" notification. > The following patch is based on the fact that in case of enabled aneg we > can't have a valid link if aneg isn't > finished. Could you please test whether this works for you? > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > in
Re: [RFC] net: phy: read link status twice when phy_check_link_status()
On 2019/7/31 13:44, Heiner Kallweit wrote: > On 31.07.2019 05:33, liuyonglong wrote: >> >> >> On 2019/7/31 3:04, Heiner Kallweit wrote: >>> On 30.07.2019 08:35, liuyonglong wrote: >>>> :/sys/kernel/debug/tracing$ cat trace >>>> # tracer: nop >>>> # >>>> # entries-in-buffer/entries-written: 45/45 #P:128 >>>> # >>>> # _-=> irqs-off >>>> # / _=> need-resched >>>> #| / _---=> hardirq/softirq >>>> #|| / _--=> preempt-depth >>>> #||| / delay >>>> # TASK-PID CPU# TIMESTAMP FUNCTION >>>> # | | | | | >>>> kworker/64:2-1028 [064] 172.295687: mdio_access: >>>> mii-:bd:00.0 read phy:0x01 reg:0x02 val:0x001c >>>> kworker/64:2-1028 [064] 172.295726: mdio_access: >>>> mii-:bd:00.0 read phy:0x01 reg:0x03 val:0xc916 >>>> kworker/64:2-1028 [064] 172.296902: mdio_access: >>>> mii-:bd:00.0 read phy:0x01 reg:0x01 val:0x79ad >>>> kworker/64:2-1028 [064] 172.296938: mdio_access: >>>> mii-:bd:00.0 read phy:0x01 reg:0x0f val:0x2000 >>>> kworker/64:2-1028 [064] 172.321213: mdio_access: >>>> mii-:bd:00.0 read phy:0x01 reg:0x00 val:0x1040 >>>> kworker/64:2-1028 [064] 172.343209: mdio_access: >>>> mii-:bd:00.1 read phy:0x03 reg:0x02 val:0x001c >>>> kworker/64:2-1028 [064] 172.343245: mdio_access: >>>> mii-:bd:00.1 read phy:0x03 reg:0x03 val:0xc916 >>>> kworker/64:2-1028 [064] 172.343882: mdio_access: >>>> mii-:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad >>>> kworker/64:2-1028 [064] 172.343918: mdio_access: >>>> mii-:bd:00.1 read phy:0x03 reg:0x0f val:0x2000 >>>> kworker/64:2-1028 [064] 172.362658: mdio_access: >>>> mii-:bd:00.1 read phy:0x03 reg:0x00 val:0x1040 >>>> kworker/64:2-1028 [064] 172.385961: mdio_access: >>>> mii-:bd:00.2 read phy:0x05 reg:0x02 val:0x001c >>>> kworker/64:2-1028 [064] 172.385996: mdio_access: >>>> mii-:bd:00.2 read phy:0x05 reg:0x03 val:0xc916 >>>> kworker/64:2-1028 [064] 172.386646: mdio_access: >>>> mii-:bd:00.2 read phy:0x05 reg:0x01 val:0x79ad >>>> kworker/64:2-1028 [064] 172.386681: mdio_access: >>>> mii-:bd:00.2 read phy:0x05 reg:0x0f val:0x2000 >>>> kworker/64:2-1028 [064] 172.411286: mdio_access: >>>> mii-:bd:00.2 read phy:0x05 reg:0x00 val:0x1040 >>>> kworker/64:2-1028 [064] 172.433225: mdio_access: >>>> mii-:bd:00.3 read phy:0x07 reg:0x02 val:0x001c >>>> kworker/64:2-1028 [064] 172.433260: mdio_access: >>>> mii-:bd:00.3 read phy:0x07 reg:0x03 val:0xc916 >>>> kworker/64:2-1028 [064] 172.433887: mdio_access: >>>> mii-:bd:00.3 read phy:0x07 reg:0x01 val:0x79ad >>>> kworker/64:2-1028 [064] 172.433922: mdio_access: >>>> mii-:bd:00.3 read phy:0x07 reg:0x0f val:0x2000 >>>> kworker/64:2-1028 [064] 172.452862: mdio_access: >>>> mii-:bd:00.3 read phy:0x07 reg:0x00 val:0x1040 >>>> ifconfig-1324 [011] 177.325585: mdio_access: >>>> mii-:bd:00.1 read phy:0x03 reg:0x00 val:0x1040 >>>> kworker/u257:0-8 [012] 177.325642: mdio_access: >>>> mii-:bd:00.1 read phy:0x03 reg:0x04 val:0x01e1 >>>> kworker/u257:0-8 [012] 177.325654: mdio_access: >>>> mii-:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1 >>>> kworker/u257:0-8 [012] 177.325708: mdio_access: >>>> mii-:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad >>>> kworker/u257:0-8 [012] 177.325744: mdio_access: >>>> mii-:bd:00.1 read phy:0x03 reg:0x09 val:0x0200 >>>> kworker/u257:0-8 [012] 177.325779: mdio_access: >>>> mii-:bd:00.1 read phy:0x03 reg:0x00 val:0x1040 >>>> kworker/u257:0-8 [012] 177.325788: mdio_access: >>>> mii-:bd:00.1 write phy:0x03 reg:0x00 val:0x1240 >>>> kworker/u257:0-8 [012] 177.325843: mdio_access: >&