Re: Debugging Ethernet issues

Sebastian Frias Mon, 14 Nov 2016 05:05:41 -0800

On 11/13/2016 08:55 PM, Florian Fainelli wrote:
> Le 13/11/2016 à 11:51, Mason a écrit :
>> On 13/11/2016 04:09, Andrew Lunn wrote:
>>
>>> Mason wrote:
>>>
>>>> When connected to a Gigabit switch
>>>> 3.4 negotiates a LAN DHCP setup instantly
>>>> 4.7 requires over 5 seconds to do so
>>>
>>> When you run tcpdump on the DHCP server, are you noticing the first
>>> request is missing?
>>>
>>> What can happen is the dhclient gets started immediately and sends out
>>> its first request before auto-negotiation has finished. So this first packet
>>> gets lost. The retransmit after a few seconds is then successful.
>>
>> I will run tcpdump on the server as I run udhcpc on the client
>> for Linux 3.4 vs 4.7
>>
>> Do you know what would make auto-negotiation fail at 100 Mbps
>> on 4.7? (whereas it succeeds on 3.4)
>>
>> (Thinking out loud) If the problem were in auto-negotiation,
>> then it should work if I hard-code speed and duplex using
>> ethtool, right? (IIRC, hard-coding doesn't help.)
> 
> I would start with checking basic things:
> 
> - does your Ethernet driver get a link UP being reported correctly
> (netif_carrier_ok returns 1)?
> - if you let the bootloader configure the PHY and utilize the Generic
> PHY driver instead of the Atheros PHY driver, does the problem appear as
> well?


Would using a "fixed-link" serve the same purpose?

It appears that using a fixed-link

&eth0 {
        #address-cells = <1>;
        #size-cells = <0>;

#ifdef WITH_FIXED_LINK
        phy-connection-type = "rgmii";

        fixed-link {
                speed = <100>;
                full-duplex;
        };
#else
        phy-connection-type = "rgmii";
        phy-handle = <&eth0_phy>;

        /* Atheros AR8035 */
        eth0_phy: ethernet-phy@4 {
                interrupt-parent = <&irq0>;
                compatible = "ethernet-phy-id004d.d072",
                             "ethernet-phy-ieee802.3-c22";
                interrupts = <37 IRQ_TYPE_EDGE_RISING>;
                reg = <4>;
        };
#endif
};

works.
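
To compare what the kernel reports for the link in the two configurations,
something along these lines could be run from user space. This is only a
sketch: it assumes the standard sysfs attributes "carrier", "speed" and
"duplex" under /sys/class/net/<ifname>, and the helper names
read_net_attr() and print_link_state() are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one attribute of a network interface
 * (e.g. "carrier", "speed", "duplex") from sysfs into buf, without the
 * trailing newline. Returns 0 on success, -1 on error. Reads can fail
 * legitimately, e.g. while the interface is administratively down.
 */
int read_net_attr(const char *sysfs_net, const char *ifname,
                  const char *attr, char *buf, size_t len)
{
        char path[256];
        FILE *f;
        int n;

        assert(buf != NULL && len > 0);

        n = snprintf(path, sizeof(path), "%s/%s/%s", sysfs_net, ifname, attr);
        if (n < 0 || (size_t)n >= sizeof(path))
                return -1;

        f = fopen(path, "r");
        if (!f)
                return -1;
        if (!fgets(buf, (int)len, f)) {
                fclose(f);
                return -1;
        }
        fclose(f);

        buf[strcspn(buf, "\n")] = '\0';
        return 0;
}

/* Print carrier, speed and duplex as currently reported for ifname. */
void print_link_state(const char *sysfs_net, const char *ifname)
{
        static const char *const attrs[] = { "carrier", "speed", "duplex" };
        char val[32];
        size_t i;

        for (i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
                if (read_net_attr(sysfs_net, ifname, attrs[i],
                                  val, sizeof(val)) == 0)
                        printf("%s %s: %s\n", ifname, attrs[i], val);
                else
                        printf("%s %s: <unavailable>\n", ifname, attrs[i]);
        }
}
```

Calling print_link_state("/sys/class/net", "eth0") from main() once the
interface is up shows whether the carrier, speed and duplex seen by the
kernel match what was configured ("eth0" is an assumption about the
interface name).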


----

For what it's worth, the patch that Mason was talking about earlier
in the thread:

  "...After much hair-pulling, it turned out that *some* of the breakage
was caused by a local patch..."

was changing the following delay in
'drivers/net/phy/phy.c:phy_state_machine()'

        /* Only re-schedule a PHY state machine change if we are polling the
         * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
         * between states from phy_mac_interrupt()
         */
        if (phydev->irq == PHY_POLL)
                queue_delayed_work(system_power_efficient_wq,
                                   &phydev->state_queue,
                                   PHY_STATE_TIME * HZ);

from "PHY_STATE_TIME * HZ" to "0".

That caused 2 of 3 types of boards to fail, while one of them always worked
regardless of the delay.

In a nutshell:
- Board A, chip X: works with delay "PHY_STATE_TIME * HZ" or "0".
- Board B, chip X: does not work with delay "0"
- Board C, chip Y: does not work with delay "0"

Does board A work by "luck" when this delay is "0"?
(this delay has always been there, but it is not clear why)
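
To put the two delay values in perspective, here is a rough user-space
sketch of the arithmetic. It assumes PHY_STATE_TIME is 1 (as I believe
include/linux/phy.h defines it), so "PHY_STATE_TIME * HZ" jiffies is one
second whatever HZ is. The per-run cost of the state machine (run_ms) and
the function names are hypothetical. It only shows how much more often the
state machine would run with delay "0"; it does not explain why boards B
and C fail.

```c
#include <assert.h>
#include <stdio.h>

#define PHY_STATE_TIME 1        /* assumed value, see include/linux/phy.h */

/* Convert a delay expressed in jiffies to milliseconds for a given HZ. */
unsigned long jiffies_to_ms(unsigned long jiffies, unsigned long hz)
{
        assert(hz != 0);
        return jiffies * 1000UL / hz;
}

/* Toy model: how many times a self re-arming poll runs within window_ms
 * if each run takes run_ms (hypothetical figure) and the next run is
 * queued delay_ms after the current one finishes.
 */
unsigned long polls_in_window(unsigned long window_ms,
                              unsigned long delay_ms,
                              unsigned long run_ms)
{
        unsigned long period = delay_ms + run_ms;

        return period ? window_ms / period : 0;
}

/* Print the comparison for one HZ value over a 5 second window,
 * assuming each state machine run costs 1 ms.
 */
void print_poll_summary(unsigned long hz)
{
        unsigned long stock_ms = jiffies_to_ms(PHY_STATE_TIME * hz, hz);

        printf("HZ=%lu: stock delay %lu ms, %lu runs in 5 s\n",
               hz, stock_ms, polls_in_window(5000, stock_ms, 1));
        printf("HZ=%lu: delay 0, %lu runs in 5 s\n",
               hz, polls_in_window(5000, 0, 1));
}
```

With these assumptions, the stock delay re-runs the state machine roughly
once per second, whereas delay "0" re-queues it back to back.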

> - what do transmit/receive counters on the Ethernet driver/MAC return?
>
