Hi,

On 11-10-12 03:27 PM, YongHyeon PYUN wrote:
On Wed, Oct 12, 2011 at 02:35:23PM -0400, Karim wrote:
Hi,
On 11-10-12 01:03 PM, YongHyeon PYUN wrote:
On Wed, Oct 12, 2011 at 10:07:02AM -0400, Karim wrote:
[...]

Hmm, that indicates driver lost established link. msk(4) will
detect this condition and stop RX/TX MACs until it knows PHY
re-established a link. This may be the reason why you see occasional
packet drops. However I don't know why PHY loses established link
in the middle of working.

Yes, I am convinced this lost of link is related to the packet drops as
well. At this point we can safely discard cabling issues or router
issues (physical ones that is) since the same happens on a different
network with different cables.
 From the code in e1000phy_status:

static void
e1000phy_status(struct mii_softc *sc)
{
     struct mii_data *mii = sc->mii_pdata;
     int bmcr, bmsr, ssr;

     mii->mii_media_status = IFM_AVALID;
     mii->mii_media_active = IFM_ETHER;

     bmsr = PHY_READ(sc, E1000_SR) | PHY_READ(sc, E1000_SR);
     bmcr = PHY_READ(sc, E1000_CR);
     ssr = PHY_READ(sc, E1000_SSR);

     if (bmsr&   E1000_SR_LINK_STATUS)
         mii->mii_media_status |= IFM_ACTIVE;


I can see the bmsr&   E1000_SR_LINK_STATUS check failing when the problem
occurs. As a side note why are we ORing the same call twice isn't the
same thing as calling it once:

bmsr = PHY_READ(sc, E1000_SR) | PHY_READ(sc, E1000_SR);

The E1000_SR_LINK_STATUS bit is latched low so it should be read
twice. If you want to read once use E1000_SSR_LINK bit of
E1000_SSR register but I remember that bit was not reliable on some
PHY models.
Thanks for the explanation and the alternative. The ssr register seems
to give me the right bit (E1000_SSR_LINK) but it also gives me an extra
bit 0x0100 that is not defined in e1000phyreg.h. Any idea what that bit
would be/means?

I guess it's related with advanced power saving. It would indicate
current Energy detect status in PHY POV.
Generally Marvell's PHY will enter into automatic power saving mode
when it does not see any energy signal on the link. I don't know
exact time when it enters into that mode but it would take less
than 10 seconds if PHY do not see energy signal from link partner
once it initiated auto-negotiation.
However, e1000phy(4) always disables energy detect feature in
e1000phy_reset() so it wouldn't affect your issue, I guess.

One interesting thing is that 0x100 of E1000_SSR register indicates
energy detect status is in "Sleep mode" which means it didn't
detect energy signal(i.e. lost link). I'm not sure whether this bit
report correct status when energy detect feature is disabled
though.

Can you check whether your switch supports energy detect feature?
Or if your switch support EEE feature, try disabling it.

By chance, does your back-ported driver include r222219?
If yes, did you cold boot after applying the change?
Warm boot does have effect.
I do have this patch in the back-ported driver and due to several
reasons I didn't cold boot the appliance. We will give that a try and see.

Ok, let me know whether that makes any difference or not.

To be more precises I have included msk patches up to r222516.

Thanks!
[...]
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
After a weekend of test I can confirm the problem is gone with the back ported msk driver from FreeBSD 9 and a little bit of patching.

Apart from the packet drops I also had various report from my snmp trap daemon. It was reporting the interface was going inactive and for a while I though the packet drop and inactivity reports were linked. It turned out there was a small race condition between the various polling components in msk_mediastatus() that was confusing the snmp daemon while the packet drops got solved by the back port. The race can be easily solved with the following patch:


@@ -995,9 +996,11 @@ msk_mediastatus(struct ifnet *ifp, struct ifmediareq *ifmr)
        mii = device_get_softc(sc_if->msk_miibus);

        mii_pollstat(mii);
-       MSK_IF_UNLOCK(sc_if);
+
        ifmr->ifm_active = mii->mii_media_active;
        ifmr->ifm_status = mii->mii_media_status;
+
+       MSK_IF_UNLOCK(sc_if);
 }

Without moving down the msk lock its possible for one thread to see its mii_media_status reset to IFM_AVALID in e1000phy_status() right before the assignment to ifmr->ifm_status. This resulted in false reports about interface inactivity in rare occasions between a kernel based probe and the snmp trap daemon.

Thanks to everyone that chipped in to help,

Karim.
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Reply via email to