Thanks for all the feedback on polling, Jack and others.  Very helpful.

We are working to merge the latest RELENG_8 em/igb driver into our custom build that's based on RELENG_8_1. I've been able to create a patch using the following command:

cvs di -N -up -jRELENG_8_1 -jRELENG_8 sys/dev/e1000 sys/dev/ixgb sys/dev/ixgbe sys/conf/files > /tmp/e1000.diff

... and then hand-trimming the sys/conf/files portion of the diff down to only the relevant bits. It compiled and seems to be functioning, but I wouldn't mind a sanity check on my methodology. In particular:

   * Some of the patches overlapped with sys/dev/ixgb and
     sys/dev/ixgbe... so I included them.  Should I have?
   * Is there anything else I should have included?
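
One way to sanity-check what a diff like this actually covers is to list the files it touches, grouped by directory. The following is only a rough sketch: it assumes the patch is cvs unified-diff output in which each file is introduced by an "Index: <path>" line, and the helper names (touched_files, by_directory) are made up for illustration.

```python
from collections import Counter


def touched_files(diff_text):
    """Return the paths named on 'Index:' lines of a cvs diff, in order."""
    prefix = "Index: "
    return [line[len(prefix):].strip()
            for line in diff_text.splitlines()
            if line.startswith(prefix)]


def by_directory(paths, depth=3):
    """Count touched files per leading directory.

    `depth` is the number of leading path components to group on, so
    sys/dev/e1000/if_igb.c is counted under sys/dev/e1000 when depth=3.
    """
    counts = Counter()
    for path in paths:
        counts["/".join(path.split("/")[:depth])] += 1
    return counts
```

Reading /tmp/e1000.diff into a string and passing it through by_directory(touched_files(...)) gives a per-directory count, which makes it easier to see whether anything outside the intended driver directories and sys/conf/files slipped in.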


Thanks very much,

Charles


On 1/13/11 4:49 PM, Jack Vogel wrote:
Polling has seemed to me to be a way around other problems, problems that these days no longer exist. I remember back in the FreeBSD 6 days having interrupt problems which of course also led to watchdogs. Polling got rid of that. But now there are dedicated MULTIPLE interrupts by using MSIX, so that reason for polling is gone.

Of course there can still be advantages, reducing interrupts and hence context switches,
which is why the Linux approach does what it does.

I have not spent time with that issue; it's good to know that there could be problems lurking with it. But if you can simply go with MSIX I would do that for now.

Jack


On Thu, Jan 13, 2011 at 1:42 PM, Charles Owens <cow...@greatbaysoftware.com> wrote:

    So we went back to basics (stock 8.1-RELEASE) and found no
    issue!    We then added in our kernel mods one by one and
    ultimately discovered that device-polling is the culprit (the
    kernel config was simply GENERIC + PAE + polling).

    Immediately upon running "ifconfig igb0 polling" the symptoms appear.

    This is very good news overall, in that we can certainly disable
    polling for igb.  This raises the question, though, as to whether
    polling is recommended these days at all for em/igb NICs... or
    even in general.  From other conversations we've seen there seems
    to be some general debate about this.  In testing we've done in
    the past (circa 7.0) there certainly seemed to be benefit to using
    this feature.  What are your thoughts about this?

    For our product releases we'd like to stay with RELENG_8_1.  Would
    you recommend the driver in 8.2 as being preferable?

    In case it's of interest:

    igb0@pci0:1:0:0:        class=0x020000 card=0x34de8086 chip=0x10a78086 rev=0x02 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = '82575EB Gigabit Network Connection'
        class      = network
        subclass   = ethernet



    Thanks,
    Charles



    On 1/13/11 1:27 PM, Jack Vogel wrote:
    The 8.2 latest does have the latest igb, so using that should be
    indicative...

    Jack


    On Thu, Jan 13, 2011 at 7:56 AM, Charles Owens
    <cow...@greatbaysoftware.com> wrote:

        Ok... I got my wires crossed:  our first time testing 8.1 on
        this particular platform was with a kernel that had ichwd
        enabled (a new thing for us) and so when igb started
        complaining about "watchdog" we thought it was related.

        We've tested again and clearly the real story is that we're
        simply seeing igb issues, symptoms similar to those described.

        Does 8.2-RC1 have sufficiently "latest" code, or should I be
        looking to load up something else?  (8-stable, maybe?)

        Thanks,
        Charles



        On 1/13/11 12:07 AM, Jack Vogel wrote:
        The problem that Robin saw was due to having MSIX interrupts
        disabled on the system; I doubt that is going to be the "issue"
        for others.

        Get the latest version of the igb code and see if that helps
        you as a first step.

        Jack


        On Wed, Jan 12, 2011 at 6:43 PM, Charles Owens
        <cow...@greatbaysoftware.com> wrote:

            I'd like to report that we're running into this issue
            also, in our case on systems that are based on the Intel
            S5520UR Server Board, running 8.1-RELEASE.  If the ichwd
            driver is loaded we see the same messages, and network
            communication via the igb NICs is non-functional.

            Have you had any luck?

            Thanks,
            Charles

             Charles Owens
             Great Bay Software, Inc.




            On 1/3/11 4:02 PM, Robin Sommer wrote:

                Hello all,

                quite a while ago I asked about the problem below.
                Unfortunately, I haven't found a solution yet and I'm
                actually still seeing these timeouts after just upgrading
                to 8.2-RC1. Any further ideas on what could be triggering
                them, or how I could track down the cause?

                Thanks,

                Robin

                On Thu, Jul 29, 2010 at 14:56 -0700, I wrote:

                    Since upgrading from 8.0 to 8.1-RELEASE, I'm seeing lots
                    of messages like those below on all my SuperMicro
                    SBI-7425C-T3 blades. There's almost no traffic on those
                    interfaces.

                    Any idea?

                    Thanks,

                    Robin

                    Jul 29 13:01:18 blade0 kernel: igb1: Watchdog timeout -- resetting
                    Jul 29 13:01:18 blade0 kernel: igb1: Queue(0) tdh = 256, hw tdt = 266
                    Jul 29 13:01:18 blade0 kernel: igb1: TX(0) desc avail = 1013,Next TX to Clean = 255
                    Jul 29 13:01:18 blade0 kernel: igb1: link state changed to DOWN
                    Jul 29 13:01:18 blade0 kernel: igb1: link state changed to UP
                    Jul 29 13:01:29 blade0 kernel: igb1: Watchdog timeout -- resetting
                    Jul 29 13:01:29 blade0 kernel: igb1: Queue(0) tdh = 0, hw tdt = 10
                    Jul 29 13:01:29 blade0 kernel: igb1: TX(0) desc avail = 1014,Next TX to Clean = 0
                    Jul 29 13:01:29 blade0 kernel: igb1: link state changed to DOWN
                    Jul 29 13:01:29 blade0 kernel: igb1: link state changed to UP
                    Jul 29 13:01:46 blade0 kernel: igb1: Watchdog timeout -- resetting
                    Jul 29 13:01:46 blade0 kernel: igb1: Queue(0) tdh = 32, hw tdt = 33
                    Jul 29 13:01:46 blade0 kernel: igb1: TX(0) desc avail = 1022,Next TX to Clean = 31
                    Jul 29 13:01:46 blade0 kernel: igb1: link state changed to DOWN
                    Jul 29 13:01:46 blade0 kernel: igb1: link state changed to UP
                    Jul 29 13:01:57 blade0 kernel: igb1: Watchdog timeout -- resetting
                    Jul 29 13:01:57 blade0 kernel: igb1: Queue(0) tdh = 0, hw tdt = 10
                    Jul 29 13:01:57 blade0 kernel: igb1: TX(0) desc avail = 1014,Next TX to Clean = 0
                    Jul 29 13:01:57 blade0 kernel: igb1: link state changed to DOWN
                    Jul 29 13:01:58 blade0 kernel: igb1: link state changed to UP
                    Jul 29 13:02:13 blade0 kernel: igb1: Watchdog timeout -- resetting

                        grep igb /var/run/dmesg.boot

                    igb0: <Intel(R) PRO/1000 Network Connection version - 1.9.5> port 0x2000-0x201f mem 0xfc940000-0xfc95ffff,0xfc920000-0xfc93ffff,0xfc900000-0xfc903fff irq 16 at device 0.0 on pci4
                    igb0: [FILTER]
                    igb0: Ethernet address: 00:30:48:9e:22:00
                    igb1: <Intel(R) PRO/1000 Network Connection version - 1.9.5> port 0x2020-0x203f mem 0xfc980000-0xfc99ffff,0xfc960000-0xfc97ffff,0xfc904000-0xfc907fff irq 17 at device 0.1 on pci4
                    igb1: [FILTER]
                    igb1: Ethernet address: 00:30:48:9e:22:01

                        pciconf -lv

                    [...]
                    igb0@pci0:4:0:0:        class=0x020000 card=0x10a915d9 chip=0x10a98086 rev=0x02 hdr=0x00
                        vendor     = 'Intel Corporation'
                        device     = '82575EB Gigabit Backplane Connection'
                        class      = network
                        subclass   = ethernet
                    igb1@pci0:4:0:1:        class=0x020000 card=0x10a915d9 chip=0x10a98086 rev=0x02 hdr=0x00
                        vendor     = 'Intel Corporation'
                        device     = '82575EB Gigabit Backplane Connection'
                        class      = network
                        subclass   = ethernet
                    [...]


            _______________________________________________
            freebsd-net@freebsd.org mailing list
            http://lists.freebsd.org/mailman/listinfo/freebsd-net
            To unsubscribe, send any mail to
            "freebsd-net-unsubscr...@freebsd.org"




