ARP problem with 6.2-STABLE Intel PRO/1000 NIC, latest em driver

2007-03-04 Thread Mark Costlow
The Machine:

I have a dual Xeon 5130 machine, Supermicro motherboard, with
the 82563EB NIC.  From dmesg:

CPU: Intel(R) Xeon(R) CPU5130  @ 2.00GHz (2000.08-MHz 686-class CPU)
cpu0:  on acpi0
em0:  port 0x2000-0x201f mem 0xda00-0xda01 irq 18 at device 0.0 on pci4

The machine has 4 GB of RAM and a 3ware 9000 series RAID controller with 2 drives.

pciconf -l says:

[EMAIL PROTECTED]:0:0:   class=0x02 card=0x15d9 chip=0x10968086 rev=0x01 hdr=0x00
[EMAIL PROTECTED]:0:1:   class=0x02 card=0x15d9 chip=0x10968086 rev=0x01 hdr=0x00
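
For readers decoding the pciconf output: the chip value packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits. A minimal Python sketch of that split follows; the function name split_chip_id is made up for illustration and is not part of any FreeBSD tool.

```python
def split_chip_id(chip):
    """Split a pciconf 'chip=0xDDDDVVVV' value into (vendor_id, device_id)."""
    vendor_id = chip & 0xFFFF          # low 16 bits: PCI vendor ID
    device_id = (chip >> 16) & 0xFFFF  # high 16 bits: PCI device ID
    return vendor_id, device_id


# chip value copied from the pciconf -l output in this message
vendor, device = split_chip_id(0x10968086)
print("vendor=0x%04x device=0x%04x" % (vendor, device))
```

Vendor ID 0x8086 is Intel, which matches the on-board em ports.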


The symptom:

The machine boots OK, but can only intermittently make network connections.
I eventually determined that it seems to see only a few ARP packets, so
it's falling out of other machines' ARP tables, and is often unable to
see the replies to its own ARP requests.  It does see SOME ARPs
though.  When it is able to communicate with another machine, it
does not appear to drop any packets between them (e.g. I scp'd a 500M file
at 300Mbps to this machine).

When I run "tcpdump -n arp" I see a few ARPs, but not many.  In a 1-minute
period, I saw 3 ARP who-has/reply packets.  On a different machine on
the same ethernet switch, I saw 225 who-has/reply packets in the same
1-minute period.
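
To make that comparison repeatable, one option is to save the "tcpdump -n arp" output from each machine over the same interval and count the lines. The Python sketch below is only an illustration: it assumes each captured packet is printed on one line containing either "who-has" or "reply", and the sample lines use made-up documentation addresses rather than real traffic from this LAN.

```python
def count_arp_lines(tcpdump_output):
    """Count ARP request and reply lines in saved 'tcpdump -n arp' text."""
    counts = {"who-has": 0, "reply": 0}
    for line in tcpdump_output.splitlines():
        lowered = line.lower()
        if "who-has" in lowered:
            counts["who-has"] += 1
        elif "reply" in lowered:
            counts["reply"] += 1
    return counts


# Hypothetical sample lines; the exact format varies between tcpdump versions.
SAMPLE = """\
12:00:01.000001 arp who-has 192.0.2.1 tell 192.0.2.20
12:00:01.000350 arp reply 192.0.2.1 is-at 00:00:5e:00:53:01
12:00:07.104220 arp who-has 192.0.2.30 tell 192.0.2.1
"""

print(count_arp_lines(SAMPLE))
```

Running the same count on captures taken at the same time on the em0 machine and on a known-good neighbor gives numbers that can be compared directly.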

I've tried different cables, and a different switch.  I started with
6.2-RELEASE, and then went to 6.2-STABLE on 3/3/07 to get the latest
em driver fixes.  I've used SMP and GENERIC kernels.  I get the same
results in all cases.

There are no firewall rules installed.

I plugged in a USB ethernet adapter (Realtek), and it works straight away.
"tcpdump -n arp" sees the same noise as other machines on that LAN.

I read through the recent threads on the em driver, but didn't see any
reported symptoms like this.  Has anyone seen anything like this?  Got
any hints for me?  Am I doing something stupid?  Did I leave out any
useful information about my configuration?

Thanks,

Mark
-- 
Mark Costlow| Southwest Cyberport | Fax:   +1-505-232-7975
[EMAIL PROTECTED] | Web:   www.swcp.com | Voice: +1-505-232-7992

abq-strange.com -- Interesting photos taken in Albuquerque, NM
   Last post: Art Is OK...And Dangerous - 2007-03-02 10:27:17

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: ARP problem with 6.2-STABLE Intel PRO/1000 NIC, latest em driver

2007-03-05 Thread Mark Costlow
On Sun, Mar 04, 2007 at 11:37:01PM -0800, Jack Vogel wrote:
> 
> These are one of our latest NICs, I have had no trouble with these
> but I'm used to using them on an Intel design, not SuperMicro.
> 
> First question, do you get the same behavior on both ports?
> My first guess is that this is a BIOS/management problem.
> 
> Double check SM website and see if there's any support updates
> to firmware for the system.

I left out a couple of things.  Yes, it does the same thing on both
em0 and em1.  And, the in-house Linux advocate loaded Debian on the
box and that worked as expected.

I'll check SM's web site for BIOS updates today.

Mark


Re: ARP problem with 6.2-STABLE Intel PRO/1000 NIC, latest em driver

2007-03-05 Thread Mark Costlow
SBKR100 USB 10/100 LAN, rev 1.10/1.00, addr 2
miibus0:  on rue0
ruephy0:  on miibus0
ruephy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rue0: Ethernet address: 00:10:60:dd:ed:e9
rue0: if_start running deferred for Giant
Timecounter "TSC" frequency 278406 Hz quality 800
Timecounters tick every 1.000 msec
acd0: CDRW  at ata0-master UDMA33
da0 at twa0 bus 0 target 0 lun 0
da0:  Fixed Direct Access SCSI-3 device 
da0: 100.000MB/s transfers
da0: 238408MB (488259584 512 byte sectors: 255H 63S/T 30392C)
da1 at twa0 bus 0 target 1 lun 0
da1:  Fixed Direct Access SCSI-3 device 
da1: 100.000MB/s transfers
da1: 238408MB (488259584 512 byte sectors: 255H 63S/T 30392C)
Trying to mount root from ufs:/dev/da0s1a
em0: link state changed to UP
em0: promiscuous mode enabled
em0: promiscuous mode disabled
twa0: INFO: (0x04: 0x0029): Verify started: unit=0
twa0: INFO: (0x04: 0x002B): Verify completed: unit=0


This is while booting GENERIC.  I can boot SMP and send that too if you
suggest.

Here's vmstat -i:

interrupt                        total       rate
irq1: atkbd0                         2          0
irq6: fdc0                           3          0
irq14: ata0                         47          0
irq16: uhci3                     14836          0
irq17: uhci0 ehci0                  25          0
irq18: em0 uhci2                 91850          2
irq24: twa0                      14828          0
cpu0: timer                   79015190       1999
Total                         79136781       2003

Is the fact that em0 and uhci2 are sharing an interrupt significant?
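
For anyone wanting to check for shared interrupt lines mechanically, here is a minimal Python sketch that lists the IRQs with more than one device attached. It assumes vmstat -i rows of the form "irqN: dev [dev ...] total rate" with whitespace between the columns; the function name shared_irqs is made up for illustration. It only reports sharing and says nothing about whether sharing causes the symptom.

```python
import re

# "irqN:", one or more device names, then the total and rate columns
ROW = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(vmstat_output):
    """Return {irq: [devices]} for every IRQ line listing more than one device."""
    shared = {}
    for line in vmstat_output.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, cpu0: timer, Total, or unrecognized rows
        devices = match.group(2).split()
        if len(devices) > 1:
            shared[match.group(1)] = devices
    return shared


# Rows taken from the vmstat -i output in this message
SAMPLE = """\
irq16: uhci3      14836  0
irq17: uhci0 ehci0   25  0
irq18: em0 uhci2  91850  2
irq24: twa0       14828  0
cpu0: timer    79015190  1999
"""

print(shared_irqs(SAMPLE))
```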

Thanks,

Mark


Re: ARP problem with 6.2-STABLE Intel PRO/1000 NIC, latest em driver

2007-03-05 Thread Mark Costlow
On Mon, Mar 05, 2007 at 10:02:26AM -0800, Jack Vogel wrote:
> On 3/5/07, Jack Vogel <[EMAIL PROTECTED]> wrote:
> >On 3/5/07, Mark Costlow <[EMAIL PROTECTED]> wrote:
> >> On Mon, Mar 05, 2007 at 08:41:01AM -0800, Jack Vogel wrote:
> >> > >
> >> > >Maybe more of your dmesg might help as it could show interrupt issues
> >> > >that perhaps others could help diagnose
> >> >
> >> > Yes, agreed, this might be revealing.
> >>
> >> Here's the full dmesg.  Thanks for looking at this.
> >>
> >> 
> >> Copyright (c) 1992-2007 The FreeBSD Project.
> >> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> >> The Regents of the University of California. All rights reserved.
> >> FreeBSD is a registered trademark of The FreeBSD Foundation.
> >> FreeBSD 6.2-STABLE #0: Sun Mar  4 22:40:38 MST 2007
> >> [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
> >> ACPI APIC Table: 
> >> Timecounter "i8254" frequency 1193182 Hz quality 0
> >> CPU: Intel(R) Xeon(R) CPU5130  @ 2.00GHz (2000.08-MHz 686-class CPU)
> >>   Origin = "GenuineIntel"  Id = 0x6f6  Stepping = 6
> >>   
> >Features=0xbfebfbff >> MOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> >>   
> >Features2=0x4e33d,CX16,,,>
> >>   AMD Features=0x2000
> >>   AMD Features2=0x1
> >>   Cores per package: 2
> >> real memory  = 3489005568 (3327 MB)
> >> avail memory = 3414384640 (3256 MB)
> >> ioapic0  irqs 0-23 on motherboard
> >> ioapic1  irqs 24-47 on motherboard
> >> kbd1 at kbdmux0
> >> ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
> >> acpi0:  on motherboard
> >> acpi0: Power Button (fixed)
> >> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> >> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
> >> cpu0:  on acpi0
> >> acpi_throttle0:  on cpu0
> >> pcib0:  port 0xcf8-0xcff on acpi0
> >> pci0:  on pcib0
> >> pcib1:  at device 2.0 on pci0
> >> pci1:  on pcib1
> >> pcib2:  irq 16 at device 0.0 on pci1
> >> pci2:  on pcib2
> >> pcib3:  irq 16 at device 0.0 on pci2
> >> pci3:  on pcib3
> >> pcib4:  irq 18 at device 2.0 on pci2
> >> pci4:  on pcib4
> >> em0:  port 0x2000-0x201f mem 0xda00-0xda01 irq 18 at device 0.0 on pci4
> >> em0: Ethernet address: 00:30:48:8c:71:54
> >> em1:  port 0x2020-0x203f mem 0xda02-0xda03 irq 19 at device 0.1 on pci4
> >> em1: Ethernet address: 00:30:48:8c:71:55
> >> pcib5:  at device 0.3 on pci1
> >> pci5:  on pcib5
> >> 3ware device driver for 9000 series storage controllers, version: 3.60.02.012
> >> twa0: <3ware 9000 series Storage Controller> port 0x3000-0x303f mem 0xd800-0xd9ff,0xda10-0xda100fff irq 24 at device 1.0 on pci5
> >> twa0: [GIANT-LOCKED]
> >> twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SX-4LP, 4 ports, Firmware FE9X 3.04.01.011, BIOS BE9X 3.04.00.002
> >> pci0:  at device 8.0 (no driver attached)
> >> pcib6:  irq 17 at device 28.0 on pci0
> >> pci6:  on pcib6
> >> uhci0:  port 0x1800-0x181f irq 17 at device 29.0 on pci0
> >> uhci0: [GIANT-LOCKED]
> >> usb0:  on uhci0
> >> usb0: USB revision 1.0
> >> uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> >> uhub0: 2 ports with 2 removable, self powered
> >> uhci1:  port 0x1820-0x183f irq 19 at device 29.1 on pci0
> >> uhci1: [GIANT-LOCKED]
> >> usb1:  on uhci1
> >> usb1: USB revision 1.0
> >> uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> >> uhub1: 2 ports with 2 removable, self powered
> >> uhci2:  port 0x1840-0x185f irq 18 at device 29.2 on pci0
> >> uhci2: [GIANT-LOCKED]
> >> usb2:  on uhci2
> >> usb2: USB revision 1.0
> >> uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> >> uhub2: 2 ports with 2 removable, self powered
> >> uhci3:  port 0x1860-0x187f irq 16 at device 29.3 on pci0
> >> uhci3: [GIANT-LOCKED]
> >> usb3:  on uhci3
> >> usb3: USB revision 1.0
> >> uhub3: Intel UHCI r

Re: PATCH : ARP problem with 6.2-STABLE Intel PRO/1000 NIC, latest em driver

2007-03-05 Thread Mark Costlow
On Mon, Mar 05, 2007 at 02:13:36PM -0800, Jack Vogel wrote:
[...snip...]
> >>
> >> Don't bother installing CURRENT, just got out of my meeting and I found
> >> out what the problem is. There is indeed an issue with management, and
> >> it's something our test group isn't set up to test. I will send a patch to
> >> try sometime before end of day.
> 
> OK, here is the patch, this should fix it...

Hi Jack, the patch didn't seem to have any effect.  When I run "tcpdump -n arp"
after rebooting with this patch, I still see 2-3 ARPs per minute instead of
100-200 per minute.

I was patching against:
/*$FreeBSD: src/sys/dev/em/if_em.c,v 1.65.2.22 2007/03/01 17:32:27 csjp Exp $*/

Is that correct?

I tried both SMP and non-SMP kernels, with the same results.  Is there
anything I can do to gather some additional debug information from the
system while it's running?

I neglected to mention before the specific motherboard model:
Supermicro X7DVL-E.  There is no IPMI card installed, and no
IPMI setting in the BIOS.

Thanks,

Mark


Re: ARP problem with 6.2-STABLE Intel PRO/1000 NIC, latest em

2007-03-06 Thread Mark Costlow
On Tue, Mar 06, 2007 at 10:24:46AM +, Chris Rees wrote:
> 
> If your NIC is knackered, where are you from? I can post you one I'm
> not using, instead of you buying one. It's a Realtek PCI 8139 10/100
> Mb/s. Let me know if you're interested.

Thank you very much for the offer!  However, I have tried another NIC
in the machine (a Realtek USB adaptor) and it worked normally.  At
that point I would suspect the hardware except that when this machine
had Linux loaded on it, it worked normally.  The box is in a 1U
case with no spare PCI slots, so I need the motherboard NIC to work
for it to be useful long-term.

Thanks,

Mark


Re: File system failure! URGENT Help needed!

2002-06-23 Thread Mark Costlow

On Sun, Jun 23, 2002 at 04:03:15PM +0600, [EMAIL PROTECTED] wrote:
> HL> If you have a spare drive that is exactly the same type you could make a
> HL> binary copy of the disk to it, and don't need to worry about making the
> HL> problem worse while recovering the data with inode magic.
>
> Yes, of course, I did it. Thanks!

Now that you have a copy of the data, this tool can help you extract your
data from the damaged file system:

http://www.porcupine.org/forensics/tct.html

Be sure to read the instructions carefully before starting to use it.  It
is a slow process, but it works.

Mark
-- 
Mark Costlow| Southwest Cyberport | Fax:   +1-505-232-7975
[EMAIL PROTECTED] | Web:   www.swcp.com | Voice: +1-505-232-7992

  "Education is never a waste" - Viscount du Valmont

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message