On 2014-09-06 23:13:51, Bryan Linton <b...@shoshoni.info> wrote:
> Hello list,
>
> [dmesg and ifconfig output attached inline at bottom]
>
> I upgraded from a mid-July snapshot to a recent one, and I've been
> experiencing a strange problem with the network losing
> connectivity that I haven't been able to pin down.
>
> What happens is the machine (a Thinkpad T60) will lose all network
> connectivity, even to the point where pinging local machines or
> the default gateway will produce no response.
>
> A simple "ifconfig em0 down up" will restore connectivity. There
> are no errors in the dmesg, and "ifconfig em0" shows the interface
> as up and running when connectivity is lost.
>
> Fetching email with ports/mail/fdm (I haven't tried any other
> MDAs) will usually cause the hang to occur. Strangely, regular
> web browsing doesn't seem to be very likely to cause this.
>
> Does anyone have any idea of things I can do to try to isolate
> this? I've looked through the CVS logs since mid-July and nothing
> jumped out at me as being particularly likely to cause this.
>
> Thank you.
>
> --
> Bryan
>
>
> OpenBSD 5.6-current (GENERIC.MP) #330: Thu Sep 4 02:53:34 MDT 2014
> t...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
> cpu0: Genuine Intel(R) CPU T2300 @ 1.66GHz ("GenuineIntel" 686-class) 1.67 GHz
> cpu0:
> FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,SSE3,MWAIT,VMX,EST,TM2,xTPR,PDCM,PERF
> real mem = 2682613760 (2558MB)
> avail mem = 2626367488 (2504MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: AT/286+ BIOS, date 04/01/10, BIOS32 rev. 0 @ 0xfd6b0,
> SMBIOS rev. 2.4 @ 0xe0010 (68 entries)
> bios0: vendor LENOVO version "79ETE6WW (2.26 )" date 04/01/2010
> bios0: LENOVO 2623D9U
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT ECDT TCPA APIC MCFG HPET BOOT SSDT SSDT SSDT SSDT
> acpi0: wakeup devices LID_(S3) SLPB(S3) LURT(S3) DURT(S3) EXP0(S4) EXP1(S4)
> EXP2(S4) EXP3(S4) PCI1(S4) USB0(S3) USB1(S3) USB2(S3) USB7(S3) HDEF(S4)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpiec0 at acpi0
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 166MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.2.2.2, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Genuine Intel(R) CPU T2300 @ 1.66GHz ("GenuineIntel" 686-class) 1.67 GHz
> cpu1:
> FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,SSE3,MWAIT,VMX,EST,TM2,xTPR,PDCM,PERF
> ioapic0 at mainbus0: apid 1 pa 0xfec00000, version 20, 24 pins
> ioapic0: misconfigured as apic 2, remapped to apid 1
> acpimcfg0 at acpi0 addr 0xf0000000, bus 0-63
> acpihpet0 at acpi0: 14318179 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 1 (AGP_)
> acpiprt2 at acpi0: bus 2 (EXP0)
> acpiprt3 at acpi0: bus 3 (EXP1)
> acpiprt4 at acpi0: bus 4 (EXP2)
> acpiprt5 at acpi0: bus 12 (EXP3)
> acpiprt6 at acpi0: bus 21 (PCI1)
> acpicpu0 at acpi0: C3, C2, C1, PSS
> acpicpu1 at acpi0: C3, C2, C1, PSS
> acpipwrres0 at acpi0: PUBS, resource for USB0, USB2, USB7
> acpitz0 at acpi0: critical temperature is 127 degC
> acpitz1 at acpi0: critical temperature is 99 degC
> acpibtn0 at acpi0: LID_
> acpibtn1 at acpi0: SLPB
> acpibat0 at acpi0: BAT0 model "92P1139" serial 659 type LION oem "Panasonic"
> acpibat1 at acpi0: BAT1 not present
> acpiac0 at acpi0: AC unit online
> acpithinkpad0 at acpi0
> acpidock0 at acpi0: GDCK docked (15)
> bios0: ROM list: 0xc0000/0xfe00 0xd0000/0x1000 0xd1000/0x1000 0xdc000/0x4000!
> 0xe0000/0x10000!
> cpu0: Enhanced SpeedStep 1663 MHz: speeds: 1667, 1333, 1000 MHz
> pci0 at mainbus0 bus 0: configuration mode 1 (bios)
> pchb0 at pci0 dev 0 function 0 "Intel 82945GM Host" rev 0x03
> ppb0 at pci0 dev 1 function 0 "Intel 82945GM PCIE" rev 0x03: apic 1 int 16
> pci1 at ppb0 bus 1
> radeondrm0 at pci1 dev 0 function 0 "ATI Radeon Mobility X1300 M52-64" rev
> 0x00
> drm0 at radeondrm0
> radeondrm0: apic 1 int 16
> azalia0 at pci0 dev 27 function 0 "Intel 82801GB HD Audio" rev 0x02: msi
> azalia0: codecs: Analog Devices AD1981HD, Conexant/0x2bfa, using Analog
> Devices AD1981HD
> audio0 at azalia0
> ppb1 at pci0 dev 28 function 0 "Intel 82801GB PCIE" rev 0x02: apic 1 int 20
> pci2 at ppb1 bus 2
> em0 at pci2 dev 0 function 0 "Intel 82573L" rev 0x00: msi, address
> 00:16:41:52:7e:81
> ppb2 at pci0 dev 28 function 1 "Intel 82801GB PCIE" rev 0x02: apic 1 int 21
> pci3 at ppb2 bus 3
> wpi0 at pci3 dev 0 function 0 "Intel PRO/Wireless 3945ABG" rev 0x02: msi,
> MoW1, address 00:13:02:20:41:18
> ppb3 at pci0 dev 28 function 2 "Intel 82801GB PCIE" rev 0x02: apic 1 int 22
> pci4 at ppb3 bus 4
> ppb4 at pci0 dev 28 function 3 "Intel 82801GB PCIE" rev 0x02: apic 1 int 23
> pci5 at ppb4 bus 12
> uhci0 at pci0 dev 29 function 0 "Intel 82801GB USB" rev 0x02: apic 1 int 16
> uhci1 at pci0 dev 29 function 1 "Intel 82801GB USB" rev 0x02: apic 1 int 17
> uhci2 at pci0 dev 29 function 2 "Intel 82801GB USB" rev 0x02: apic 1 int 18
> uhci3 at pci0 dev 29 function 3 "Intel 82801GB USB" rev 0x02: apic 1 int 19
> ehci0 at pci0 dev 29 function 7 "Intel 82801GB USB" rev 0x02: apic 1 int 19
> usb0 at ehci0: USB revision 2.0
> uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
> ppb5 at pci0 dev 30 function 0 "Intel 82801BAM Hub-to-PCI" rev 0xe2
> pci6 at ppb5 bus 21
> cbb0 at pci6 dev 0 function 0 "TI PCI1510 CardBus" rev 0x00: apic 1 int 16
> cardslot0 at cbb0 slot 0 flags 0
> cardbus0 at cardslot0: bus 22 device 0 cacheline 0x8, lattimer 0xb0
> pcmcia0 at cardslot0
> ichpcib0 at pci0 dev 31 function 0 "Intel 82801GBM LPC" rev 0x02: PM disabled
> pciide0 at pci0 dev 31 function 1 "Intel 82801GB IDE" rev 0x02: DMA, channel
> 0 configured to compatibility, channel 1 configured to compatibility
> atapiscsi0 at pciide0 channel 0 drive 0
> scsibus1 at atapiscsi0: 2 targets
> cd0 at scsibus1 targ 0 lun 0: <HL-DT-ST, DVDRAM GSA-U10N, 1.05> ATAPI 5/cdrom
> removable
> cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
> pciide0: channel 1 ignored (disabled)
> ahci0 at pci0 dev 31 function 2 "Intel 82801GBM AHCI" rev 0x02: msi, AHCI 1.1
> scsibus2 at ahci0: 32 targets
> sd0 at scsibus2 targ 0 lun 0: <ATA, INTEL SSDSC2CW24, 400i> SCSI3 0/direct
> fixed naa.5001517bb2a98d08
> sd0: 228936MB, 512 bytes/sector, 468862128 sectors, thin
> ichiic0 at pci0 dev 31 function 3 "Intel 82801GB SMBus" rev 0x02: apic 1 int
> 23
> iic0 at ichiic0
> usb1 at uhci0: USB revision 1.0
> uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb2 at uhci1: USB revision 1.0
> uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb3 at uhci2: USB revision 1.0
> uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb4 at uhci3: USB revision 1.0
> uhub4 at usb4 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> isa0 at ichpcib0
> isadma0 at isa0
> com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard
> pms0 at pckbc0 (aux slot)
> pckbc0: using irq 12 for aux slot
> wsmouse0 at pms0 mux 0
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> aps0 at isa0 port 0x1600/31
> npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
> uhub5 at uhub0 port 6 "IBM product 0x4485" rev 2.00/0.01 addr 2
> uhidev0 at uhub5 port 2 configuration 1 interface 0 "Logitech USB-PS/2
> Optical Mouse" rev 2.00/20.00 addr 3
> uhidev0: iclass 3/1
> ums0 at uhidev0: 3 buttons, Z dir
> wsmouse1 at ums0 mux 0
> uhidev1 at uhub5 port 3 configuration 1 interface 0 "Gravis GamePad Pro USB"
> rev 1.00/2.00 addr 4
> uhidev1: iclass 3/0
> uhid0 at uhidev1: input=4, output=0, feature=0
> ugen0 at uhub4 port 2 "STMicroelectronics Biometric Coprocessor" rev
> 1.00/0.01 addr 2
> vscsi0 at root
> scsibus3 at vscsi0: 256 targets
> softraid0 at root
> scsibus4 at softraid0: 256 targets
> sd1 at scsibus4 targ 1 lun 0: <OPENBSD, SR CRYPTO, 005> SCSI2 0/direct fixed
> sd1: 200595MB, 512 bytes/sector, 410819160 sectors
> softraid0: volume sd1 is roaming, it used to be sd2, updating metadata
> root on sd1a (bfe3b486511fab55.a) swap on sd1b dump on sd1b
> drm: initializing kernel modesetting (RV515 0x1002:0x7149 0x17AA:0x2005).
> radeondrm0: VRAM: 128M 0x0000000000000000 - 0x0000000007FFFFFF (64M used)
> radeondrm0: GTT: 512M 0x0000000008000000 - 0x0000000027FFFFFF
> drm: PCIE GART of 512M enabled (table at 0x0000000000040000).
> radeondrm0: 1024x768
> wsdisplay0 at radeondrm0 mux 1: console (std, vt100 emulation), using wskbd0
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> wpi0: radio is disabled by hardware switch
> wpi0: could not initialize hardware
> umass0 at uhub5 port 1 configuration 1 interface 0 "Seagate Backup+ Desk"
> rev 2.10/3.42 addr 5
> umass0: using SCSI over Bulk-Only
> scsibus5 at umass0: 2 targets, initiator 0
> sd2 at scsibus5 targ 1 lun 0: <Seagate, Backup+ Desk, 0342> SCSI4 0/direct
> fixed
> sd2: 4769307MB, 4096 bytes/sector, 1220942645 sectors
>
>
> ifconfig em0 output:
>
> em0: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 1500
>         lladdr 00:16:41:52:7e:81
>         priority: 0
>         groups: egress
>         media: Ethernet autoselect (100baseTX full-duplex,rxpause,txpause)
>         status: active
>         inet pub.lic.add.res netmask 0xffffff00 broadcast pub.lic.brd.cst
>         inet 192.168.1.7 netmask 0xffffff00 broadcast 192.168.1.255
>         inet 10.0.0.14 netmask 0xffffff00 broadcast 10.0.0.255
>

Well, I think I've made at least *some* progress in figuring something out.

It seems like it's actually disk activity that's causing the hangs. Fetching email grinds the disk a lot since I'm using bogofilter as my spam filter. I also noticed that rsyncing to another machine, or the rsync process that periodically runs as part of rsnapshot, has hung the network too.

This also occurs on a Thinkpad X61 in addition to my T60, which is probably not surprising considering how similar the relevant hardware in the two machines is.

I'm really not sure where to look for the bug though. The networking stack? The I/O subsystem? The kernel itself?

There obviously must be a commit somewhere between mid-July and August 30th that caused this, but I'm really not sure where to start. I'd rather not start rebuilding the entire system to find out where the bug is, since there have been a lot of changes in some areas that I imagine would make downgrading past a certain point rather painful, but I may have no other choice.

If I downgrade the system back to late July and start to bring it more up to date from there, will there be any difficulties eventually bringing it back to -current, provided I run sysmerge and do what current.html says again? I would imagine not, but this would be the first time I've rolled back the entire system and not just rebuilt a kernel with a patch reverted, so any assurances or tips would be appreciated.

Thank you.

--
Bryan
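
A minimal sketch of how the window between mid-July and August 30th could be narrowed by bisecting on checkout dates instead of stepping forward one commit at a time. Everything here is an assumption for illustration: the start date of 2014-07-15 (the post only says "mid-July"), the function names, and the `is_bad` callback, which stands in for the manual work of checking out a tree with `cvs update -D <date>`, building it, and seeing whether the network hang still occurs. It also assumes that once the bug appears it stays present in every later tree.

```python
from datetime import date, timedelta
from typing import Callable


def bisect_dates(good: date, bad: date, is_bad: Callable[[date], bool]) -> date:
    """Return the earliest date whose tree shows the hang.

    `good` must be a date known to work and `bad` a date known to hang.
    `is_bad` is a hypothetical callback: in practice it means "check out
    and build the tree for this date, then test by hand".
    """
    if good >= bad:
        raise ValueError("good date must be earlier than bad date")
    # Halve the interval until the good and bad dates are adjacent days.
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


def checkout_command(day: date) -> str:
    """Build (but do not run) a cvs command for the tree as of `day`."""
    return f'cvs update -Pd -D "{day.isoformat()}"'


if __name__ == "__main__":
    # Simulated run: pretend the regression landed on 2014-08-12 (made up).
    culprit = date(2014, 8, 12)
    tested = []

    def simulated_test(day: date) -> bool:
        tested.append(day)
        print("would run:", checkout_command(day))
        return day >= culprit

    first_bad = bisect_dates(date(2014, 7, 15), date(2014, 8, 30), simulated_test)
    print("first bad date:", first_bad, "after", len(tested), "builds")
```

With a 46-day window this needs at most six builds to land on a single day, after which the CVS log for that day is a much shorter list to read through.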