On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote:
> any joy? i mean, failure?
Last night my script triggered three times, hooray ;)
Unfortunately, I can't see much of a difference apart from the system
load values in the systat output :(
gem0: flags=8867<UP,BROADCAST,DEBUG,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:03:ba:2b:47:70
priority: 0
groups: egress
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 96.54.13.103 netmask 0xfffffc00 broadcast 96.54.15.255
gem0: flags=8867<UP,BROADCAST,DEBUG,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:03:ba:2b:47:70
priority: 0
groups: egress
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 96.54.13.103 netmask 0xfffffc00 broadcast 96.54.15.255
gem0: flags=8867<UP,BROADCAST,DEBUG,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:03:ba:2b:47:70
priority: 0
groups: egress
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 96.54.13.103 netmask 0xfffffc00 broadcast 96.54.15.255
#
8 users Load 0.69 0.43 0.29 Mon Nov 9 20:31:11 2015
IFACE     LIVELOCKS  SIZE  ALIVE  LWM   HWM  CWM
System            0   256     56        129
                     2048     32       1025
lo0
gem0                 2048     18    4   124   18
gem1                 2048     12    4   124   12
enc0
vlan100
vlan101
vlan102
vlan2
tun0
gif0
pflow0
pflog0
8 users Load 0.44 0.39 0.29 Mon Nov 9 20:32:11 2015
IFACE     LIVELOCKS  SIZE  ALIVE  LWM   HWM  CWM
System            0   256     52        129
                     2048     25       1025
lo0
gem0                 2048     11    4   124   11
gem1                 2048     12    4   124   12
enc0
vlan100
vlan101
vlan102
vlan2
tun0
gif0
pflow0
pflog0
8 users Load 0.11 0.18 0.16 Mon Nov 9 21:54:11 2015
IFACE     LIVELOCKS  SIZE  ALIVE  LWM   HWM  CWM
System            0   256     55        129
                     2048     28       1025
lo0
gem0                 2048     18    4   124   18
gem1                 2048     10    4   124   10
enc0
vlan100
vlan101
vlan102
vlan2
tun0
gif0
pflow0
pflog0
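
Since the captures are hard to compare by eye, here is a rough sketch of
one way to diff two "systat mb" captures mechanically. It is only an
illustration: the function names are made up, and it assumes that an
interface row with five numbers maps to SIZE, ALIVE, LWM, HWM and CWM in
that order (LIVELOCKS being blank), which is how the rows in this thread
appear to line up with the header.

```python
FIELDS = ("size", "alive", "lwm", "hwm", "cwm")


def parse_systat_mb(text):
    """Map interface name -> counters for rows that carry all five values.

    Assumption: a row such as "gem0 2048 18 4 124 18" is
    SIZE ALIVE LWM HWM CWM. Header, "System" and bare interface
    rows are skipped because they do not match that shape.
    """
    rows = {}
    for line in text.splitlines():
        # Drop mail quoting ("> > ") in case the capture came from a reply.
        parts = line.lstrip("> ").split()
        if (len(parts) == 6
                and not parts[0].isdigit()
                and all(p.isdigit() for p in parts[1:])):
            rows[parts[0]] = dict(zip(FIELDS, map(int, parts[1:])))
    return rows


def diff_captures(before, after):
    """Return {iface: {field: (old, new)}} for counters that changed."""
    changes = {}
    for iface in sorted(set(before) & set(after)):
        delta = {
            key: (before[iface][key], after[iface][key])
            for key in FIELDS
            if before[iface][key] != after[iface][key]
        }
        if delta:
            changes[iface] = delta
    return changes


# Rows copied from the thread: the working capture (Nov 8 16:37) and the
# first capture taken by the script (Nov 9 20:31).
WORKING = """\
IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM
System 0 256 48 129
2048 24 1025
lo0
gem0 2048 11 4 124 11
gem1 2048 12 4 124 12
"""

LOCKED = """\
IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM
System 0 256 56 129
2048 32 1025
lo0
gem0 2048 18 4 124 18
gem1 2048 12 4 124 12
"""

if __name__ == "__main__":
    print(diff_captures(parse_systat_mb(WORKING), parse_systat_mb(LOCKED)))
```

For these two captures the only change reported is on gem0, where ALIVE
and CWM go from 11 to 18. The 20:32 capture has the same gem0 values as
the working one, so the diff alone does not point at anything obvious.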
>
> > On 9 Nov 2015, at 10:40 AM, Ryan Freeman <[email protected]> wrote:
> >
> > On Mon, Nov 09, 2015 at 10:07:31AM +1000, David Gwynne wrote:
> >> can you get the ifconfig output when its locked up? and a copy of what
> >> systat mb is showing?
> >>
> >> cheers,
> >> dlg
> >
> > Thanks David,
> >
> > I have set up a script to try to capture this immediately when it happens.
> >
> > FWIW here is the output as it is now, working:
> >
> > 16:35 ryan@void:~$ ifconfig
> > lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 32768
> > priority: 0
> > groups: lo
> > inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
> > inet6 ::1 prefixlen 128
> > inet 127.0.0.1 netmask 0xff000000
> > gem0: flags=8867<UP,BROADCAST,DEBUG,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > lladdr 00:03:ba:2b:47:70
> > priority: 0
> > groups: egress
> > media: Ethernet autoselect (100baseTX full-duplex)
> > status: active
> > inet 96.54.13.103 netmask 0xfffffc00 broadcast 96.54.15.255
> > gem1: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
> > lladdr 00:03:ba:2b:47:71
> > priority: 0
> > media: Ethernet autoselect (100baseTX full-duplex)
> > status: active
> > inet 10.16.1.30 netmask 0xffffffe0 broadcast 10.16.1.31
> > inet6 fe80::203:baff:fe2b:4771%gem1 prefixlen 64 scopeid 0x2
> > inet6 2001:470:b:6cf::1 prefixlen 64
> > enc0: flags=0<>
> > priority: 0
> > groups: enc
> > status: active
> > vlan100: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > lladdr 00:03:ba:2b:47:71
> > description: servers
> > priority: 0
> > vlan: 100 parent interface: gem1
> > groups: vlan
> > status: active
> > inet 10.21.1.30 netmask 0xffffffe0 broadcast 10.21.1.31
> > inet6 fe80::203:baff:fe2b:4771%vlan100 prefixlen 64 scopeid 0x5
> > inet6 2001:470:eac8:666::1 prefixlen 64
> > vlan101: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > lladdr 00:03:ba:2b:47:71
> > description: workstations
> > priority: 0
> > vlan: 101 parent interface: gem1
> > groups: vlan
> > status: active
> > inet 10.21.8.254 netmask 0xffffff80 broadcast 10.21.8.255
> > inet6 fe80::203:baff:fe2b:4771%vlan101 prefixlen 64 scopeid 0x6
> > inet6 2001:470:eac8:a::1 prefixlen 64
> > vlan102: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > lladdr 00:03:ba:2b:47:71
> > description: wireless
> > priority: 0
> > vlan: 102 parent interface: gem1
> > groups: vlan
> > status: active
> > inet 10.21.9.254 netmask 0xffffff80 broadcast 10.21.9.255
> > inet6 fe80::203:baff:fe2b:4771%vlan102 prefixlen 64 scopeid 0x7
> > inet6 2001:470:eac8:b::1 prefixlen 64
> > vlan2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > lladdr 00:03:ba:2b:47:71
> > description: transit
> > priority: 0
> > vlan: 2 parent interface: gem1
> > groups: vlan
> > status: active
> > inet 172.21.1.2 netmask 0xfffffffc broadcast 172.21.1.3
> > tun0: flags=51<UP,POINTOPOINT,RUNNING> mtu 1500
> > priority: 0
> > groups: tun
> > status: down
> > inet 10.21.2.1 --> 10.21.2.2 netmask 0xfffffffc
> > gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280
> > priority: 0
> > groups: gif egress
> > tunnel: inet 96.54.13.103 -> 216.218.226.238
> > inet6 fe80::203:baff:fe2b:4770%gif0 -> prefixlen 64 scopeid 0xa
> > inet6 2001:470:a:6cf::2 -> 2001:470:a:6cf::1 prefixlen 128
> > pflow0: flags=41<UP,RUNNING> mtu 1492
> > priority: 0
> > pflow: sender: 127.0.0.1 receiver: 127.0.0.1:9995 version: 5
> > groups: pflow
> > pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33144
> > priority: 0
> > groups: pflog
> >
> > 16:36 ryan@void:~$ systat -b mb
> > 8 users Load 0.21 0.25 0.26 Sun Nov 8 16:37:12 2015
> > IFACE     LIVELOCKS  SIZE  ALIVE  LWM   HWM  CWM
> > System            0   256     48        129
> >                      2048     24       1025
> > lo0
> > gem0                 2048     11    4   124   11
> > gem1                 2048     12    4   124   12
> > enc0
> > vlan100
> > vlan101
> > vlan102
> > vlan2
> > tun0
> > gif0
> > pflow0
> > pflog0
> >
> >>
> >>> On 9 Nov 2015, at 09:36, Ryan Freeman <[email protected]> wrote:
> >>>
> >>> Hey tech@,
> >>>
> >>> At my wits' end here. I recently got a Sun Fire V120 from work for
> >>> pretty cheap. Quite excited to have some non-x86 hardware, I set it
> >>> up as a router.
> >>>
> >>> However, for some reason, after sometimes mere hours and other times
> >>> days, the gem0 interface needs to be cycled:
> >>>
> >>> ifconfig gem0 down
> >>> ifconfig gem0 up
> >>> dhclient gem0
> >>>
> >>> No packets pass until that has been done. At first I placed the blame
> >>> squarely on the Hitron modem we have in the house from Shaw Cable,
> >>> but now I've noticed the issue happen twice on the internal
> >>> interface, gem1, as well. All VLANs I have set up stop responding
> >>> until gem1 is cycled.
> >>>
> >>> gem1 is just used by a collection of vlan(4) interfaces, so traffic
> >>> resumes immediately after gem1 is taken down and brought back up.
> >>>
> >>> I've tried turning on debug with ifconfig gem0 debug to catch anything
> >>> weird, but there has been nothing of interest there. Dmesg attached;
> >>> I'm starting to wonder if this machine is at its EOL and the network
> >>> ports are dying :(
> >>>
> >>> This issue occurred with the 5.7 release as well.
> >>>
> >>> dmesg:
> >>> console is /pci@1f,0/pci@1,1/isa@7/serial@0,3f8
> >>> Copyright (c) 1982, 1986, 1989, 1991, 1993
> >>> The Regents of the University of California. All rights reserved.
> >>> Copyright (c) 1995-2015 OpenBSD. All rights reserved.
> >>> http://www.OpenBSD.org
> >>>
> >>> OpenBSD 5.8 (GENERIC) #0: Thu Oct 22 00:24:09 PDT 2015
> >>> [email protected]:/usr/src/sys/arch/sparc64/compile/GENERIC
> >>> real mem = 1073741824 (1024MB)
> >>> avail mem = 1039228928 (991MB)
> >>> mpath0 at root
> >>> scsibus0 at mpath0: 256 targets
> >>> mainbus0 at root: Sun Fire V120 (UltraSPARC-IIe 648MHz)
> >>> cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 3.3) @ 648 MHz
> >>> cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 512K external (64 b/l)
> >>> psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
> >>> psycho0: bus range 0-2, PCI bus 0
> >>> psycho0: dvma map c0000000-dfffffff
> >>> pci0 at psycho0
> >>> ppb0 at pci0 dev 1 function 1 "Sun Simba" rev 0x13
> >>> pci1 at ppb0 bus 1
> >>> ebus0 at pci1 dev 12 function 0 "Sun RIO EBus" rev 0x01
> >>> "flashprom" at ebus0 addr 0-fffff not configured
> >>> clock1 at ebus0 addr 0-1fff: mk48t59
> >>> lom0 at ebus0 addr 200000-200003 ivec 0x2a: LOMlite2 rev 3.12
> >>> alipm0 at pci1 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz clock
> >>> iic0 at alipm0
> >>> "max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
> >>> spdmem0 at iic0 addr 0x54: 512MB SDRAM registered ECC PC133CL2
> >>> spdmem1 at iic0 addr 0x55: 512MB SDRAM registered ECC PC133CL2
> >>> ebus1 at pci1 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
> >>> power0 at ebus1 addr 2000-2007 ivec 0x25
> >>> com0 at ebus1 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
> >>> com0: console
> >>> com1 at ebus1 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
> >>> gem0 at pci1 dev 12 function 1 "Sun ERI Ether" rev 0x01: ivec 0x7c6, address 00:03:ba:2b:47:70
> >>> ukphy0 at gem0 phy 1: Generic IEEE 802.3u media interface, rev. 1: OUI 0x0010dd, model 0x0002
> >>> ohci0 at pci1 dev 12 function 3 "Sun USB" rev 0x01: ivec 0x7e4, version 1.0, legacy support
> >>> pciide0 at pci1 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
> >>> pciide0: using ivec 0x7cc for native-PCI interrupt
> >>> atapiscsi0 at pciide0 channel 0 drive 0
> >>> scsibus1 at atapiscsi0: 2 targets
> >>> cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, P.9A> ATAPI 5/cdrom removable
> >>> cd0(pciide0:0:0): using PIO mode 4, DMA mode 2
> >>> pciide0: channel 1 disabled (no drives)
> >>> gem1 at pci1 dev 5 function 1 "Sun ERI Ether" rev 0x01: ivec 0x7dc, address 00:03:ba:2b:47:71
> >>> ukphy1 at gem1 phy 1: Generic IEEE 802.3u media interface, rev. 1: OUI 0x0010dd, model 0x0002
> >>> ohci1 at pci1 dev 5 function 3 "Sun USB" rev 0x01: ivec 0x7e6, version 1.0, legacy support
> >>> usb0 at ohci0: USB revision 1.0
> >>> uhub0 at usb0 "Sun OHCI root hub" rev 1.00/1.00 addr 1
> >>> usb1 at ohci1: USB revision 1.0
> >>> uhub1 at usb1 "Sun OHCI root hub" rev 1.00/1.00 addr 1
> >>> ppb1 at pci0 dev 1 function 0 "Sun Simba" rev 0x13
> >>> pci2 at ppb1 bus 2
> >>> siop0 at pci2 dev 8 function 0 "Symbios Logic 53c896" rev 0x07: ivec 0x7e0, using 8K of on-board RAM
> >>> scsibus2 at siop0: 16 targets, initiator 7
> >>> sym0 at scsibus2 targ 0 lun 0: <SEAGATE, ST336607LSUN36G, 0207> SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0DGN80000731804D9
> >>> sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST336607LSUN36G, 0207> SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0DGN80000731804D9
> >>> sd0: 34732MB, 512 bytes/sector, 71132959 sectors
> >>> probe(siop0:1:0): Check Condition (error 0x70) on opcode 0x0
> >>> SENSE KEY: Hardware Error
> >>> ASC/ASCQ: Defect List Error
> >>> FRU CODE: 0x7
> >>> sym1 at scsibus2 targ 1 lun 0: <SEAGATE, ST336607LSUN36G, 0207> SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0BZL100002316NCUL
> >>> sd1 at scsibus0 targ 1 lun 0: <SEAGATE, ST336607LSUN36G, 0207> SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0BZL100002316NCUL
> >>> siop1 at pci2 dev 8 function 1 "Symbios Logic 53c896" rev 0x07: ivec 0x7e0, using 8K of on-board RAM
> >>> scsibus3 at siop1: 16 targets, initiator 7
> >>> siop0: target 0 now using tagged 16 bit 40.0 MHz 31 REQ/ACK offset xfers
> >>> vscsi0 at root
> >>> scsibus4 at vscsi0: 256 targets
> >>> softraid0 at root
> >>> scsibus5 at softraid0: 256 targets
> >>> siop0: target 1 now using tagged 16 bit 40.0 MHz 31 REQ/ACK offset xfers
> >>> bootpath: /pci@1f,0/pci@1,0/scsi@8,0/disk@0,0
> >>> root on sd0a (dd2dc38974492ea6.a) swap on sd0b dump on sd0b
> >>>
> >>
>