On May 27, 2013, at 12:59 AM, Daniel Braniss wrote:
On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
Hi, after upgrading to 9.1-stable, this particular hardware, a SunFire X2200, is toggling bge1 DOWN/UP every few hours; this port is used by the ILO. To check, I upgraded another identical host, and the same problem appears.

If you're truly running stable/9, and it's up-to-date, you should already have SVN revisions 248858 and 250650, both of which have significant impact on (a) the SunFire X2200 (r248858) and (b) the DOWN/UP problem (r250650). Show me dmesg (bge(4) and brgphy(4) only) and 'ifconfig bge1' output.

bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdff0000-0xfdffffff,0xfdfe0000-0xfdfeffff irq 17 at device 4.0 on pci6
bge0: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
miibus2: <MII bus> on bge0
brgphy0: <BCM5714 1000BASE-T media interface> PHY 1 on miibus2
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: Ethernet address: 00:1b:24:5d:5b:bd
bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdfc0000-0xfdfcffff,0xfdfb0000-0xfdfbffff irq 18 at device 4.1 on pci6
bge1: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
miibus3: <MII bus> on bge1
brgphy1: <BCM5714 1000BASE-T media interface> PHY 1 on miibus3
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge1: Ethernet address: 00:1b:24:5d:5b:be

sf-10> ifconfig bge1
bge1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
	ether 00:1b:24:5d:5b:be
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect (100baseTX <full-duplex>)
	status: active

Saw similar things happening over here with a different Broadcom chipset, and the above revisions helped significantly (URLs below):
http://svnweb.freebsd.org/base?view=revision&revision=248858
http://svnweb.freebsd.org/base?view=revision&revision=250650
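To collect the output asked for above, and to get a rough idea of whether the running kernel is new enough, here is a minimal sh sketch. The helper names kernel_rev, rev_at_least and nic_lines are hypothetical, not FreeBSD tools. It assumes the kernel was built from an svn checkout, so that 'uname -v' carries an rNNNNNN tag; that is not guaranteed. SVN revision numbers are repository-wide, so "at or above r250650" only means the fix is present if it was committed on the branch you built. If the changes were merged into stable/9 under later revision numbers, compare against those instead.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of FreeBSD.

# Print the svn revision (digits only) found in a kernel build string such as
# "FreeBSD 9.1-STABLE #0 r250650: ...". Prints nothing if there is no
# rNNNNN tag, e.g. when the kernel was not built from an svn checkout.
kernel_rev() {
    printf '%s\n' "$1" | grep -o 'r[0-9]\{5,\}' | head -n 1 | cut -c2-
}

# Succeed if the build string carries a revision >= the one given as $2.
# Revision numbers are repository-wide, so this is only meaningful if the
# wanted change was committed on the branch the kernel was built from.
rev_at_least() {
    _rev=$(kernel_rev "$1")
    [ -n "$_rev" ] && [ "$_rev" -ge "$2" ]
}

# Keep only bge(4) and brgphy(4) lines from dmesg-style text on stdin.
nic_lines() {
    grep -E '^(bge|brgphy)[0-9]+:'
}
```

On the affected host, something like 'rev_at_least "$(uname -v)" 250650 && echo new-enough', then 'nic_lines < /var/run/dmesg.boot' and 'ifconfig bge1', would produce the requested information. Reading /var/run/dmesg.boot rather than live dmesg output avoids losing the boot messages once the kernel message buffer wraps.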
What is the last known working revision?

I have no idea, but I have older versions, and I'll start from the oldest (9.1-prerelease). It will take time, though, since it takes hours until it happens.

There are ways you can speed up the replication time. I tend to flood a server with TCP, though I've heard of it happening under a UDP flood too. Here's a nice way to flood a server with TCP (assuming you have SSH access to the system via keys):

sh -c 'while :;do dd if=/dev/urandom of=/dev/stdout bs=1m count=1024 | ssh HOST2KILL /sbin/md5; done'

Run that about 16 times in separate screen sessions from various other hosts on your network, taking care to replace "HOST2KILL" with the hostname or IP of the box with the SunFire X2200. Let that run for a while, and then, when you think you've had a reset (if you weren't standing there watching for one), run:

grep 'bge.*DOWN' /var/log/messages

On a system that has booted and stayed up and running, there shouldn't be any messages like this:

bge0: link state changed to DOWN

When you actually get this message (if your experience is like ours), you'll be down for 90 seconds while the NIC resets.

However, since you say you have some older 9.1 releases, I'd start by trying to bring down the replication time of the problem using TCP and/or UDP floods. That way you'll be able to test for resolution of the problem as you progress up to stable/9, where the problem should be fixed by the aforementioned SVN revisions (specific to your hardware).

There is no correlation with time, since they happened at totally different times. I rebooted both hosts at almost the same time.
One host:
uptime: 5:24PM up 6:15, 0 users, load averages: 0.00, 0.00, 0.00
May 24 12:53:52 sf-04 kernel: bge1: link state changed to DOWN
May 24 12:53:55 sf-04 kernel: bge1: link state changed to UP
May 24 15:34:25 sf-04 kernel: bge1: link state changed to DOWN
May 24 15:34:28 sf-04 kernel: bge1: link state changed to UP

and the other:
uptime: 5:24PM up 6:14, 0 users, load averages: 0.00, 0.00, 0.00
May 24 16:30:44 sf-10 kernel: bge1: link state changed to DOWN
May 24 16:30:44 sf-10 kernel: bge1: link state changed to UP

This is not serious, and the ILO (ssh) connection is OK, but it's annoying: we have more than 10 of these hosts, and if I upgrade all of them, the logs will fill up with this :-) Any ideas?

Well, you say the connection is OK, so it doesn't sound like a full reset as it was in our case (we have a different chipset). But I agree that a log full of those would be annoying.

Try getting up to stable/9 in its current state (note: stable/8 also has all the aforementioned revisions).
-- 
Devin

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
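Whether these events are brief flaps or full NIC resets (Devin mentions being down for 90 seconds) can be read off the syslog timestamps. Here is a minimal sh/awk sketch that pairs each DOWN line with the next UP line for the same host and interface and prints how long the link stayed down. The function name link_outages is hypothetical. It assumes the standard syslog line format seen in the logs above and uses only the HH:MM:SS field, so a pair that straddles midnight would be miscomputed.

```shell
#!/bin/sh
# Sketch only: link_outages is a hypothetical helper, not a FreeBSD tool.
# Reads syslog lines on stdin, pairs each "link state changed to DOWN" with
# the next "UP" for the same host and interface, and prints how long the
# link stayed down. Expects lines such as:
#   May 24 12:53:52 sf-04 kernel: bge1: link state changed to DOWN
# Only the HH:MM:SS field is used, so a pair that straddles midnight
# would give a wrong (negative) duration.
link_outages() {
    awk '
    / link state changed to (DOWN|UP)$/ {
        split($3, t, ":")
        secs = t[1] * 3600 + t[2] * 60 + t[3]
        iface = $6
        sub(/:$/, "", iface)           # strip trailing colon: bge1: -> bge1
        key = $4 " " iface             # host + interface
        if ($NF == "DOWN") {
            down_at[key] = secs
            stamp[key] = $1 " " $2 " " $3
        } else if (key in down_at) {
            printf "%s %s down %ds\n", key, stamp[key], secs - down_at[key]
            delete down_at[key]
        }
    }'
}
```

Running something like 'link_outages < /var/log/messages' on each host would list every outage with its length. Fed the sf-04 lines above, it reports two 3-second outages on bge1, far short of a 90-second reset, which fits the observation that the ILO connection stays usable.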