[releng_8 tinderbox] failure on i386/pc98

2010-08-24 Thread FreeBSD Tinderbox
TB --- 2010-08-24 06:30:41 - tinderbox 2.6 running on freebsd-current.sentex.ca
TB --- 2010-08-24 06:30:41 - starting RELENG_8 tinderbox run for i386/pc98
TB --- 2010-08-24 06:30:41 - cleaning the object tree
TB --- 2010-08-24 06:31:20 - cvsupping the source tree
TB --- 2010-08-24 06:31:20 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/RELENG_8/i386/pc98/supfile
TB --- 2010-08-24 06:32:47 - building world
TB --- 2010-08-24 06:32:47 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-08-24 06:32:47 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-08-24 06:32:47 - TARGET=pc98
TB --- 2010-08-24 06:32:47 - TARGET_ARCH=i386
TB --- 2010-08-24 06:32:47 - TZ=UTC
TB --- 2010-08-24 06:32:47 - __MAKE_CONF=/dev/null
TB --- 2010-08-24 06:32:47 - cd /src
TB --- 2010-08-24 06:32:47 - /usr/bin/make -B buildworld
>>> World build started on Tue Aug 24 06:32:47 UTC 2010
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
[...]
mkdep -f .depend -a /src/usr.bin/c89/c89.c
echo c89: /obj/pc98/src/tmp/usr/lib/libc.a  >> .depend
===> usr.bin/c99 (depend)
rm -f .depend
mkdep -f .depend -a /src/usr.bin/c99/c99.c
echo c99: /obj/pc98/src/tmp/usr/lib/libc.a  >> .depend
===> usr.bin/calendar (depend)
make: don't know how to make locale.c. Stop
*** Error code 2

Stop in /src/usr.bin.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2010-08-24 07:09:24 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2010-08-24 07:09:24 - ERROR: failed to build world
TB --- 2010-08-24 07:09:24 - 1355.84 user 511.73 system 2322.88 real


http://tinderbox.freebsd.org/tinderbox-releng_8-RELENG_8-i386-pc98.full
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


[releng_8 tinderbox] failure on ia64/ia64

2010-08-24 Thread FreeBSD Tinderbox
TB --- 2010-08-24 06:46:26 - tinderbox 2.6 running on freebsd-current.sentex.ca
TB --- 2010-08-24 06:46:26 - starting RELENG_8 tinderbox run for ia64/ia64
TB --- 2010-08-24 06:46:26 - cleaning the object tree
TB --- 2010-08-24 06:47:12 - cvsupping the source tree
TB --- 2010-08-24 06:47:12 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/RELENG_8/ia64/ia64/supfile
TB --- 2010-08-24 06:47:43 - building world
TB --- 2010-08-24 06:47:43 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-08-24 06:47:43 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-08-24 06:47:43 - TARGET=ia64
TB --- 2010-08-24 06:47:43 - TARGET_ARCH=ia64
TB --- 2010-08-24 06:47:43 - TZ=UTC
TB --- 2010-08-24 06:47:43 - __MAKE_CONF=/dev/null
TB --- 2010-08-24 06:47:43 - cd /src
TB --- 2010-08-24 06:47:43 - /usr/bin/make -B buildworld
>>> World build started on Tue Aug 24 06:47:44 UTC 2010
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
[...]
mkdep -f .depend -a /src/usr.bin/c89/c89.c
echo c89: /obj/ia64/src/tmp/usr/lib/libc.a  >> .depend
===> usr.bin/c99 (depend)
rm -f .depend
mkdep -f .depend -a /src/usr.bin/c99/c99.c
echo c99: /obj/ia64/src/tmp/usr/lib/libc.a  >> .depend
===> usr.bin/calendar (depend)
make: don't know how to make locale.c. Stop
*** Error code 2

Stop in /src/usr.bin.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2010-08-24 07:28:33 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2010-08-24 07:28:33 - ERROR: failed to build world
TB --- 2010-08-24 07:28:33 - 1610.20 user 533.32 system 2527.49 real


http://tinderbox.freebsd.org/tinderbox-releng_8-RELENG_8-ia64-ia64.full


[releng_8 tinderbox] failure on mips/mips

2010-08-24 Thread FreeBSD Tinderbox
TB --- 2010-08-24 07:01:13 - tinderbox 2.6 running on freebsd-current.sentex.ca
TB --- 2010-08-24 07:01:13 - starting RELENG_8 tinderbox run for mips/mips
TB --- 2010-08-24 07:01:13 - cleaning the object tree
TB --- 2010-08-24 07:01:39 - cvsupping the source tree
TB --- 2010-08-24 07:01:39 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/RELENG_8/mips/mips/supfile
TB --- 2010-08-24 07:02:20 - building world
TB --- 2010-08-24 07:02:20 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-08-24 07:02:20 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-08-24 07:02:20 - TARGET=mips
TB --- 2010-08-24 07:02:20 - TARGET_ARCH=mips
TB --- 2010-08-24 07:02:20 - TZ=UTC
TB --- 2010-08-24 07:02:20 - __MAKE_CONF=/dev/null
TB --- 2010-08-24 07:02:20 - cd /src
TB --- 2010-08-24 07:02:20 - /usr/bin/make -B buildworld
>>> World build started on Tue Aug 24 07:02:21 UTC 2010
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
[...]
mkdep -f .depend -a /src/usr.bin/c89/c89.c
echo c89: /obj/mips/src/tmp/usr/lib/libc.a  >> .depend
===> usr.bin/c99 (depend)
rm -f .depend
mkdep -f .depend -a /src/usr.bin/c99/c99.c
echo c99: /obj/mips/src/tmp/usr/lib/libc.a  >> .depend
===> usr.bin/calendar (depend)
make: don't know how to make locale.c. Stop
*** Error code 2

Stop in /src/usr.bin.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2010-08-24 07:34:04 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2010-08-24 07:34:04 - ERROR: failed to build world
TB --- 2010-08-24 07:34:04 - 1150.28 user 488.13 system 1971.91 real


http://tinderbox.freebsd.org/tinderbox-releng_8-RELENG_8-mips-mips.full


[releng_8 tinderbox] failure on powerpc/powerpc

2010-08-24 Thread FreeBSD Tinderbox
TB --- 2010-08-24 07:09:24 - tinderbox 2.6 running on freebsd-current.sentex.ca
TB --- 2010-08-24 07:09:24 - starting RELENG_8 tinderbox run for powerpc/powerpc
TB --- 2010-08-24 07:09:24 - cleaning the object tree
TB --- 2010-08-24 07:10:39 - cvsupping the source tree
TB --- 2010-08-24 07:10:39 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/RELENG_8/powerpc/powerpc/supfile
TB --- 2010-08-24 07:16:45 - building world
TB --- 2010-08-24 07:16:45 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-08-24 07:16:45 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-08-24 07:16:45 - TARGET=powerpc
TB --- 2010-08-24 07:16:45 - TARGET_ARCH=powerpc
TB --- 2010-08-24 07:16:45 - TZ=UTC
TB --- 2010-08-24 07:16:45 - __MAKE_CONF=/dev/null
TB --- 2010-08-24 07:16:45 - cd /src
TB --- 2010-08-24 07:16:45 - /usr/bin/make -B buildworld
>>> World build started on Tue Aug 24 07:16:46 UTC 2010
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
[...]
mkdep -f .depend -a /src/usr.bin/c89/c89.c
echo c89: /obj/powerpc/src/tmp/usr/lib/libc.a  >> .depend
===> usr.bin/c99 (depend)
rm -f .depend
mkdep -f .depend -a /src/usr.bin/c99/c99.c
echo c99: /obj/powerpc/src/tmp/usr/lib/libc.a  >> .depend
===> usr.bin/calendar (depend)
make: don't know how to make locale.c. Stop
*** Error code 2

Stop in /src/usr.bin.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2010-08-24 07:48:20 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2010-08-24 07:48:20 - ERROR: failed to build world
TB --- 2010-08-24 07:48:20 - 1342.73 user 465.74 system 2336.35 real


http://tinderbox.freebsd.org/tinderbox-releng_8-RELENG_8-powerpc-powerpc.full


[releng_8 tinderbox] failure on sparc64/sparc64

2010-08-24 Thread FreeBSD Tinderbox
TB --- 2010-08-24 07:18:29 - tinderbox 2.6 running on freebsd-current.sentex.ca
TB --- 2010-08-24 07:18:29 - starting RELENG_8 tinderbox run for sparc64/sparc64
TB --- 2010-08-24 07:18:29 - cleaning the object tree
TB --- 2010-08-24 07:19:19 - cvsupping the source tree
TB --- 2010-08-24 07:19:19 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/RELENG_8/sparc64/sparc64/supfile
TB --- 2010-08-24 07:23:45 - building world
TB --- 2010-08-24 07:23:45 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-08-24 07:23:45 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-08-24 07:23:45 - TARGET=sparc64
TB --- 2010-08-24 07:23:45 - TARGET_ARCH=sparc64
TB --- 2010-08-24 07:23:45 - TZ=UTC
TB --- 2010-08-24 07:23:45 - __MAKE_CONF=/dev/null
TB --- 2010-08-24 07:23:45 - cd /src
TB --- 2010-08-24 07:23:45 - /usr/bin/make -B buildworld
>>> World build started on Tue Aug 24 07:23:46 UTC 2010
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
[...]
mkdep -f .depend -a /src/usr.bin/c89/c89.c
echo c89: /obj/sparc64/src/tmp/usr/lib/libc.a  >> .depend
===> usr.bin/c99 (depend)
rm -f .depend
mkdep -f .depend -a /src/usr.bin/c99/c99.c
echo c99: /obj/sparc64/src/tmp/usr/lib/libc.a  >> .depend
===> usr.bin/calendar (depend)
make: don't know how to make locale.c. Stop
*** Error code 2

Stop in /src/usr.bin.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2010-08-24 07:52:18 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2010-08-24 07:52:18 - ERROR: failed to build world
TB --- 2010-08-24 07:52:18 - 1259.74 user 406.71 system 2028.26 real


http://tinderbox.freebsd.org/tinderbox-releng_8-RELENG_8-sparc64-sparc64.full


Re: kernel MCA messages

2010-08-24 Thread Andriy Gapon
on 24/08/2010 09:14 Ronald Klop said the following:
> 
> A little off topic, but what is 'a low rate of corrected ECC errors'? At work
> one machine has them like once per day, but runs ok. Is once per day much?

That's up to your judgment.  It's like asking after how many remapped sectors
you replace an HDD.
You may find this interesting:
http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

-- 
Andriy Gapon


Re: RELENG_8 panic

2010-08-24 Thread Mike Tancsa

At 05:04 PM 8/20/2010, Mike Tancsa wrote:
> The box is a moderately busy LNS running mpd5.  I have another box
> running the same load that has not crashed, so I am wondering if it's
> hardware or this box is just "lucky"?  It's crashed a couple of times
> now, but the watchdog rebooted it prior to the dump being written out.


I had disabled the watchdog and was able to get a full crash dump 
this time.  There seem to be about 3 days between crashes.



Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x24
fault code  = supervisor read, page not present
instruction pointer = 0x20:0xc5d11e15
stack pointer   = 0x28:0xc4e70704
frame pointer   = 0x28:0xc4e70718
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (em1 taskq)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 3d20h45m41s
Physical memory: 2036 MB
Dumping 228 MB: 212 196 180 164 148 132 116 100 84 68 52 36 20 4

Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from 
/boot/kernel/coretemp.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/coretemp.ko
Reading symbols from /boot/kernel/if_disc.ko...Reading symbols from 
/boot/kernel/if_disc.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/if_disc.ko
Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from 
/boot/kernel/ipfw.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/ipfw.ko
Reading symbols from /boot/kernel/libalias.ko...Reading symbols from 
/boot/kernel/libalias.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/libalias.ko
Reading symbols from /boot/kernel/ng_socket.ko...Reading symbols from 
/boot/kernel/ng_socket.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/ng_socket.ko
Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from 
/boot/kernel/netgraph.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/netgraph.ko
Reading symbols from /boot/kernel/ng_mppc.ko...Reading symbols from 
/boot/kernel/ng_mppc.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/ng_mppc.ko
Reading symbols from /boot/kernel/rc4.ko...Reading symbols from 
/boot/kernel/rc4.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/rc4.ko
Reading symbols from /boot/kernel/ichwd.ko...Reading symbols from 
/boot/kernel/ichwd.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/ichwd.ko
Reading symbols from /boot/kernel/ng_l2tp.ko...Reading symbols from 
/boot/kernel/ng_l2tp.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/ng_l2tp.ko
Reading symbols from /boot/kernel/ng_ksocket.ko...Reading symbols 
from /boot/kernel/ng_ksocket.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/ng_ksocket.ko
Reading symbols from /boot/kernel/ng_tee.ko...Reading symbols from 
/boot/kernel/ng_tee.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/ng_tee.ko
Reading symbols from /boot/kernel/ng_iface.ko...Reading symbols from 
/boot/kernel/ng_iface.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/ng_iface.ko
Reading symbols from /boot/kernel/ng_ppp.ko...Reading symbols from 
/boot/kernel/ng_ppp.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/ng_ppp.ko
Reading symbols from /boot/kernel/ng_tcpmss.ko...Reading symbols from 
/boot/kernel/ng_tcpmss.ko.symbols...done.

done.
Loaded symbols for /boot/kernel/ng_tcpmss.ko
#0  doadump () at pcpu.h:231
231 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) #0  doadump () at pcpu.h:231
#1  0xc06b0ac3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416
#2  0xc06b0d29 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:590
#3  0xc092239c in trap_fatal (frame=0xc4e706c4, eva=36)
at /usr/src/sys/i386/i386/trap.c:938
#4  0xc0922620 in trap_pfault (frame=0xc4e706c4, usermode=0, eva=36)
at /usr/src/sys/i386/i386/trap.c:851
#5  0xc0922f0c in trap (frame=0xc4e706c4) at /usr/src/sys/i386/i386/trap.c:533
#6  0xc0904a9c in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7  0xc5d11e15 in ng_address_hook (here=0x0, item=0xc5da0540,
hook=0xc65ce400, retaddr=0)
at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:3504
#8  0xc5dddbcb in ng_tee_rcvdata (hook=0xca997580, item=0xc5da0540)
at /usr/src/sys/modules/netgraph/tee/../../../netgraph/ng_tee.c:326
#9  0xc5d137c4 in ng_apply_item (node=0xc6499b80, item=0xc5da0540, rw=0)
at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:2336
#10 0xc5d1279f in ng_snd_item (item=0xc5da0540, flags=Variable 
"flags" is not available.

)
at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:2253
#11 0xc5eca84a in ng_ppp_link_xmit (node=Variable "node" is not available.
)
at /usr/src/sys/modules/netgraph/ppp/../../../netgraph/ng_ppp.

Re: kernel MCA messages

2010-08-24 Thread John Baldwin
On Monday, August 23, 2010 5:35:40 pm Matthew D. Fuller wrote:
> On Mon, Aug 23, 2010 at 08:20:35AM -0400 I heard the voice of
> John Baldwin, and lo! it spake thus:
> >
> > It is not private, it is in //depot/projects/mcelog/... in p4.
> 
> Which may as well be Siberia for us lowly non-developers.  Any chance
> you could stick a tarball or a patch against upstream mcelog
> somewhere?

It is actually public at perforce.freebsd.org. :)  However, it is tedious to 
download the files.  It really should be a port perhaps, though Someone (tm) 
should try to get the patches integrated upstream.

You can find a patch at www.freebsd.org/~jhb/mcelog/.  You will also need to 
download the memstream.c file from there and put it in the extracted 
mcelog tarball.

-- 
John Baldwin


Re: kernel MCA messages

2010-08-24 Thread Artem Belevich
IMHO the key here is whether hardware is broken or not. The only case
where correctable ECC errors are OK is when a bit gets flipped by a
high-energy particle. That's a normal but fairly rare event. If you
get bit flips often enough that you can recall details of more than
one of them on the same hardware, my guess would be that you're
dealing with something else -- bad/marginal memory, signal integrity
issues, power issues, overheating... The list continues. In all those
cases hardware does *not* work correctly. Whether you can (or want to)
keep running stuff on the hardware that is broken is another question.

--Artem



On Tue, Aug 24, 2010 at 1:15 AM, Andriy Gapon  wrote:
> on 24/08/2010 09:14 Ronald Klop said the following:
>>
>> A little off topic, but what is 'a low rate of corrected ECC errors'? At work
>> one machine has them like once per day, but runs ok. Is once per day much?
>
> That's up to your judgment.  It's like asking after how many remapped sectors
> you replace an HDD.
> You may find this interesting:
> http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
>
> --
> Andriy Gapon


Re: kernel MCA messages

2010-08-24 Thread Andriy Gapon
on 24/08/2010 22:51 Artem Belevich said the following:
> IMHO the key here is whether hardware is broken or not. The only case
> where correctable ECC errors are OK is when a bit gets flipped by a
> high-energy particle. That's a normal but fairly rare event. If you
> get bit flips often enough that you can recall details of more than
> one of them on the same hardware, my guess would be that you're
> dealing with something else -- bad/marginal memory, signal integrity
> issues, power issues, overheating... The list continues. In all those
> cases hardware does *not* work correctly. Whether you can (or want to)
> keep running stuff on the hardware that is broken is another question.

Have you read the article? :)
If not, read at least the summary.

> On Tue, Aug 24, 2010 at 1:15 AM, Andriy Gapon  wrote:
>> on 24/08/2010 09:14 Ronald Klop said the following:
>>>
>>> A little off topic, but what is 'a low rate of corrected ECC errors'? At work
>>> one machine has them like once per day, but runs ok. Is once per day much?
>>
>> That's up to your judgment.  It's like asking after how many remapped sectors
>> you replace an HDD.
>> You may find this interesting:
>> http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
>>
>> --
>> Andriy Gapon

-- 
Andriy Gapon


Attn Ronald Klop

2010-08-24 Thread Andriy Gapon

Ronald,

your email address bounces, that's inconvenient.


-------- Original Message --------
Subject: Returned mail: Service unavailable
Date: Tue, 24 Aug 2010 23:03:33 +0300 (EEST)
From: Mail Delivery Subsystem 
To: 

The original message was received at Tue, 24 Aug 2010 23:03:27 +0300 (EEST)
from porto-e.starpoint.kiev.ua [212.40.38.100]

   ----- The following addresses had permanent fatal errors -----


   ----- Transcript of session follows -----
... while talking to thuis.klop.ws.:
>>> RCPT To:
<<< 554 5.7.1 : Relay access denied
554 ... Service unavailable


Re: kernel MCA messages

2010-08-24 Thread Dan Langille

On 8/22/2010 9:18 PM, Dan Langille wrote:

> What does this mean?
>
> kernel: MCA: Bank 4, Status 0x940c4001fe080813
> kernel: MCA: Global Cap 0x0105, Status 0x
> kernel: MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
> kernel: MCA: CPU 0 COR BUSLG Source RD Memory
> kernel: MCA: Address 0x7ff6b0
>
> FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43


FYI, these are occurring every hour, almost to the second. e.g. 
xx:56:yy, where yy is 09, 10, or 11.


Checking logs, I don't see anything that correlates with this point in 
the hour (i.e 56 minutes past) that doesn't also occur at other times.


It seems very odd to occur so regularly.
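
A quick way to test the "same point in every hour" observation is to reduce each event time to its offset within the hour and look at the spread. The sketch below is only an illustration: the helper names and the sample timestamps are hypothetical (the timestamps are invented to match the reported xx:56:09 to xx:56:11 pattern, not taken from a real log), and it does not try to parse any particular syslog format.

```python
from datetime import datetime


def offsets_within_hour(stamps):
    """Return each timestamp's offset from the top of its hour, in seconds."""
    return [t.minute * 60 + t.second for t in stamps]


def same_point_in_hour(stamps, tolerance_s=5):
    """True if all events fall within tolerance_s of the same point in the hour.

    Simplification: wrap-around at the hour boundary (xx:59:58 vs xx:00:01)
    is not handled, and gaps between events are not checked.
    """
    offs = offsets_within_hour(stamps)
    return len(offs) > 1 and max(offs) - min(offs) <= tolerance_s


# Hypothetical sample data, invented for illustration only.
sample = [
    datetime(2010, 8, 24, 17, 56, 9),
    datetime(2010, 8, 24, 18, 56, 10),
    datetime(2010, 8, 24, 19, 56, 11),
]
print(same_point_in_hour(sample))
```

If events cluster this tightly, a timer-driven source (a periodic job or a hardware scrub interval) is a more natural suspect than random bit flips, though the check by itself proves nothing about the cause.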

--
Dan Langille - http://langille.org/


Crashes on X7SPE-HF with em

2010-08-24 Thread Philipp Wuensche
Hi,

I'm having trouble with a system on a Supermicro X7SPE-HF; it crashes
about once a day. I haven't found a way to trigger this yet.

The system has a bunch of VLANs on em1 and routes between them.

Currently it's running 8-STABLE, but it happened with 8.1-RELEASE too.

greetings,
Philipp

# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0x8061f5a8
stack pointer   = 0x28:0xff8e64d0
frame pointer   = 0x28:0xff8e64e0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (em1 taskq)
trap number = 9
panic: general protection fault
cpuid = 0
Uptime: 13h24m39s
Physical memory: 4079 MB
Dumping 1349 MB: 1334 1318 1302 1286 1270 1254 1238 1222 1206 1190 1174
1158 1142 1126 1110 1094 1078 1062 1046 1030 1014 998 982 966 950 934
918 902 886 870 854 838 822 806 790 774 758 742 726 710 694 678 662 646
630 614 598 582 566 550 534 518 502 486 470 454 438 422 406 390 374 358
342 326 310 294 278 262 246 230 214 198 182 166 150 134 118 102 86 70 54
38 22 6

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
/boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
/boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from
/boot/kernel/coretemp.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/coretemp.ko
Reading symbols from /boot/kernel/ahci.ko...Reading symbols from
/boot/kernel/ahci.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ahci.ko
Reading symbols from /boot/kernel/ipmi.ko...Reading symbols from
/boot/kernel/ipmi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ipmi.ko
Reading symbols from /boot/kernel/smbus.ko...Reading symbols from
/boot/kernel/smbus.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/smbus.ko
Reading symbols from /boot/kernel/pflog.ko...Reading symbols from
/boot/kernel/pflog.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/pflog.ko
Reading symbols from /boot/kernel/pf.ko...Reading symbols from
/boot/kernel/pf.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/pf.ko
#0  doadump () at pcpu.h:224
224 __asm("movq %%gs:0,%0" : "=r" (td));
(kgdb) list *0x8061f5a8
0x8061f5a8 is in m_tag_locate (/usr/src/sys/kern/uipc_mbuf2.c:389).
384 if (t == NULL)
385 p = SLIST_FIRST(&m->m_pkthdr.tags);
386 else
387 p = SLIST_NEXT(t, m_tag_link);
388 while (p != NULL) {
389 if (p->m_tag_cookie == cookie && p->m_tag_id == type)
390 return p;
391 p = SLIST_NEXT(p, m_tag_link);
392 }
393 return NULL;
(kgdb) backtrace
#0  doadump () at pcpu.h:224
#1  0x805c25ce in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:416
#2  0x805c29dc in panic (fmt=0x0)
at /usr/src/sys/kern/kern_shutdown.c:590
#3  0x808d40bd in trap_fatal (frame=0x80c8af60,
eva=Variable "eva" is not available.
)
at /usr/src/sys/amd64/amd64/trap.c:777
#4  0x808d4a8b in trap (frame=0xff8e6420)
at /usr/src/sys/amd64/amd64/trap.c:588
#5  0x808b9d64 in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:224
#6  0x8061f5a8 in m_tag_locate (m=0xff010bb45c00, cookie=0,
type=6, t=Variable "t" is not available.
) at /usr/src/sys/kern/uipc_mbuf2.c:388
#7  0x806d7c56 in ip_ipsec_output (m=0xff8e6598,
inp=0xff010be43150, flags=0xff8e6594,
error=0xff8e65a8, ifp=Variable "ifp" is not available.
) at mbuf.h:1006
#8  0x806d97ef in ip_output (m=0xff010bb45c00, opt=Variable
"opt" is not available.
)
at /usr/src/sys/netinet/ip_output.c:483
#9  0x8073ef13 in tcp_output (tp=0xff000a9eb370)
at /usr/src/sys/netinet/tcp_output.c:1190
#10 0x8073a42d in tcp_do_segment (m=0xff000a4cd800,
th=0xff000a4df824, so=0xff000a9037f8, tp=0xff000a9eb370,
drop_hdrlen=52, tlen=0, iptos=0 '\0', ti_locked=2)
at /usr/src/sys/netinet/tcp_input.c:1484
#11 0x8073cf7b in tcp_input (m=0x

Re: kernel MCA messages

2010-08-24 Thread Jeremy Chadwick
On Tue, Aug 24, 2010 at 07:13:23PM -0400, Dan Langille wrote:
> On 8/22/2010 9:18 PM, Dan Langille wrote:
> >What does this mean?
> >
> >kernel: MCA: Bank 4, Status 0x940c4001fe080813
> >kernel: MCA: Global Cap 0x0105, Status 0x
> >kernel: MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
> >kernel: MCA: CPU 0 COR BUSLG Source RD Memory
> >kernel: MCA: Address 0x7ff6b0
> >
> >FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43
> 
> FYI, these are occurring every hour, almost to the second. e.g.
> xx:56:yy, where yy is 09, 10, or 11.
> 
> Checking logs, I don't see anything that correlates with this point
> in the hour (i.e 56 minutes past) that doesn't also occur at other
> times.
> 
> It seems very odd to occur so regularly.

1) Why haven't you replaced the DIMM in Bank 4 -- or better yet, all
   the DIMMs just to be sure?  Do this and see if the problem goes
   away.  If not, no harm done, and you've narrowed it down.

2) What exact manufacturer and model of motherboard is this?  If
   you can provide a link to a User Manual that would be great.

3) Please go into your system BIOS and find where "ECC ChipKill"
   options are available (likely under a Memory, Chipset, or
   Northbridge section).  Please write down and provide here all
   of the options and what their currently selected values are.

4) Please make sure you're running the latest system BIOS.  I've seen
   on certain Rackable AMD-based systems where Northbridge-related
   features don't work quite right (at least with Solaris), resulting
   in atrocious memory performance on the system.  A BIOS upgrade
   solved the problem.

There's a ChipKill feature called "ECC BG Scrubbing" that's vague in
definition, given that it's a "background memory scrub" that happens at
intervals which are unknown to me.  Maybe 60 minutes?  I don't know.
This is why I ask question #3.

For John and other devs: I assume the decoded MCA messages indicate with
absolute certainty that the ECC error is coming from external DRAM and
not, say, bad L1 or L2 cache?
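
As a rough aid for reading the raw value, the Status word can be split into its architectural flag bits and its low 16-bit MCA error code. The sketch below assumes the standard x86 machine-check layout (flag bits 57 to 63, and the usual TLB / memory-hierarchy / bus error-code patterns); the function names are made up here, and the masks should be checked against the vendor manual for the specific CPU before relying on them. Note also that "Bank" in these messages is the index of a machine-check register bank on the CPU, which is not necessarily the same thing as a DIMM slot.

```python
STATUS = 0x940C4001FE080813  # "Bank 4, Status" value quoted in this thread

# Flag bits in the upper part of an MCi_STATUS register (assumed layout).
FLAGS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MISC register holds valid data
    "ADDRV": 58,  # ADDR register holds a valid address
    "PCC": 57,    # processor context corrupt
}


def decode_flags(status):
    """Map each flag name to True/False for the given status word."""
    return {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}


def classify_error_code(status):
    """Classify the low 16-bit MCA error code by its bit pattern."""
    code = status & 0xFFFF
    if code & 0xF800 == 0x0800:   # 0000 1PPT RRRR IILL
        return "bus/interconnect"
    if code & 0xFF00 == 0x0100:   # 0000 0001 RRRR TTLL
        return "memory hierarchy (cache)"
    if code & 0xFFF0 == 0x0010:   # 0000 0000 0001 TTLL
        return "TLB"
    return "other"


def decode_bus_fields(status):
    """Split a bus/interconnect error code into its raw sub-fields."""
    code = status & 0xFFFF
    return {
        "PP": (code >> 9) & 0x3,    # participation (0 = source)
        "T": (code >> 8) & 0x1,     # timeout
        "RRRR": (code >> 4) & 0xF,  # request type (1 = read)
        "II": (code >> 2) & 0x3,    # 0 = memory, 2 = I/O
        "LL": code & 0x3,           # level (3 = generic)
    }


print(decode_flags(STATUS))
print(classify_error_code(STATUS), decode_bus_fields(STATUS))
```

Under these assumptions the quoted value has UC clear and ADDRV set, and its error code falls in the bus/interconnect pattern rather than the cache pattern, which lines up with the kernel's own "COR BUSLG Source RD Memory" line. It does not by itself say which physical component is at fault.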

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: kernel MCA messages

2010-08-24 Thread Dan Langille

On 8/24/2010 7:38 PM, Jeremy Chadwick wrote:

> On Tue, Aug 24, 2010 at 07:13:23PM -0400, Dan Langille wrote:
>> On 8/22/2010 9:18 PM, Dan Langille wrote:
>>> What does this mean?
>>>
>>> kernel: MCA: Bank 4, Status 0x940c4001fe080813
>>> kernel: MCA: Global Cap 0x0105, Status 0x
>>> kernel: MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
>>> kernel: MCA: CPU 0 COR BUSLG Source RD Memory
>>> kernel: MCA: Address 0x7ff6b0
>>>
>>> FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43
>>
>> FYI, these are occurring every hour, almost to the second. e.g.
>> xx:56:yy, where yy is 09, 10, or 11.
>>
>> Checking logs, I don't see anything that correlates with this point
>> in the hour (i.e 56 minutes past) that doesn't also occur at other
>> times.
>>
>> It seems very odd to occur so regularly.
>
> 1) Why haven't you replaced the DIMM in Bank 4 -- or better yet, all
>    the DIMMs just to be sure?  Do this and see if the problem goes
>    away.  If not, no harm done, and you've narrowed it down.


For good reason: time and distance.  I've not had the time or 
opportunity to buy new RAM.  Today is Tuesday.  The problem appeared 
about 48 hours ago after upgrading to 8.1 stable from 7.x.  The box is 
in Austin.  I'm in Philadelphia.  You know the math.  ;)  When I can get 
the time to fly to Austin, I will if required.


I'm sorry, I don't mean to be flippant.  I'm just glad I documented
as much as I could 4 years ago.



2) What exact manufacturer and model of motherboard is this?  If
you can provide a link to a User Manual that would be great.


 This is a box from iXsystems that I obtained back when 6.1-RELEASE was 
the latest.  I know it has four sticks of 2GB.


   http://www.freebsddiary.org/dual-opteron.php

Sadly, many of the links are now invalid.  The board is an AccelerTech
ATO2161-DC, also known as a RioWorks HDAMA-G.


See also:

  http://www.freebsddiary.org/dual-opteron-dmidecode.txt

And we have a close up of the RAM and the m/b:

  http://www.freebsddiary.org/showpicture.php?id=85
  http://www.freebsddiary.org/showpicture.php?id=84

I am quite sure it's very close to this:

  http://www.accelertech.com/2007/amd_mb/opteron/ato2161i-dc_pic.php

With the manual here:

  http://www.accelertech.com/2007/amd_mb/opteron/ato2161i-dc_manual.php


3) Please go into your system BIOS and find where "ECC ChipKill"
options are available (likely under a Memory, Chipset, or
Northbridge section).  Please write down and provide here all
of the options and what their currently selected values are.

4) Please make sure you're running the latest system BIOS.  I've seen
on certain Rackable AMD-based systems where Northbridge-related
features don't work quite right (at least with Solaris), resulting
in atrocious memory performance on the system.  A BIOS upgrade
solved the problem.


3 & 4 are just as hard as #1 at the moment.


There's a ChipKill feature called "ECC BG Scrubbing" that's vague in
definition, given that it's a "background memory scrub" that happens at
intervals which are unknown to me.  Maybe 60 minutes?  I don't know.
This is why I ask question #3.

For John and other devs: I assume the decoded MCA messages indicate with
absolute certainty that the ECC error is coming from external DRAM and
not, say, bad L1 or L2 cache?


Nice question.
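
As a side note for anyone decoding these by hand: the "COR BUSLG Source
RD Memory" line quoted above appears to follow from the architectural
MCi_STATUS layout, where bit 63 marks the register as valid, bit 61
flags an uncorrected error, and the low 16 bits carry an error code of
the form 0000 1PPT RRRR IILL for bus/interconnect errors.  The sketch
below is only an illustration under that assumption.  The function and
table names are made up for this example; it is not the kernel's
decoder, and it ignores the model-specific bits in the middle of the
register.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural flag bits in MCi_STATUS (only the ones used here). */
#define MC_STATUS_VAL	(1ULL << 63)	/* register holds a valid error */
#define MC_STATUS_UC	(1ULL << 61)	/* error was NOT corrected */

/* Field names for a bus/interconnect code: 0000 1PPT RRRR IILL. */
static const char *const pp_names[4] = {
	"Source", "Responder", "Observer", "Generic"
};
static const char *const rrrr_names[9] = {
	"ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"
};
static const char *const ii_names[4] = {
	"Memory", "Reserved", "I/O", "Other"
};
static const char *const ll_names[4] = { "L0", "L1", "L2", "LG" };

/*
 * Hypothetical helper: format a one-line summary of a bus/interconnect
 * error into buf.  Returns 0 on success, or -1 if the status is not
 * valid or its error code is not of the bus/interconnect form.
 */
static int
mca_describe_bus_error(uint64_t status, char *buf, size_t len)
{
	uint16_t code = (uint16_t)(status & 0xffff);
	unsigned pp, rrrr, ii, ll;

	if ((status & MC_STATUS_VAL) == 0)
		return (-1);
	if ((code & 0xf800) != 0x0800)	/* top five bits must be 0000 1 */
		return (-1);

	pp = (code >> 9) & 0x3;		/* participation */
	rrrr = (code >> 4) & 0xf;	/* request type */
	ii = (code >> 2) & 0x3;		/* memory or I/O */
	ll = code & 0x3;		/* cache level */

	snprintf(buf, len, "%s BUS%s %s %s %s",
	    (status & MC_STATUS_UC) ? "UNCOR" : "COR",
	    ll_names[ll], pp_names[pp],
	    rrrr < 9 ? rrrr_names[rrrr] : "?",
	    ii_names[ii]);
	return (0);
}
```

Calling mca_describe_bus_error() with the Status value from the log,
0x940c4001fe080813, gives PP=0 (Source), RRRR=1 (RD), II=0 (Memory) and
LL=3 (LG) with the UC bit clear, which lines up with the kernel's
decoded line.  On its own that only says a corrected error was seen on
a memory read at a "generic" level; whether it settles the
DRAM-versus-cache question is for someone who knows the model-specific
bits.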

--
Dan Langille - http://langille.org/


Re: Crashes on X7SPE-HF with em

2010-08-24 Thread Mike Tancsa

At 06:55 PM 8/24/2010, Philipp Wuensche wrote:

Hi,

I'm having trouble with a system on a Supermicro X7SPE-HF, it crashes
about once a day. I haven't found a way to trigger this yet.

The system has a bunch of VLANs on em1, it does routing between them.

Currently it's running 8-STABLE, but it happened with 8.1-RELEASE too.


I don't think it's the same problem you are seeing, but the patch in
http://lists.freebsd.org/pipermail/freebsd-stable/2010-August/058296.html
might be worth a try.

---Mike




greetings,
Philipp

# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0x8061f5a8
stack pointer   = 0x28:0xff8e64d0
frame pointer   = 0x28:0xff8e64e0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (em1 taskq)
trap number = 9
panic: general protection fault
cpuid = 0
Uptime: 13h24m39s
Physical memory: 4079 MB
Dumping 1349 MB: 1334 1318 1302 1286 1270 1254 1238 1222 1206 1190 1174
1158 1142 1126 1110 1094 1078 1062 1046 1030 1014 998 982 966 950 934
918 902 886 870 854 838 822 806 790 774 758 742 726 710 694 678 662 646
630 614 598 582 566 550 534 518 502 486 470 454 438 422 406 390 374 358
342 326 310 294 278 262 246 230 214 198 182 166 150 134 118 102 86 70 54
38 22 6

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
/boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
/boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from
/boot/kernel/coretemp.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/coretemp.ko
Reading symbols from /boot/kernel/ahci.ko...Reading symbols from
/boot/kernel/ahci.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ahci.ko
Reading symbols from /boot/kernel/ipmi.ko...Reading symbols from
/boot/kernel/ipmi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ipmi.ko
Reading symbols from /boot/kernel/smbus.ko...Reading symbols from
/boot/kernel/smbus.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/smbus.ko
Reading symbols from /boot/kernel/pflog.ko...Reading symbols from
/boot/kernel/pflog.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/pflog.ko
Reading symbols from /boot/kernel/pf.ko...Reading symbols from
/boot/kernel/pf.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/pf.ko
#0  doadump () at pcpu.h:224
224 __asm("movq %%gs:0,%0" : "=r" (td));
(kgdb) list *0x8061f5a8
0x8061f5a8 is in m_tag_locate (/usr/src/sys/kern/uipc_mbuf2.c:389).
384 if (t == NULL)
385 p = SLIST_FIRST(&m->m_pkthdr.tags);
386 else
387 p = SLIST_NEXT(t, m_tag_link);
388 while (p != NULL) {
389 if (p->m_tag_cookie == cookie && p->m_tag_id == type)
390 return p;
391 p = SLIST_NEXT(p, m_tag_link);
392 }
393 return NULL;
(kgdb) backtrace
#0  doadump () at pcpu.h:224
#1  0x805c25ce in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:416
#2  0x805c29dc in panic (fmt=0x0)
at /usr/src/sys/kern/kern_shutdown.c:590
#3  0x808d40bd in trap_fatal (frame=0x80c8af60,
eva=Variable "eva" is not available.
)
at /usr/src/sys/amd64/amd64/trap.c:777
#4  0x808d4a8b in trap (frame=0xff8e6420)
at /usr/src/sys/amd64/amd64/trap.c:588
#5  0x808b9d64 in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:224
#6  0x8061f5a8 in m_tag_locate (m=0xff010bb45c00, cookie=0,
type=6, t=Variable "t" is not available.
) at /usr/src/sys/kern/uipc_mbuf2.c:388
#7  0x806d7c56 in ip_ipsec_output (m=0xff8e6598,
inp=0xff010be43150, flags=0xff8e6594,
error=0xff8e65a8, ifp=Variable "ifp" is not available.
) at mbuf.h:1006
#8  0x806d97ef in ip_output (m=0xff010bb45c00, opt=Variable
"opt" is not available.
)
at /usr/src/sys/netinet/ip_output.c:483
#9  0x8073ef13 in tcp_output (tp=0xff000a9eb370)
at /usr/src/

Re: kernel MCA messages

2010-08-24 Thread Matthew D. Fuller
On Tue, Aug 24, 2010 at 11:06:43AM -0400 I heard the voice of
John Baldwin, and lo! it spake thus:
> 
> It is actually public at perforce.freebsd.org. :)  However, it is
> tedious to download the files.

Oh, I'd apparently blocked out of my mind that you could clicky-clicky
files one at a time from there.  Probably for the best; I'd be real
annoyed by the end of that   ;)


> You can find a patch at www.freebsd.org/~jhb/mcelog/.  You will also
> need to download the memstream.c file from there as well and put
> that in the extracted mcelog tarball.

Thanks!  For anyone following along at home, I needed to make a few
changes to get it compiling here:

- I'm on a nice recent -CURRENT, so I had to #if 0 out the getline()
  definition.

- Add a FREEBSD definition to the Makefile (or remember it manually).

- Comment out the kread_symbol() of X_SNAPDATE in mcelog.c.  I don't
  see X_SNAPDATE defined anywhere in my /usr/include, and the var
  doesn't seem to ever actually be read for anything anyway (unless
  I'm supposed to -DLOCAL_HACK...).


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.