On Thu, Sep 10, 2009 at 9:09 AM, Chris Kirby <christopher.ki...@sun.com> wrote:
> On Sep 10, 2009, at 7:07 AM, Brandon Mercer wrote:
>
>> On Thu, Sep 10, 2009 at 5:11 AM, <casper....@sun.com> wrote:
>>>
>>>> Hello all, I'm running 2009.06 and I've got a "random" kernel panic
>>>> that keeps killing my system under high IO loads. It happens almost
>>>> every time I start loading up the writes on a pool. Memory has been
>>>> tested extensively and I'm relatively certain this is not a hardware
>>>> related issue. Here is the panic:
>>>> Sep 9 22:09:45 eon genunix: [ID 683410 kern.notice] BAD TRAP: type=d
>>>> (#gp General protection) rp=ffffff0010362770 addr=ff7fff02fe41cc78
>>>> Sep 9 22:09:45 eon unix: [ID 100000 kern.notice]
>>>> Sep 9 22:09:45 eon unix: [ID 839527 kern.notice] sched:
>>>> Sep 9 22:09:45 eon unix: [ID 753105 kern.notice] #gp General protection
>>>> Sep 9 22:09:45 eon unix: [ID 358286 kern.notice]
>>>> addr=0xff7fff02fe41cc78
>>>> Sep 9 22:09:45 eon unix: [ID 243837 kern.notice] pid=0,
>>>
>>>
>>> "Random" panics are, unfortunately, mostly caused by bad hardware.
>>>
>>> Do you have ECC memory in the system? Did you run memtest86 on your
>>> system?
>>
>> Casper,
>> I have run memtest86 on the machine for about 4 hours which was enough
>> time to complete two passes. It is not ECC memory in this machine.
>> Perhaps if I said this isn't a random panic but more of an easily
>> reproducible panic... :) If I do dd if=/dev/zero of=/pool/blah
>> bs=1024k count=10000 it will always panic and reboot. In this type of
>> a scenario it seems less like hardware to me and more like a bug.
>> What do you think?
>
> Brandon,
> It looks like you have some bad RAM. The bad address (ff7fff02fe41cc78)
> appears to have a single-bit error (the leading ff7 should probably be fff).
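
For what it's worth, the single-bit claim is easy to check by hand. Below is a minimal Python sketch, assuming Chris's guess at the intended address (the leading ff7 replaced by fff) and 48-bit x86-64 virtual addresses. The function names are made up for illustration and are not from any Solaris tool:

```python
def differing_bits(observed: int, expected: int) -> list[int]:
    """Return positions (0 = least significant) of the bits that differ."""
    diff = observed ^ expected
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) are all equal (x86-64 canonical form).

    va_bits=48 is an assumption about the CPU's virtual address width.
    """
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


observed = 0xff7fff02fe41cc78   # addr= value from the panic message
expected = 0xffffff02fe41cc78   # assumption: the address the kernel meant to use

print(differing_bits(observed, expected))   # [55]
print(is_canonical(observed))               # False
print(is_canonical(expected))               # True
```

Exactly one bit (bit 55) differs. If the flip is real, it would also be consistent with the #gp in the log: with that bit cleared the address is non-canonical, and dereferencing a non-canonical address on x86-64 raises a general protection fault rather than a page fault.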

Chris,

You may well be right. I appreciate you taking the time to look at this. Just wish there were more reliable tools to find this type of thing. I guess I'll look into some options with the memory... perhaps I can add voltage or adjust the timing by hand until it goes away. Perhaps I could just buy ECC memory ;) Again, my previous emails only reflected what I know based on the tools I have. Thanks again.

Brandon