On Thu, Sep 10, 2009 at 9:09 AM, Chris Kirby <christopher.ki...@sun.com> wrote:
> On Sep 10, 2009, at 7:07 AM, Brandon Mercer wrote:
>
>> On Thu, Sep 10, 2009 at 5:11 AM,  <casper....@sun.com> wrote:
>>>
>>>> Hello all, I'm running 2009.06 and I've got a "random" kernel panic
>>>> that keeps killing my system under high IO loads.  It happens almost
>>>> every time I start loading up the writes on a pool.  Memory has been
>>>> tested extensively and I'm relatively certain this is not a
>>>> hardware-related issue.  Here is the panic:
>>>> Sep  9 22:09:45 eon genunix: [ID 683410 kern.notice] BAD TRAP: type=d
>>>> (#gp General protection) rp=ffffff0010362770 addr=ff7fff02fe41cc78
>>>> Sep  9 22:09:45 eon unix: [ID 100000 kern.notice]
>>>> Sep  9 22:09:45 eon unix: [ID 839527 kern.notice] sched:
>>>> Sep  9 22:09:45 eon unix: [ID 753105 kern.notice] #gp General protection
>>>> Sep  9 22:09:45 eon unix: [ID 358286 kern.notice]
>>>> addr=0xff7fff02fe41cc78
>>>> Sep  9 22:09:45 eon unix: [ID 243837 kern.notice] pid=0,
>>>
>>>
>>> "Random" panics are, unfortunately, mostly caused by bad hardware.
>>>
>>> Do you have ECC memory in the system?  Did you run memtest86 on your
>>> system?
>>
>> Casper,
>> I have run memtest86 on the machine for about 4 hours which was enough
>> time to complete two passes.  It is not ECC memory in this machine.
>> Perhaps if I said this isn't a random panic but more of an easily
>> reproducible panic... :)  If I do dd if=/dev/zero of=/pool/blah
>> bs=1024k count=10000 it will always panic and reboot.  In this type of
>> a scenario it seems less like hardware to me and more like a bug.
>> What do you think?
>
> Brandon,
>   It looks like you have some bad RAM.  The bad address (ff7fff02fe41cc78)
> appears to have a single-bit error (the leading ff7 should probably be fff).
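
The single-bit reasoning above can be checked with a short sketch. It
assumes the intended address was ffffff02fe41cc78 (the "fff" guess from
the message above) and that the CPU uses 48-bit virtual addresses, where
bits 47 through 63 must all match for an address to be canonical. The
helper names flipped_bits and is_canonical are made up for illustration.

```python
# Faulting address from the panic, and the value it was presumably meant
# to be (assumption: the leading "ff7" should have been "fff").
observed = 0xff7fff02fe41cc78
expected = 0xffffff02fe41cc78


def flipped_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(64) if (diff >> i) & 1]


def is_canonical(addr, va_bits=48):
    """True if bits (va_bits - 1) through 63 are all 0 or all 1.

    va_bits=48 is an assumption about the CPU's virtual address width.
    """
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


print(flipped_bits(observed, expected))
print(is_canonical(observed), is_canonical(expected))
```

Under those assumptions the two values differ only in bit 55, and the
observed address is non-canonical while the guessed one is canonical.
Dereferencing a non-canonical address on x86-64 raises #GP, which would
be consistent with the "General protection" trap in the panic.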

Chris, you may well be right.  I appreciate you taking the time to
look at this.  Just wish there were more reliable tools to find this
type of thing.  I guess I'll look into some options with the memory...
perhaps I can add voltage or adjust the timing by hand until it goes
away.  Perhaps I could just buy ECC memory ;)  Again, my previous
emails only reflected what I know based on the tools I have.  Thanks
again.
Brandon