Re: [zfs-discuss] ATA UDMA data parity error

2008-01-22 Thread Kent Watsen
For the archive, I swapped the mobo and all is good now... (I copied 100GB into the pool without a crash.) One problem I had was that Solaris would hang whenever booting, even when all the aoc-sat2-mv8 cards were pulled out. Turns out that switching the BIOS field "USB 2.0 Controller Mode" f

Re: [zfs-discuss] ATA UDMA data parity error

2008-01-18 Thread Kent Watsen
Thanks for the note, Anton. I let memtest86 run overnight and it found no issues. I've also now moved the cards around and have confirmed that slot #3 on the mobo is bad (all my aoc-sat2-mv8 cards, cables, and backplanes are OK). However, I think it's more than just slot #3 that has a fault b

Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Anton B. Rang
Definitely a hardware problem (possibly compounded by a bug).
Some key phrases and routines:
ATA UDMA data parity error
This one actually looks like a misnomer. At least, I'd normally expect "data parity error" not to crash the system! (It should result in a retry or EIO.)
PCI(-X) Expre

Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Richard Elling
Kent Watsen wrote:
>
> Thanks Richard and Al,
>
> I'll refrain from expressing how disturbing this is, as I'm trying to
> help the Internet be kid-safe ;)
>
> As for the PSU, I'd be very surprised if that were it, as it is a
> 3+1 redundant PSU that came with this system, built by a reputable

Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Kent Watsen
Thanks Richard and Al,
I'll refrain from expressing how disturbing this is, as I'm trying to help the Internet be kid-safe ;)
As for the PSU, I'd be very surprised if that were it, as it is a 3+1 redundant PSU that came with this system, built by a reputable integrator. Also, the PSU is

Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Al Hopper
On Thu, 17 Jan 2008, Richard Elling wrote:
> Looks like flaky or broken hardware to me. It could be a
> power supply issue, those tend to rear their ugly head when
> workloads get heavy and they are usually the easiest to
> replace.
+1 PSU or memory (run memtest86)
> -- richard
>
> Kent Wats

Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Richard Elling
Looks like flaky or broken hardware to me. It could be a power supply issue, those tend to rear their ugly head when workloads get heavy and they are usually the easiest to replace.
-- richard
Kent Watsen wrote:
>
>
> Below I create zpools isolating one card at a time
> - when just card#1 - it

Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Kent Watsen
Below I create zpools isolating one card at a time:
- when just card #1 - it works
- when just card #2 - it fails
- when just card #3 - it works
And then again using the two cards that seem to work:
- when cards #1 and #3 - it fails
So, at first I thought I narrowed it down to a card, but my
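The isolation results above can be checked mechanically. The following is only a minimal sketch of that reasoning, not something from the original post: it assumes a pool fails exactly when it includes the one faulty card, and the names `observations` and `consistent_single_faults` are hypothetical. Under that assumption, no single card explains all four results, which is why a lone bad card looked doubtful.

```python
# Observed results from the isolation tests above:
# set of cards used in the pool -> did the test succeed?
observations = {
    frozenset({1}): True,      # just card #1: works
    frozenset({2}): False,     # just card #2: fails
    frozenset({3}): True,      # just card #3: works
    frozenset({1, 3}): False,  # cards #1 and #3: fails
}


def consistent_single_faults(observations, cards):
    """Return the cards that, if they were the ONLY faulty component,
    would explain every observation.

    Assumption (not from the thread): a pool fails if and only if it
    includes the faulty card.
    """
    suspects = []
    for card in cards:
        if all((card not in used) == ok for used, ok in observations.items()):
            suspects.append(card)
    return suspects


suspects = consistent_single_faults(observations, cards=(1, 2, 3))
print(suspects)  # an empty list means no single-card fault fits the data
```

An empty result points away from a single bad card and toward something shared, such as a slot, the motherboard, or power.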

Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Kent Watsen
On a lark, I decided to create a new pool not including any devices connected to card #3 (i.e. "c5"). It crashes again, but this time with a slightly different dump (see below). Actually, there are two dumps below: the first is using the xVM kernel and the second is not.
Any ideas?
Kent
[

[zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Kent Watsen
Hey all,
I'm not sure if this is a ZFS bug or a hardware issue I'm having - any pointers would be great!
Following contents include:
- high-level info about my system
- my first thought to debugging this
- stack trace
- format output
- zpool status output
- dmesg output
High-Lev