--On Tuesday, September 25, 2007 8:49 AM +0200 Søren Schmidt <[EMAIL PROTECTED]> wrote:

Yarema wrote:
Hi, I need some help recovering from this.  First some back story.
Running 6.2-STABLE i386 from Sep 17, 2007.  My /home slice is mounted
from /dev/ar0s1e where the relevant kernel messages look like so when
all is good:

atapci1: <Intel ICH5 SATA150 controller>
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
ad4: 381554MB <WDC WD4000YR-01PLB0 01.06A01> at ata2-master SATA150
ad6: 381554MB <WDC WD4000YR-01PLB0 01.06A01> at ata3-master SATA150
ar0: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: READY
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master

Today this server crashed and logged the following:

ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=144888320
ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=143390319
ad4: FAILURE - device detached
ar0: FAILURE - RAID0 array broken
subdisk4: detached
ad4: detached
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6176768, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6193152, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6209536, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=65536, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=147801325568, length=12288)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=147142686720, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=65536, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6176768, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6193152, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6209536, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=146831867904, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=147024330752, length=16384)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=147801325568, length=12288)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=147142686720, length=2048)]error = 5

Now the kernel messages read:

ar0: FAILURE - RAID0 array broken
ar0: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: BROKEN
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 DOWN no device found for this subdisk
ar1: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: BROKEN
ar1: disk0 DOWN no device found for this subdisk
ar1: disk1 READY using ad6 at ata3-master

For some reason the second disk in the array shows up as ar1 instead
of being part of ar0.  I suspect there's gotta be some way to force
the two drives to show up as part of the same array by perhaps editing
the PseudoRAID metadata on disk without putting any of the UFS2 data
in "jeopardy".  Any pointers on where to start poking around for the
relevant metadata structures on disk or what to search for?  I figure
if I can dd the metadata off the disks, tweak a field or two and then
dd the whole mess back, I stand a chance of either hosing the array
irrevocably or getting it all back. ;)  Or maybe atacontrol could be
used to re-create the metadata without destroying the UFS2 on the
array?  I have a coredump of the kernel from this crash if that helps
analyze things any.
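Before dd-ing anything back, it may be worth saving a copy of the region that holds the metadata on each disk. Many BIOS/software RAID formats keep their metadata in the last sectors of each member disk; the sketch below assumes that and copies a configurable number of trailing sectors to a file. It is a minimal illustration, not a tool from this thread: `backup_tail`, `restore_tail` and `n_sectors` are hypothetical names, the number of sectors the FreeBSD PseudoRAID metadata actually occupies is not stated here, `media_size` is assumed to be the byte count reported by diskinfo(8), and 512-byte sectors are assumed.

```python
import os

SECTOR_SIZE = 512  # bytes per sector; assumed for these WD4000YR drives


def backup_tail(device, media_size, n_sectors, out_path):
    """Save the last n_sectors sectors of `device` to `out_path`.

    media_size is the device size in bytes (assumed to come from
    diskinfo(8)); it must be sector-aligned so the read stays aligned,
    as raw disk devices require. Returns the byte offset that was read.
    """
    if media_size % SECTOR_SIZE or n_sectors <= 0:
        raise ValueError("media_size must be sector-aligned and n_sectors > 0")
    length = n_sectors * SECTOR_SIZE
    offset = media_size - length
    if offset < 0:
        raise ValueError("n_sectors exceeds the device size")
    # Unbuffered I/O so exactly one aligned read is issued.
    with open(device, "rb", buffering=0) as src:
        src.seek(offset)
        data = src.read(length)
    if data is None or len(data) != length:
        raise OSError("short read from %s" % device)
    with open(out_path, "wb") as dst:
        dst.write(data)
    return offset


def restore_tail(device, media_size, in_path):
    """Write an image saved by backup_tail() back to the same place.

    DESTRUCTIVE: only use it to put back exactly what was saved from
    this same device. Returns the byte offset that was written.
    """
    with open(in_path, "rb") as src:
        data = src.read()
    if not data or len(data) % SECTOR_SIZE:
        raise ValueError("backup image must be a whole number of sectors")
    offset = media_size - len(data)
    with open(device, "r+b", buffering=0) as dst:
        dst.seek(offset)
        written = dst.write(data)
    if written != len(data):
        raise OSError("short write to %s" % device)
    return offset
```

Run against each member (for example ad4 and ad6) before any atacontrol delete/create, a saved tail at least gives a way back to the on-disk state as it was after the crash.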


The solution to getting the array back is to run "atacontrol delete ar0",
"atacontrol delete ar1" and "atacontrol create stripe 512 ad4 ad6", and
the array is reborn.
However, your filesystems might be just a bunch of bits, depending
on how much of the failed writes made it in there; you get the
(missing) protection you asked for by using RAID0....

Søren,

Thank you for your prompt and helpful reply. I'm running into a new situation with atacontrol:

% atacontrol create RAID0 512 ad4 ad6
ar0: 763108MB <Intel MatrixRAID RAID0 (stripe 128 KB)> status: READY
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master

Note that the original RAID0 array, the one that broke, was:
ar0: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: READY

Now atacontrol will not create FreeBSD PseudoRAID metadata with a 256KB stripe; instead it insists on creating Intel MatrixRAID metadata with a 128KB stripe. This is on a non-R version of the ICH5 southbridge, so there's no way to enable or disable Intel MatrixRAID from the BIOS. Nor is there any way to change the stripe size in the BIOS, since there is no Intel MatrixRAID BIOS on this motherboard. The computer in question is a Dell SC400 with an Intel OEM motherboard, which has the very limited BIOS Setup interface typical of Intel/Dell.
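The stripe size matters because it decides which disk, and where on that disk, each array offset lives. The sketch below assumes plain round-robin RAID0 striping starting at byte 0 of each member's data area (where each metadata format actually starts the data area is not covered here), with disk0 = ad4 and disk1 = ad6 as in the boot messages above. `raid0_locate` is a hypothetical helper and the 300 KB offset is an arbitrary example.

```python
KIB = 1024


def raid0_locate(array_offset, stripe_size, n_disks):
    """Map a byte offset in a RAID0 array to (disk index, byte offset on disk).

    Assumes simple round-robin striping from the start of each member's
    data area; metadata placement is ignored.
    """
    stripe_no, within = divmod(array_offset, stripe_size)
    disk = stripe_no % n_disks      # which member holds this stripe
    row = stripe_no // n_disks      # how many full stripes precede it on that member
    return disk, row * stripe_size + within


if __name__ == "__main__":
    members = ("ad4", "ad6")        # disk0, disk1 per the ar0 boot messages
    offset = 300 * KIB              # arbitrary example offset
    for stripe_kb in (256, 128):
        disk, disk_off = raid0_locate(offset, stripe_kb * KIB, len(members))
        print("%3d KB stripe: %s offset %d KB" % (stripe_kb, members[disk], disk_off // KIB))
    # prints:
    # 256 KB stripe: ad6 offset 44 KB
    # 128 KB stripe: ad4 offset 172 KB
```

Under these assumptions only the first 128 KB of the array maps identically under both layouts, so an array re-created with a 128 KB stripe would read most of a UFS2 filesystem laid out with 256 KB stripes from the wrong places.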

Is there any way to force atacontrol to create FreeBSD PseudoRAID metadata? Perhaps by using an older FreeSBIE release based on FreeBSD 6.0, since IIRC I created this RAID0 array back when 6.0 was CURRENT?

--
Yarema
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
