Hello Matt,

Monday, December 3, 2007, 8:36:28 PM, you wrote:

MB> Hi,

MB> We have a number of 4200s set up using a combination of an SVM
MB> 4-way mirror and a ZFS raidz stripe.

MB> Each disk (of 4) is divided up like this

MB> /       6GB   UFS  s0
MB> Swap    8GB        s1
MB> /var    6GB   UFS  s3
MB> Metadb  50MB  UFS  s4
MB> /data   48GB  ZFS  s5

MB> For SVM we do a 4-way mirror on /, swap, and /var,
MB> so we have 3 SVM mirrors:
MB>     d0 = root (submirrors d10, d20, d30, d40)
MB>     d1 = swap (submirrors d11, d21, d31, d41)
MB>     d3 = /var (submirrors d13, d23, d33, d43)

MB> For ZFS we have a single raidz set across s5 of all four disks.

MB> Everything has worked flawlessly for some time. This week we
MB> discovered that one of our 4200s is reporting some level of
MB> failure with regard to one of its disks.

MB> We see these recurring errors in the syslog
MB> Dec  3 12:00:47 vfcustgfs02b scsi: [ID 107833 kern.notice]     
MB> Vendor: FUJITSU                            Serial Number: 0616S02DD5
MB> Dec  3 12:00:47 vfcustgfs02b scsi: [ID 107833 kern.notice]      Sense Key: Media Error
MB> Dec  3 12:00:47 vfcustgfs02b scsi: [ID 107833 kern.notice]     
MB> ASC: 0x15 (mechanical positioning error), ASCQ: 0x1, FRU: 0x0

MB> When we run metastat we see that 2 of the 3 SVM mirrors are
MB> reporting that their submirror on the failing disk needs
MB> maintenance. Oddly enough, the third SVM mirror reports no issues,
MB> making me think there is a media error on the disk that only
MB> happens to affect 2 of that disk's 3 SVM slices.

MB> Also "zpool status" reports read issues on the failing disk

MB> config:

MB>         NAME          STATE     READ WRITE CKSUM
MB>         zpool         ONLINE       0     0     0
MB>           raidz       ONLINE       0     0     0
MB>             c0t0d0s5  ONLINE       0     0     0
MB>             c0t1d0s5  ONLINE      50     0     0
MB>             c0t2d0s5  ONLINE       0     0     0
MB>             c0t3d0s5  ONLINE       0     0     0

MB> So my question is: what series of steps do we need to perform,
MB> given that I have one disk out of four that hosts a ZFS raidz
MB> member on one slice and SVM submirrors on 3 other slices, but only
MB> 2 of the 3 SVM mirrors report requiring maintenance?

MB> We want to keep data integrity intact (obviously).
MB> The server is still operational, but we want to take this
MB> opportunity to hammer out these steps.


If you can add another disk then do it and replace the failing one with
the new one in SVM and ZFS (one at a time - it should be faster).
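
For the spare-disk route, a minimal sketch only - it assumes the failing
disk is c0t1d0, the spare comes up as c0t4d0 with the same geometry, and
the pool really is named "zpool" as in the status output above; adjust
every device name to what your box actually reports:

  # copy the slice layout from a healthy disk onto the spare
  prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t4d0s2

  # raidz member first - it is the piece with the least redundancy left
  zpool replace zpool c0t1d0s5 c0t4d0s5
  zpool status zpool                  # wait for the resilver to finish

  # then swap the SVM components over, one mirror at a time
  metareplace d0 c0t1d0s0 c0t4d0s0
  metareplace d1 c0t1d0s1 c0t4d0s1
  metareplace d3 c0t1d0s3 c0t4d0s3

  # and move the state database replica
  metadb -d c0t1d0s4
  metadb -a c0t4d0s4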

I guess you can't add another disk. Then detach the disk's submirror
from SVM (no need to do that for the metadevices which are already in
maintenance state), offline it from the zfs pool, destroy the metadb
replica on that disk, and write down the vtoc (prtvtoc). Use
cfgadm -c unconfigure (or -c disconnect), remove the disk, put in the
new one, label it identically (fmthard), and put the metadb replica
back. Attach it to zfs (online it, actually) first, as the raidz is
where your data is at risk in your config, while you still have a
3-way mirror on SVM; once it's done, attach (replace) it in SVM.
Probably install a bootblock too.
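
A rough sequence for that in-place swap, as a sketch only - the disk
name (c0t1d0), the submirror numbers (d20/d21/d23, with d1/d21 as the
mirror that is still clean) and the cfgadm attachment point are all
guesses from the layout above, so check metastat -p and cfgadm -al
before running anything:

  # detach the submirror from the one mirror that is NOT in maintenance
  metadetach d1 d21

  # drop the state database replica on the failing disk and save its label
  metadb -d c0t1d0s4
  prtvtoc /dev/rdsk/c0t1d0s2 > /var/tmp/c0t1d0.vtoc

  # take the slice out of the pool and unconfigure the disk
  zpool offline zpool c0t1d0s5
  cfgadm -c unconfigure c0::dsk/c0t1d0   # Ap_Id is illustrative; see cfgadm -al

  # ... physically swap the disk, then ...
  cfgadm -c configure c0::dsk/c0t1d0

  # relabel it identically and put a replica back
  fmthard -s /var/tmp/c0t1d0.vtoc /dev/rdsk/c0t1d0s2
  metadb -a c0t1d0s4

  # ZFS first - the raidz has no redundancy to spare, SVM is still 3-way
  zpool replace zpool c0t1d0s5     # zpool online may suffice if the pool
  zpool status zpool               # still accepts the device; wait for resilver

  # then SVM: re-enable the errored components and re-attach the detached one
  metareplace -e d0 c0t1d0s0
  metareplace -e d3 c0t1d0s3
  metattach d1 d21

  # boot blocks (installgrub on the x64 4200s; installboot on SPARC)
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0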




-- 
Best regards,
 Robert Milkowski                            mailto:[EMAIL PROTECTED]
                                       http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
