Platform:
- old Dell workstation with an Andataco GigaRAID enclosure plugged into an Adaptec 39160
- Nevada b51

Current zpool config:
- one two-disk mirror with two hot spares

In my ferocious pounding of ZFS I've managed to corrupt my data pool. This is what I've been doing to test it (a rough sketch of the loop follows the list):

- set zil_disable to 1 in /etc/system
- continually untar a couple of files into the filesystem
- manually spin down a drive in the mirror by holding down the button on the enclosure
- for any system hangs, reboot with a nasty "reboot -dnq"
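For reference, the whole procedure boils down to something like this; the tarball path and the /zmir mountpoint are stand-in names for illustration, not my real ones:

    # /etc/system entry (takes effect on the next boot)
    set zfs:zil_disable = 1

    # stress loop: repeatedly untar into the pool
    # /zmir and /var/tmp/files.tar are placeholder names
    while true; do
        ( cd /zmir && tar xf /var/tmp/files.tar )
    done

Partway through the loop I hold the enclosure button to spin a drive down, and hit the box with "reboot -dnq" if it wedges.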
I've gotten different results after the spindown:

- works properly: short or no hang, hot spare successfully added to the mirror
- system hangs, and after a reboot the spare is not added
- tar hangs, but after running "zpool status" the hot spare is added properly and tar continues
- tar continues, but hangs on "zpool status"

The last is what happened just prior to the corruption. Here's the output of zpool status:

    nextest-01# zpool status -v
      pool: zmir
     state: DEGRADED
    status: One or more devices has experienced an error resulting in data
            corruption. Applications may be affected.
    action: Restore the file in question if possible. Otherwise restore the
            entire pool from backup.
       see: http://www.sun.com/msg/ZFS-8000-8A
     scrub: resilver completed with 1 errors on Thu Nov 30 11:37:21 2006
    config:

            NAME        STATE     READ WRITE CKSUM
            zmir        DEGRADED     8     0     4
              mirror    DEGRADED     8     0     4
                c3t3d0  ONLINE       0     0    24
                c3t4d0  UNAVAIL      0     0     0  cannot open
            spares
              c0t0d0    AVAIL
              c3t1d0    AVAIL

    errors: The following persistent errors have been detected:

              DATASET  OBJECT  RANGE
              15       0       lvl=4294967295 blkid=0

So the questions are:

- Is this fixable? I don't see an inum I could run find on to remove the damaged file (see the P.S. below), and I can't even do a "zfs volinit" anyway:

      nextest-01# zfs volinit
      cannot iterate filesystems: I/O error

- Would leaving zil_disable unset have prevented this?
- Should I have been using a 3-way mirror?
- Is there a more optimal configuration to help prevent this kind of corruption?

Ultimately, I want to build a ZFS server with performance and reliability comparable to, say, a NetApp, but the fact that I appear to have been able to nuke my pool by simulating a hardware error gives me pause. I'd love to know if I'm off-base in my worries.

Jim
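P.S. On the inum point above: my understanding from the ZFS-8000-8A writeup is that for a plain file the OBJECT number reported by "zpool status -v" doubles as its inode number, so if the status had shown a usable object instead of object 0, I would have tried something along these lines (the object number 287 is made up for illustration):

    # map the damaged object to a path (object number == inode
    # number for plain files)
    find /zmir -inum 287 -print
    # remove whatever find reports, then re-scrub to clear the error
    rm /zmir/path/reported/by/find
    zpool scrub zmir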