OK, so this is another "my pool got eaten" problem.  Our setup:

We were on Nevada build 77 when it happened; we're now running build 87.
9 iSCSI vdevs exported from Linux boxes, each backed by hardware RAID (we run 
Linux on those boxes for RAID-controller driver support).  The pool itself is 
simply striped across the vdevs.

Our problem:
Power got yanked to 8 of the 9 vdevs.  At the time we had the ZIL disabled and 
write-back caching enabled on the vdevs, for performance reasons.  The ZIL *was* 
going to be re-enabled; Murphy's Law made sure the crash came first.
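For clarity, "ZIL disabled" here means the system-wide tunable in /etc/system, the usual knob on Nevada builds of this era (shown as a fragment of what was set at crash time):

```
* /etc/system fragment: disable the ZFS intent log globally.
* This trades crash consistency of recent synchronous writes
* for performance -- which is exactly what bit us.
set zfs:zil_disable = 1
```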

On attempting to bring the system back up after a reboot, all the vdevs and the 
pool itself are marked FAULTED with corrupted data.

What we've attempted:
Since last Thursday (today is the Wednesday afterward), we've been trying 
zpool import -F from this weekend's nightly build, to no avail.
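For the record, the import attempts looked roughly like this ("tank" is a placeholder for our pool name; the -F recovery mode is only in recent nightlies, which is why we pulled this weekend's build):

```shell
# Plain forced import -- fails with "corrupted data":
zpool import -f tank

# Rewind-style recovery import from the nightly build,
# which discards the last few transaction groups:
zpool import -F tank

# Dry-run variant, to see whether a rewind would succeed
# (and how much data it would lose) without committing to it:
zpool import -Fn tank
```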

In addition, I've been applying dtrace probes in the kernel to see where it's 
dying and how -- to determine whether this is a "turn off the sanity checks and 
mount read-only" situation, or whether our data is hopelessly munged.  This 
attempt has turned into a bit of a goose chase, with possibilities popping up 
and failure modes branching faster than I can take a close look at them.
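The probes themselves are nothing fancy -- mostly fbt entry/return one-liners against the zfs module, along these lines (a generic sketch, not our exact traces):

```shell
# Log the call flow through the zfs module while the failing import
# runs in another shell (-F indents output by call depth; it's huge):
dtrace -F -n 'fbt:zfs::entry,fbt:zfs::return {}' -o import-flow.txt

# Catch nonzero return values, to spot which function first reports
# an error during the import (arg1 is the return value for fbt):
dtrace -n 'fbt:zfs::return /arg1 != 0/ {
    printf("%s returned %d", probefunc, (int)arg1);
}'
```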

My partner here is working on the possibility of an offline file-grabbing 
program, which shows some progress, but not much yet.

Our biggest problem is that neither of us is experienced in kernel-land 
debugging or filesystems, and I at least am rather inexperienced with the 
debugging power tools available on Solaris, such as mdb, and with uses of 
dtrace beyond looking at function return values and entry arguments.

Is there someone who has a bit more experience with this who can help us?

-- Matt
 
 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
