On Thu, Dec 22, 2011 at 10:00 AM, Myers Carpenter <my...@maski.org> wrote:
> On Sat, Nov 5, 2011 at 2:35 PM, Myers Carpenter <my...@maski.org> wrote:
>> I would like to pick the brains of the ZFS experts on this list: What
>> would you do next to try and recover this zfs pool?
>
> I hate running across threads that ask a question where the person who
> asked never comes back to say what they eventually did, so...
>
> To summarize: In late October I had two drives fail in a raidz1 pool. I
> was able to recover all the data from one drive, but the other could not
> be seen by the controller. Trying zpool import was not working. I had 3
> of the 4 drives, so why couldn't I import the pool?
>
> I read about every option in zdb and tried the ones that might tell me
> something more about what was on the recovered drive. I eventually hit on
>
>    zdb -p devs -vvvve -lu /bank4/hd/devs/loop0
>
> where /bank4/hd/devs/loop0 was a symlink back to /dev/loop0, where I had
> set up the disk image of the recovered drive.
>
> This showed the uberblocks, which looked like this:
>
> Uberblock[1]
>         magic = 0000000000bab10c
>         version = 26
>         txg = 23128193
>         guid_sum = 13396147021153418877
>         timestamp = 1316987376 UTC = Sun Sep 25 17:49:36 2011
>         rootbp = DVA[0]=<0:2981f336c00:400> DVA[1]=<0:1e8dcc01400:400>
>         DVA[2]=<0:3b16a3dd400:400> [L0 DMU objset] fletcher4 lzjb LE
>         contiguous unique triple size=800L/200P birth=23128193L/23128193P
>         fill=255 cksum=136175e0a4:79b27ae49c7:1857d594ca833:34ec76b965ae40
>
> Then it all became clear: this drive had encountered errors a month before
> the other drive failed, and ZFS had stopped writing to it.
>
> So the lesson here: Don't be a dumbass like me. Set up nagios or some
> other monitoring system to alert you when a pool becomes degraded. ZFS
> works so well with one drive out of the array that you probably aren't
> going to notice problems unless you are proactively looking for them.
>
> myers

Or, if you aren't scrubbing on a regular basis, just change your zpool
failmode property. Had you set it to wait or panic, it would've been very
clear, very quickly, that something was wrong.

http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/

--Tim
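For anyone landing on this thread later: the failmode setting Tim mentions is
an ordinary pool property, so checking and changing it is just a couple of
zpool commands. Rough sketch; the pool name "tank" is only a placeholder:

   # show the current policy: wait (the default), continue, or panic
   zpool get failmode tank

   # block all I/O on a catastrophic pool failure instead of carrying on
   # quietly; "panic" would crash the host instead
   zpool set failmode=wait tank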
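And on the monitoring side, even a dumb cron job beats nothing. A minimal
sketch, assuming your "zpool status -x" prints "all pools are healthy" when
everything is fine (the exact wording can differ between releases) and that
mailx is installed -- adjust the recipient and mailer to taste:

   #!/bin/sh
   # cron-able check: mail root if any pool is not healthy
   status=$(zpool status -x)
   if [ "$status" != "all pools are healthy" ]; then
       echo "$status" | mailx -s "ZFS pool problem on $(hostname)" root
   fi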