On Thu, Feb 18, 2010 at 04:14, Daniel Carosone <d...@geek.com.au> wrote:

> On Wed, Feb 17, 2010 at 11:37:54PM -0500, Ethan wrote:
> > > It seems to me that you could also use the approach of 'zpool replace' for
> > That is true. It seems like it would then have to rebuild from parity for
> > every drive, though, which I think would take rather a long while, wouldn't it?
>
> No longer than copying - plus, it will only resilver active data, so
> unless the pool is close to full it could save some time.  Certainly
> it will save some hassle and risk of error, plugging and swapping drives
> between machines more times.  As a further benefit, all this work will
> count towards a qualification cycle for the current hardware setup.
>
> I would recommend using replace, one drive at a time. Since you still
> have the original drives to fall back on, you can do this now (before
> making more changes to the pool with new data) without being overly
> worried about a second failure killing your raidz1 pool.  Normally,
> when doing replacements like this on a singly-redundant pool, it's a
> good idea to run a scrub after each replace, making sure everything
> you just wrote is valid before relying on it to resilver the next
> disk.
>
> If you're keen on copying, I'd suggest doing it over the network; that
> way your write target is a system that knows the target partitioning
> and there's no (mis)calculation of offsets.
>
> --
> Dan.



These are good points - it sounds like replacing one drive at a time is the way
to go. Thanks for pointing out those benefits.
One thing I notice, though: the pool now imports just fine on the p0 devices
with a plain `zpool import q`, no longer needing import -d with the directory of
symlinks to the p0 devices. I guess that comes from the labels and such having
been repaired - or whatever it was that got repaired by the successful import
and scrub.
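For reference, roughly what I mean (the -d path is the symlink directory from my
setup, so take this as a sketch of the two invocations rather than exact history):

# zpool import -d /export/home/ethan/qdsk q     (what I had to do before)
# zpool import q                                (what works now)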
After the scrub finished, this is the state of my pool:


# zpool status
  pool: q
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 7h18m with 0 errors on Thu Feb 18 06:25:44 2010
config:

        NAME                                  STATE     READ WRITE CKSUM
        q                                     DEGRADED     0     0     0
          raidz1                              DEGRADED     0     0     0
            /export/home/ethan/qdsk/c9t4d0p0  ONLINE       0     0     0
            /export/home/ethan/qdsk/c9t5d0p0  ONLINE       0     0     0
            /export/home/ethan/qdsk/c9t2d0p0  ONLINE       0     0     0
            /export/home/ethan/qdsk/c9t1d0p0  DEGRADED     4     0    60  too many errors
            /export/home/ethan/qdsk/c9t0d0p0  ONLINE       0     0     0

errors: No known data errors


I have no idea what happened to that one disk, but "No known data errors" is
what makes me happy. I'm not sure whether I should be concerned about the
physical disk itself, or just assume some data got scrambled in all this mess.
I'll see how the disk behaves during the replace operations (resilvering onto it
and then reading from it for the other four replaces seems like a pretty good
test of it), and if it keeps throwing errors, I'll replace the physical drive
and, if necessary, restore from the original truecrypt volumes.
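Before starting, I'll probably reset the pool's error counts and check whether
the drive itself is logging errors (I believe iostat -En reports per-device
error counters, though I'm not positive that's the best check):

# zpool clear q
# iostat -En     (look at the c9t1d0 entry for hard/transport errors)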


So, current plan:
- export the pool.
- format c9t1d0 to have one slice being the entire disk.
- import. should be degraded, missing c9t1d0p0.
- replace the missing c9t1d0p0 with c9t1d0 (should this be c9t1d0s0? my
understanding is that zfs treats the two about the same, since it adds a
partition table to a raw device if that's what it's given and ends up using s0
anyway - see the command sketch after this list).
- wait for resilver.
- repeat with the other four disks.
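In command form, I picture each pass looking roughly like this (the old-device
argument to replace is my guess at the right name - zpool may want the GUID of
the missing device instead of its old path):

# zpool export q
(run format/fdisk on c9t1d0 so one slice spans the whole disk)
# zpool import q
# zpool status q                    (expect DEGRADED, c9t1d0p0 missing)
# zpool replace q /export/home/ethan/qdsk/c9t1d0p0 c9t1d0
# zpool status q                    (watch the resilver)
# zpool scrub q                     (once resilvered, scrub before the next disk)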

Sound good?

-Ethan
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
