Hi,
Whilst the way zfs tastes every device looking for its labels can be useful
when devices change, I've been rather stung by it.
I have a raidz2 of six 2TB vdevs: four 2TB drives plus two gstripe devices,
each made from a pair of 1TB drives.
I currently have this:

  pool: pool2
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scan: resilvered 1.83M in 0h0m with 0 errors on Thu Jul 14 14:59:22 2011
config:

        NAME                      STATE     READ WRITE CKSUM
        pool2                     DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            gpt/2TB_drive0        ONLINE       0     0     0
            gpt/2TB_drive1        ONLINE       0     0     0
            gpt/2TB_drive2        ONLINE       0     0     0
            13298804679359865221  UNAVAIL      0     0     0  was /dev/gpt/1TB_drive0
            12966661380732156057  UNAVAIL      0     0     0  was /dev/gpt/1TB_drive2
            gpt/2TB_drive3        ONLINE       0     0     0
        cache
          gpt/cache0              ONLINE       0     0     0

The two UNAVAIL entries used to be stripes. The system helpfully removed them 
for me.
These are the stripes that used to be in the pool:

# gstripe status
               Name  Status  Components
stripe/1TB_drive0+1      UP  gpt/1TB_drive1
                             gpt/1TB_drive0
stripe/1TB_drive2+3      UP  gpt/1TB_drive3
                             gpt/1TB_drive2

They still exist and have all the data in them.
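
I assume the zfs labels are still intact on the stripe providers themselves;
something like this (path guessed from how the providers appear under
/dev/stripe here) should confirm it:

# zdb -l /dev/stripe/1TB_drive0+1

and show the same vdev guid (13298804679359865221) that zpool status now
lists as UNAVAIL.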

It started when I booted up with the drive holding gpt/1TB_drive1 missing:
zfs helpfully replaced the stripe/1TB_drive0+1 device with gpt/1TB_drive0
and told me it had corrupt data on it.

Am I right in thinking that, because one drive was missing and therefore
stripe/1TB_drive0+1 was missing as well, zfs tasted around and found that
gpt/1TB_drive0 carried what looked like the right label? However, 64k in it
would hit incorrect data, as the next 64k of the stripe lives on the missing
gpt/1TB_drive1.
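
(The 64k figure is from memory; the actual interleave is whatever gstripe
reports as the stripe size, which I could double-check with:

# gstripe list

before trusting the arithmetic above.)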

I was contemplating how to get the stripe back into the pool without having to 
do a complete
resilver on it. Seemed unnecessary to have to do that when the data was all 
there.

I thought an export and import might help it find the stripe. However, for
some reason that did the same thing to the other stripe, stripe/1TB_drive2+3,
which got replaced with gpt/1TB_drive2.
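
Would restricting where zfs is allowed to taste during the import help? I was
wondering about a directory of symlinks to only the devices I want it to see
(the stripe providers, the four 2TB partitions and the cache) and pointing
import at that, roughly:

# zpool export pool2
# mkdir /tmp/pool2dev
# ln -s /dev/stripe/1TB_drive0+1 /dev/stripe/1TB_drive2+3 \
        /dev/gpt/2TB_drive[0-3] /dev/gpt/cache0 /tmp/pool2dev/
# zpool import -d /tmp/pool2dev pool2

That way the bare gpt/1TB_* components shouldn't even be candidates, though
maybe there's a cleaner way to do the same thing.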

Now I am left without parity.
Any ideas on what commands will bring this back?
I know I can do a replace on both, but if there is some undetected corruption
on the other devices then I will lose some data, as any parity that could fix
it is currently missing. I do scrub regularly, but I'd prefer not to take that
chance, especially as I have all the data sitting there!
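
For reference, the replace I'm trying to avoid would be something like this
(using the old guids from zpool status above, assuming I have the syntax
right):

# zpool replace pool2 13298804679359865221 stripe/1TB_drive0+1
# zpool replace pool2 12966661380732156057 stripe/1TB_drive2+3

which would resilver both stripes from the remaining four drives with no
parity left to cover any read errors along the way.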

I'm hoping someone has some magic zfs commands to make all this go away :)

What can I do to prevent this in future? I've run pools with stripes for years
without this happening. It seems zfs has started to look far and wide for its
devices. In the past, if the stripe was broken it would just tell me the
device was missing, and when the stripe came back all was fine. However, this
tasting everywhere makes it seem like stripes are now a no-no for zpools?
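
Would pinning the device paths in the cache file be enough to stop it
wandering, or does that stop mattering once a device goes missing? i.e. set
it explicitly (that path is the FreeBSD default, I believe):

# zpool set cachefile=/boot/zfs/zpool.cache pool2

and then import with -c /boot/zfs/zpool.cache rather than letting it scan
everything.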

Thanks.