Having casually used IRIX in the past, and BeOS, Windows, and MacOS as primary OSes, last week I set up a RAIDZ NAS with four Western Digital 1.5TB drives and copied data over from my WinXP box. All of the hardware is fresh out of the box, so I did not expect any hardware problems; but after a few days of uptime and copying 2.4TB of data to the system, running zpool status gave me the following:
da...@opensolarisnas:~$ zpool status mediapool
  pool: mediapool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in
        a degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the
        device repaired.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mediapool   DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c8t0d0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c8t3d0  FAULTED      0     0     0  too many errors

errors: No known data errors
da...@opensolarisnas:~$

I read the Solaris documentation, and it seemed to indicate that I needed to run zpool clear:

da...@opensolarisnas:~$ zpool clear mediapool

And then the fun began. The system froze, rebooted, and I was stuck in a constant reboot cycle: it would get to grub, I would select "opensolaris-2", and the boot process would crash. Removing the SATA card that the RAIDZ disks were attached to resulted in a successful boot. I reinserted the card, went through a few unsuccessful reboots, and magically it booted all the way to the login screen. I then received the following:

me...@opensolarisnas:~$ zpool status -v mediapool
  pool: mediapool
 state: DEGRADED
status: One or more devices could not be used because the label is
        missing or invalid.  Sufficient replicas exist for the pool to
        continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: scrub in progress for 0h2m, 0.29% done, 16h12m to go
config:

        NAME        STATE     READ WRITE CKSUM
        mediapool   DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c8t0d0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c8t3d0  UNAVAIL      7     0     0  experienced I/O failures

errors: No known data errors
me...@opensolarisnas:~$

I shut the machine down, unplugged the power supply, removed the SATA card and reinserted it, removed each of the SATA data cables individually and reinserted them, and did the same with each of the SATA power cables. Rebooted:

da...@opensolarisnas:~# zpool status -x mediapool
  pool: mediapool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool
        will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h20m, 2.68% done, 12h29m to go
config:

        NAME        STATE     READ WRITE CKSUM
        mediapool   DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c8t0d0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c8t3d0  REMOVED      0     0     0

errors: No known data errors
da...@opensolarisnas:~#

The resilvering completed, everything seemed fine, and I shut the machine down. When I rebooted later, I went through the same boot-and-crash cycle that never got me to the login screen, until it finally did for unknown reasons. The machine is currently resilvering again, with zpool status the same as above.

What happened, why did it happen, and how can I stop it from happening again? Does OpenSolaris believe that c8t3d0 is not connected to the SATA card? The SATA card's BIOS sees all four drives. What is the best way for me to figure out which drive is c8t3d0? Some operating systems will identify a drive for you by reporting its serial number. Does OpenSolaris do this? If so, how?
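From skimming man pages, it looks like "iostat -En" might print per-device details including the serial number, though I have not yet verified what it shows for my controller:

```shell
# Untested guess from the iostat(1M) man page: -E prints extended device
# error statistics, and -n uses descriptive names; the -En output is said
# to include Vendor, Product, and Serial No lines for each disk.
iostat -En

# To pull just the device names and serial lines out of that output:
iostat -En | egrep 'Soft Errors|Serial No'
```

If that works, I could match the serial number reported for c8t3d0 against the label printed on each physical drive.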
I looked through all of the Solaris/OpenSolaris documentation on ZFS and RAIDZ for a mention of a "REMOVED" status for a drive in a RAIDZ configuration, but could not find it mentioned outside of mirrors. Page 231 of the OpenSolaris Bible mentions reattaching a drive in the "removed" status to a mirror. Does this mean physically reattaching the drive (unplugging it and plugging it back in), or does it mean somehow reattaching it in software? If I run "zpool offline -t mediapool c8t3d0", reboot, then run "zpool replace mediapool c8t3d0" followed by "zpool online mediapool c8t3d0", will this solve all my issues?

There is another issue, and I don't know if it is related or not. If it isn't related, I will start another thread. The available space on the RAIDZ1 before I put anything on it was 4TB. I put ~2.4TB of data on it; this is the size of the data on the WinXP NTFS box, and it is what both Nautilus and Disk Usage Analyzer report. And yet "zfs list" reports I only have 432GB free. Disk Usage Analyzer reports that the filesystem capacity of mediapool is 2905.9GB. The 2905GB adds up correctly with 2.4TB + 432GB, but where did the other 1.1TB go? Is this from c8t3d0 being missing? I did move ~2TB of data to mediapool and then delete it with Nautilus. I saw nothing in the Trash can, so I am assuming that it has been deleted. Is this a correct assumption?

BTW, all of the data is in its original state on the WinXP boxes, so if need be I can start over from scratch with the OpenSolaris installation and the RAIDZ1 filesystem. I am not keen on this, as the 2.4TB of data is spread around and it takes forever to copy 2.4TB.

Thanks in advance,
David
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss