I have done a bit of testing, and so far, so good. I have a Dell 1800 with a Perc4e and a 14-drive Dell PowerVault 220S. I have a RAID-Z2 volume named 'tank' that spans 6 drives, and I have made one drive available to ZFS as a hot spare.
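For reference, a pool laid out like this can be created with something along these lines (a sketch, reconstructed from the status output below rather than my shell history; device names are from my setup and will differ on yours):

```shell
# Create a 6-disk RAID-Z2 pool named 'tank' with one hot spare
# (c0t1d0..c0t6d0 and c0t13d0 are the devices from my Perc4e setup)
zpool create tank raidz2 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 \
    spare c0t13d0

# Verify the layout
zpool status tank
```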
Normal array:

# zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Fri Aug  1 19:37:33 2008
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     0
        spares
          c0t13d0     AVAIL

errors: No known data errors

One drive removed:

# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Fri Aug  1 20:30:39 2008
config:

        NAME          STATE     READ WRITE CKSUM
        tank          DEGRADED     0     0     0
          raidz2      DEGRADED     0     0     0
            c0t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            spare     DEGRADED     0     0     0
              c0t3d0  UNAVAIL      0     0     0  cannot open
              c0t13d0 ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     0
        spares
          c0t13d0     INUSE     currently in use

errors: No known data errors

Now let's remove the hot spare ;)

# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Fri Aug  1 20:30:39 2008
config:

        NAME          STATE     READ WRITE CKSUM
        tank          DEGRADED     0     0     0
          raidz2      DEGRADED     0     0     0
            c0t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            spare     UNAVAIL      0   656     0  insufficient replicas
              c0t3d0  UNAVAIL      0     0     0  cannot open
              c0t13d0 UNAVAIL      0     0     0  cannot open
            c0t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     0
        spares
          c0t13d0     INUSE     currently in use

errors: No known data errors

Now, this Perc4e doesn't support JBOD, so each drive is a standalone RAID 0 (how annoying).
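For the record, once the controller lets the drives be seen again, the recovery should be roughly the following (I haven't run this yet; 'zpool online' is what the action line in the status output suggests, and detaching the spare is the standard way to return it to the spares list):

```shell
# Bring the re-attached drive back online; ZFS resilvers automatically
zpool online tank c0t3d0

# Watch the resilver progress
zpool status tank

# Once the resilver completes, detach the hot spare so it returns
# to the AVAIL spares list
zpool detach tank c0t13d0

# Clear any lingering error counters from the pulled drives
zpool clear tank
```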
With that, I cannot plug the drives back in with the system running; the controller keeps them offline until I enter its BIOS. But for my purposes this does demonstrate that ZFS tolerates hot removal of drives, without a graceful removal of the device being issued first. I was copying MP3s to the volume the whole time, and the copy continued uninterrupted and without error. I verified all the data was written as well. All data should be online when I reboot and return the pool to its normal state.

I am very happy with the test. I don't know many hardware controllers that will lose 3 drives out of an array of 6 (plus a spare) and still function normally (even on controllers that support RAID 6, I've seen major issues where writes were not committed). I'll add my results to your forum thread as well.

Regards,

Brent Jones
[EMAIL PROTECTED]

On Thu, Jul 31, 2008 at 11:56 PM, Ross Smith <[EMAIL PROTECTED]> wrote:
> Hey Brent,
>
> On Sun hardware like the Thumper you do get a nice bright blue "ready
> to remove" LED as soon as you issue the "cfgadm -c unconfigure xxx"
> command. On other hardware it takes a little more care; I'm labelling
> our drive bays up *very* carefully to ensure we always remove the right
> drive. Stickers are your friend; mine will probably be labelled
> "sata1/0", "sata1/1", "sata1/2", etc.
>
> I know Sun is working to improve the LED support, but I don't know
> whether that support will ever be extended to 3rd-party hardware:
> http://blogs.sun.com/eschrock/entry/external_storage_enclosures_in_solaris
>
> I'd love to use Sun hardware for this, but while things like the X2200
> servers are great value for money, Sun doesn't have anything even
> remotely competitive with a standard 3U server with 16 SATA bays. The
> X4240 is probably the closest, but it is at least double the price.
> Even the J4200 arrays are more expensive than this entire server.
>
> Ross
>
> PS. Once you've tested SCSI removal, could you add your results to my
> thread? I'd love to hear how that went.
> http://www.opensolaris.org/jive/thread.jspa?threadID=67837&tstart=0
>
> This conversation piques my interest. I have been reading a lot about
> OpenSolaris/Solaris for the last few weeks, and have even spoken to Sun
> storage techs about bringing in Thumper/Thor for our storage needs.
>
> I have recently brought online a Dell server with a DAS (14 SCSI
> drives). This will now be part of my tests: physically removing a
> member of the pool before issuing the removal command for that
> particular drive.
>
> One other issue I have: how do you physically locate a failing/failed
> drive in ZFS? With hardware RAID sets, if the RAID controller itself
> detects the error, it will initiate a BLINK command to that drive, so
> the individual drive is now flashing red/amber/whatever on the RAID
> enclosure. How would this be possible with ZFS? Say you have a JBOD
> enclosure (14, hell, maybe 48 drives). Knowing c0d0xx failed is no
> longer helpful if only ZFS catches the error. Will you be able to
> isolate the drive quickly, to replace it? Or will you be going "does
> the enclosure start at logical zero... left to right... hrmmm"?
>
> Thanks
>
> --
> Brent Jones
> [EMAIL PROTECTED]

--
Brent Jones
[EMAIL PROTECTED]
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss