I have done a bit of testing, and so far, so good. I have a Dell 1800 with a Perc4e and a 14-drive Dell PowerVault 220S. I have a RAID-Z2 volume named 'tank' that spans 6 drives, and I have made one drive available to ZFS as a hot spare.
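For reference, a pool laid out like this can be created with something along these lines (a sketch, reconstructed from the status output below rather than my shell history; device names are from my setup and will differ on yours):

```shell
# Create a 6-disk RAID-Z2 pool named 'tank' with one hot spare
# (c0t1d0..c0t6d0 and c0t13d0 are the devices from my Perc4e setup)
zpool create tank raidz2 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 \
    spare c0t13d0

# Verify the layout
zpool status tank
```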
Normal array:

# zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Fri Aug  1 19:37:33 2008
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     0
        spares
          c0t13d0     AVAIL

errors: No known data errors

One drive removed:

# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Fri Aug  1 20:30:39 2008
config:

        NAME          STATE     READ WRITE CKSUM
        tank          DEGRADED     0     0     0
          raidz2      DEGRADED     0     0     0
            c0t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            spare     DEGRADED     0     0     0
              c0t3d0  UNAVAIL      0     0     0  cannot open
              c0t13d0 ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     0
        spares
          c0t13d0     INUSE     currently in use

errors: No known data errors

Now let's remove the hot spare ;)

# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Fri Aug  1 20:30:39 2008
config:

        NAME          STATE     READ WRITE CKSUM
        tank          DEGRADED     0     0     0
          raidz2      DEGRADED     0     0     0
            c0t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            spare     UNAVAIL      0   656     0  insufficient replicas
              c0t3d0  UNAVAIL      0     0     0  cannot open
              c0t13d0 UNAVAIL      0     0     0  cannot open
            c0t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     0
        spares
          c0t13d0     INUSE     currently in use

errors: No known data errors

Now, this Perc4e doesn't support JBOD, so each drive is a standalone RAID 0 (how annoying).
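For the record, once the controller lets the drives be seen again, the recovery should be roughly the following (I haven't run this yet; 'zpool online' is what the action line in the status output suggests, and detaching the spare is the standard way to return it to the spares list):

```shell
# Bring the re-attached drive back online; ZFS resilvers automatically
zpool online tank c0t3d0

# Watch the resilver progress
zpool status tank

# Once the resilver completes, detach the hot spare so it returns
# to the AVAIL spares list
zpool detach tank c0t13d0

# Clear any lingering error counters from the pulled drives
zpool clear tank
```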
With that, I cannot plug the drives back in with the system running; the controller keeps them offline until I enter its BIOS. But for my purposes this does demonstrate that ZFS tolerates hot removal of drives, without a graceful removal of the device being issued first. I was copying MP3s to the volume the whole time, and the copy continued uninterrupted and without error. I verified all the data was written as well. All data should be online when I reboot and return the pool to its normal state.

I am very happy with the test. I don't know many hardware controllers that will lose 3 drives out of an array of 6 (plus a spare) and still function normally (even on controllers that support RAID 6, I've seen major issues where writes were not committed). I'll add my results to your forum thread as well.

Regards,

Brent Jones
[EMAIL PROTECTED]

On Thu, Jul 31, 2008 at 11:56 PM, Ross Smith <[EMAIL PROTECTED]> wrote:
> Hey Brent,
>
> On Sun hardware like the Thumper you do get a nice bright blue "ready
> to remove" LED as soon as you issue the "cfgadm -c unconfigure xxx"
> command. On other hardware it takes a little more care; I'm labelling
> our drive bays up *very* carefully to ensure we always remove the right
> drive. Stickers are your friend; mine will probably be labelled
> "sata1/0", "sata1/1", "sata1/2", etc.
>
> I know Sun is working to improve the LED support, but I don't know
> whether that support will ever be extended to 3rd-party hardware:
> http://blogs.sun.com/eschrock/entry/external_storage_enclosures_in_solaris
>
> I'd love to use Sun hardware for this, but while things like the X2200
> servers are great value for money, Sun doesn't have anything even
> remotely competitive with a standard 3U server with 16 SATA bays. The
> X4240 is probably the closest, but it is at least double the price.
> Even the J4200 arrays are more expensive than this entire server.
>
> Ross
>
> PS. Once you've tested SCSI removal, could you add your results to my
> thread? I'd love to hear how that went.
> http://www.opensolaris.org/jive/thread.jspa?threadID=67837&tstart=0
>
> This conversation piques my interest. I have been reading a lot about
> OpenSolaris/Solaris for the last few weeks, and have even spoken to Sun
> storage techs about bringing in Thumper/Thor for our storage needs.
>
> I have recently brought online a Dell server with a DAS (14 SCSI
> drives). This will now be part of my tests: physically removing a
> member of the pool before issuing the removal command for that
> particular drive.
>
> One other issue I have: how do you physically locate a failing/failed
> drive in ZFS? With hardware RAID sets, if the RAID controller itself
> detects the error, it will initiate a BLINK command to that drive, so
> the individual drive is now flashing red/amber/whatever on the RAID
> enclosure. How would this be possible with ZFS? Say you have a JBOD
> enclosure (14, hell, maybe 48 drives). Knowing c0d0xx failed is no
> longer helpful if only ZFS catches the error. Will you be able to
> isolate the drive quickly, to replace it? Or will you be going "does
> the enclosure start at logical zero... left to right... hrmmm"?
>
> Thanks
>
> --
> Brent Jones
> [EMAIL PROTECTED]

--
Brent Jones
[EMAIL PROTECTED]
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss