snv_91. I downloaded snv_94 today, so I'll be testing with that tomorrow.
> Date: Mon, 28 Jul 2008 09:58:43 -0700
> From: [EMAIL PROTECTED]
> Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
> To: [EMAIL PROTECTED]
>
> Which OS and revision?
> -- richard
>
> Ross wrote:
> > Ok,
> > after doing a lot more testing of this I've found it's not the
> > Supermicro controller causing problems. It's purely ZFS, and it causes
> > some major problems! I've even found one scenario that appears to cause
> > huge data loss without any warning from ZFS: up to 30,000 files and
> > 100MB of data missing after a reboot, with ZFS reporting that the pool
> > is OK.
> >
> > ***********************************************************************
> > 1. Solaris handles USB and SATA hot plug fine
> >
> > If disks are not in use by ZFS, you can unplug USB or SATA devices, and
> > cfgadm will recognise the disconnection. USB devices are recognised
> > automatically as you reconnect them; SATA devices need reconfiguring.
> > Cfgadm even recognises the SATA device as an empty bay:
> >
> > # cfgadm
> > Ap_Id      Type         Receptacle   Occupant      Condition
> > sata1/7    sata-port    empty        unconfigured  ok
> > usb1/3     unknown      empty        unconfigured  ok
> >
> > -- insert devices --
> >
> > # cfgadm
> > Ap_Id      Type         Receptacle   Occupant      Condition
> > sata1/7    disk         connected    unconfigured  unknown
> > usb1/3     usb-storage  connected    configured    ok
> >
> > To bring the SATA drive online, it's just a case of running:
> > # cfgadm -c configure sata1/7
> >
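For what it's worth, here's the full reattach sequence I'd expect to use when a pulled SATA drive comes back, including telling ZFS about it. The attachment point, pool and device names below are examples only — substitute your own:

```shell
# Reattach a hot-plugged SATA disk and bring it back into ZFS.
# "sata1/7", "tank" and "c1t7d0" are example names; substitute your own.

cfgadm -c configure sata1/7   # configure the reinserted device
cfgadm sata1/7                # confirm it now shows connected/configured
zpool online tank c1t7d0      # clear the OFFLINE/UNAVAIL state, if any
zpool status tank             # check the pool resilvers and comes back clean
```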
> > ***********************************************************************
> > 2. If ZFS is using a hot plug device, disconnecting it will hang all
> > ZFS status tools.
> >
> > While pools remain accessible, any attempt to run "zpool status" will
> > hang. I don't know if there is any way to recover these tools once this
> > happens. While this is a pretty big problem in itself, it also makes me
> > worry whether other types of error could have the same effect. I can
> > see this leaving a server in a state where you know there are errors in
> > a pool, but have no way of finding out what those errors are without
> > rebooting the server.
> >
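On point 2: until the underlying hang is fixed, you can at least stop a wedged "zpool status" from taking your terminal with it by running it under a crude watchdog. A sketch, assuming a POSIX /bin/sh — note this only protects the shell session; a process stuck in the kernel may well ignore the kill and linger:

```shell
#!/bin/sh
# Run a command with a timeout so a hung "zpool status" does not wedge
# the terminal session. This only protects the interactive shell; the
# child process may still be stuck in the kernel and survive the kill.
with_timeout() {
    secs=$1; shift
    "$@" &                  # launch the command in the background
    pid=$!
    # Watchdog: kill the command if it is still running after $secs.
    ( sleep "$secs"; kill "$pid" 2>/dev/null ) >/dev/null 2>&1 &
    watcher=$!
    wait "$pid"             # returns >=128 if the watchdog killed it
    status=$?
    kill "$watcher" 2>/dev/null
    if [ "$status" -ge 128 ]; then
        echo "timed out after ${secs}s: $*"
    else
        echo "completed: $*"
    fi
}

# Example: probe pool health without risking a hung terminal
with_timeout 10 zpool status
```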
> > ***********************************************************************
> > 3. Once the ZFS status tools are hung, the computer will not shut down.
> >
> > The only way I've found to recover from this is to physically power
> > down the server. The Solaris shutdown process simply hangs.
> >
> > ***********************************************************************
> > 4. While reading from an offline disk causes errors, writing does not!
> > *** CAUSES DATA LOSS ***
> >
> > This is a big one: ZFS can continue writing to an unavailable pool. It
> > doesn't always generate errors (I've seen it copy over 100MB before
> > erroring), and if not spotted, this *will* cause data loss after you
> > reboot.
> >
> > I discovered this while testing how ZFS coped with the removal of a hot
> > plug SATA drive. I knew that the ZFS admin tools were hanging, but that
> > redundant pools remained available. I wanted to see whether it was just
> > the ZFS admin tools that were failing, or whether ZFS was also failing
> > to send appropriate error messages back to the OS.
> >
> > These are the tests I carried out:
> >
> > Zpool: a single-drive zpool, consisting of one 250GB SATA drive in a
> > hot plug bay.
> > Test data: a folder tree containing 19,160 items, 71.1MB in total.
> >
> > TEST 1: Opened File Browser and copied the test data to the pool. Half
> > way through the copy I pulled the drive. THE COPY COMPLETED WITHOUT
> > ERROR. "zpool list" reported the pool as ONLINE; "zpool status" hung as
> > expected.
> >
> > Not quite believing the results, I rebooted and tried again.
> >
> > TEST 2: Opened File Browser and copied the data to the pool, pulling
> > the drive half way through. The copy again finished without error.
> > Checking the properties showed 19,160 files in the copy, and "zfs list"
> > again showed the filesystem as ONLINE.
> >
> > Now I decided to see how many files I could copy before it errored, so
> > I started the copy again. File Browser managed a further 9,171 files
> > before it stopped. That's nearly 30,000 files before any error was
> > detected. Again, despite the copy having finally errored, "zpool list"
> > showed the pool as ONLINE, even though "zpool status" hung.
> >
> > I rebooted the server, and found that after the reboot my first copy
> > contained just 10,952 items and my second copy was completely missing.
> > That's a loss of almost 20,000 files, yet "zpool status" reported NO
> > ERRORS.
> >
> > For the third test I decided to see whether these files were actually
> > accessible before the reboot:
> >
> > TEST 3: This time I pulled the drive *before* starting the copy. The
> > copy started much slower this time and only got to 2,939 files before
> > reporting an error. At that point I copied all the files that had been
> > written to another pool, and then rebooted.
> >
> > After the reboot, the folder in the test pool had disappeared
> > completely, but the copy I took before rebooting was fine and contained
> > 2,938 items, approximately 12MB of data. Again, "zpool status" reported
> > no errors.
> >
> > Further tests revealed that reading from the pool results in an error
> > almost immediately; writing to the pool appears very inconsistent.
> >
> > This is a huge problem. Data can be written without error, and is still
> > served to users. It is only later that the server begins to issue
> > errors, and at that point the ZFS admin tools are useless. The only
> > possible recovery is a server reboot, but that will lose any recent
> > data written to the pool, and will do so without any warning at all
> > from ZFS.
> >
> > Needless to say, I have a lot less faith in ZFS's error checking after
> > having seen it lose 30,000 files without error.
> >
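Given point 4, I wouldn't trust any copy onto a hot-plug pool without checking it afterwards. Here's a rough post-copy check, assuming a POSIX shell; the paths in the example are placeholders. It can't prove the pool will still hold the files after a reboot, but it does surface the kind of silent copy failure described above:

```shell
#!/bin/sh
# After a copy, flush cached writes and compare the source and
# destination trees file by file. This only catches missing or corrupt
# files that are already visible; it cannot prove the pool will still
# have them after a reboot.
verify_copy() {
    src=$1
    dst=$2
    sync    # push cached writes toward disk before comparing
    echo "files: $(find "$src" -type f | wc -l) in source," \
         "$(find "$dst" -type f | wc -l) in destination"
    # Compare contents file by file; report anything missing or different.
    find "$src" -type f | while read -r f; do
        cmp -s "$f" "$dst${f#"$src"}" || echo "MISMATCH: ${f#"$src"}"
    done
}

# Example (paths are placeholders for your own source and pool):
# verify_copy /data/source /tank/copy
```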
> > ***********************************************************************
> > 5. If you are using CIFS and pull a drive from the volume, the whole
> > server hangs!
> >
> > This appears to be the original problem I found. While ZFS doesn't
> > handle drive removal well, the combination of ZFS and CIFS is worse. If
> > you pull a drive from a ZFS pool (redundant or not) that is serving
> > CIFS data, the entire server freezes until you re-insert the drive.
> >
> > Note that ZFS itself does not recover after the drive is inserted; the
> > admin tools will still hang. However, the re-insertion of the drive is
> > enough to unfreeze the server.
> >
> > Of course, you still need a physical reboot to get your ZFS admin tools
> > back, but in the meantime the data is accessible again.
> >
> > This message posted from opensolaris.org
> > _______________________________________________
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss