From a reporting perspective, yes: zpool status should not hang, and it should report an error if a drive goes away or is in any way behaving badly. No argument there. From the data-integrity perspective, the only event ZFS needs to know about is when a bad drive is replaced, so that a resilver is triggered. If a drive suddenly disappears but is only one component of a redundant set, your data should still be fine. Now, if enough drives go away to break the redundancy, that's a different story altogether.
Jon Ross Smith wrote:
> I agree that device drivers should perform the bulk of the fault
> monitoring; however, I disagree that this absolves ZFS of any
> responsibility for checking for errors. The primary goal of ZFS is to
> be a filesystem and maintain data integrity, and that entails both
> reading and writing data to the devices. It is no good having
> checksumming when reading data if you are losing huge amounts of data
> when a disk fails.
>
> I'm not saying that ZFS should be monitoring disks and drivers to
> ensure they are working, just that if ZFS attempts to write data and
> doesn't get the response it's expecting, an error should be logged
> against the device regardless of what the driver says. If ZFS is
> really about end-to-end data integrity, then you do need to consider
> the possibility of a faulty driver. Now, I don't know what the root
> cause of this error is, but I suspect it is either a bad response
> from the SATA driver or something within ZFS that is not working
> correctly. Either way, I believe ZFS should have caught this.
>
> It's similar to the iSCSI problem I posted a few months back, where
> the ZFS pool hangs for three minutes when a device is disconnected.
> There's absolutely no need for the entire pool to hang when the other
> half of the mirror is working fine. ZFS is often compared to hardware
> RAID controllers, but so far its ability to handle problems is
> falling short.
>
> Ross
>
> > Date: Wed, 30 Jul 2008 09:48:34 -0500
> > From: [EMAIL PROTECTED]
> > To: [EMAIL PROTECTED]
> > CC: zfs-discuss@opensolaris.org
> > Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
> >
> > On Wed, 30 Jul 2008, Ross wrote:
> >
> > > Imagine you had a raid-z array and pulled a drive as I'm doing here.
> > > Because ZFS isn't aware of the removal, it keeps writing to that
> > > drive as if it's valid. That means ZFS still believes the array is
> > > online when in fact it should be degraded.
> > > If any other drive now
> > > fails, ZFS will consider the status degraded instead of faulted, and
> > > will continue writing data. The problem is, ZFS is writing some of
> > > that data to a drive which doesn't exist, meaning all that data will
> > > be lost on reboot.
> >
> > While I do believe that device drivers, or the fault system, should
> > notify ZFS when a device fails (and ZFS should react appropriately), I
> > don't think that ZFS should be responsible for fault monitoring. ZFS
> > is in a rather poor position for device fault monitoring, and if it
> > attempts to do so then it will be slow and may misbehave in other
> > ways. The software which communicates with the device (i.e. the
> > device driver) is in the best position to monitor the device.
> >
> > The primary goal of ZFS is to be able to correctly read data which was
> > successfully committed to disk. There are programming interfaces
> > (e.g. fsync(), msync()) which may be used to ensure that data is
> > committed to disk, and which should return an error if there is a
> > problem. If you were performing your tests over an NFS mount, the
> > results should be considerably different, since NFS requests that its
> > data be committed to disk.
> >
> > Bob

-- 
Jonathan Loran
IT Manager, Space Sciences Laboratory, UC Berkeley
(510) 643-5146  [EMAIL PROTECTED]
AST:7731^29u18e3

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss