No, I count that as "doesn't return data ok", but my post wasn't very clear at all on that.
Even for a write, the disk will return something to indicate that the action has completed, so that can also be covered by just those two scenarios, and right now ZFS can lock the whole pool up while it's waiting for that response.

My idea is simply to allow the pool to continue operating while it waits for the drive to fault, even if that's a faulty write. It just means that the rest of the operations (reads and writes) can keep working for the minute (or three) it takes for FMA and the rest of the chain to flag the device as faulty. For write operations, the data can be safely committed to the rest of the pool, with just the outstanding writes for the slow drive left waiting. Then, as soon as the device is faulted, the hot spare can kick in and the outstanding writes can be quickly written to the spare.

For single-parity or non-redundant volumes there's some benefit in this. For dual-parity pools there's a massive benefit, as your pool stays available and your data is still well protected.

Ross

On Tue, Nov 25, 2008 at 10:44 AM, <[EMAIL PROTECTED]> wrote:
>
>>My justification for this is that it seems to me that you can split
>>disk behavior into two states:
>>- returns data ok
>>- doesn't return data ok
>
> I think you're missing "won't write".
>
> There's clearly a difference between "get data from a different copy",
> which you can fix by retrying against a different part of the redundant
> data, and writing data: the data which can't be written must be kept
> until the drive is faulted.
>
> Casper
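PS: to make the idea a bit more concrete, here's a very rough C sketch of the flow I have in mind. Every name in it (vdev_t, issue_write, on_device_faulted, and so on) is made up for illustration -- none of this is real ZFS code, and a real implementation would obviously need locking, timeouts and limits on how much can be parked:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct write_op write_op_t;
    typedef struct vdev vdev_t;

    struct write_op {
        write_op_t *next;   /* link in the per-device pending list */
        /* block pointer, data buffer, etc. would live here */
    };

    struct vdev {
        bool        hung;      /* stopped responding, not yet faulted */
        write_op_t *pending;   /* writes parked while we wait on FMA */
    };

    /* Stand-in for the real device I/O path. */
    static void
    device_write(vdev_t *vd, write_op_t *op)
    {
        (void)vd;
        (void)op;
    }

    /*
     * Issue one write of a redundant group. If the device is hung,
     * park the write instead of blocking, so the transaction can
     * still commit on the remaining copies.
     */
    static void
    issue_write(vdev_t *vd, write_op_t *op)
    {
        if (vd->hung) {
            op->next = vd->pending;
            vd->pending = op;
            return;
        }
        device_write(vd, op);
    }

    /*
     * Called once FMA faults the device and a hot spare attaches:
     * drain the parked writes straight onto the spare.
     */
    static void
    on_device_faulted(vdev_t *vd, vdev_t *spare)
    {
        write_op_t *op = vd->pending;

        while (op != NULL) {
            write_op_t *next = op->next;
            device_write(spare, op);
            op = next;
        }
        vd->pending = NULL;
    }

The point is just that only the writes destined for the hung device sit in that list; everything else commits as normal, and the list drains the moment the spare kicks in.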