On Fri, Feb 13, 2009 at 8:24 PM, Bob Friesenhahn
<bfrie...@simple.dallas.tx.us> wrote:
> On Fri, 13 Feb 2009, Ross Smith wrote:
>>
>> You have to consider that even with improperly working hardware, ZFS
>> has been checksumming data, so if that hardware has been working for
>> any length of time, you *know* that the data on it is good.
>
> You only know this if the data has previously been read.
>
> Assume that the device temporarily stops physically writing, but otherwise
> responds normally to ZFS.  Then the device starts writing again (including a
> recent uberblock), but with a large gap in the writes.  Then the system
> loses power, or crashes.  What happens then?

Hey Bob,

Thinking about this a bit more, you've given me an idea: would it be
worth ZFS occasionally reading back previous uberblocks from the pool,
just to check that they are still there and intact?

I wonder if you could do this after a few uberblocks have been
written.  It seems like a good way of catching devices that aren't
writing correctly early on, as well as a way of guaranteeing that
previous uberblocks are available to roll back to should a write go
wrong.  A rough sketch of what I'm picturing is below.
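
Something along these lines, purely illustrative.  None of these names
are real ZFS functions, and the on-disk layout is simplified; it's just
meant to show the shape of the check:

/*
 * Hypothetical sketch (not actual ZFS code): after every few txg
 * commits, re-read the last few uberblock slots from the label and
 * verify their checksums, so a device that silently drops writes is
 * caught early.  All names and helpers here are made up.
 */
#include <stdint.h>
#include <stdbool.h>

#define UB_READBACK_DEPTH  4   /* how many recent uberblocks to re-read */

struct uberblock_slot {
        uint64_t txg;          /* transaction group that wrote it */
        uint64_t checksum;     /* checksum stored with the slot */
};

/* Assumed helpers: read one label slot from disk, recompute its checksum. */
extern bool read_label_slot(int vdev, int slot, struct uberblock_slot *out);
extern uint64_t compute_checksum(const struct uberblock_slot *ub);

/*
 * Returns true if the most recently written uberblocks read back with
 * matching checksums; false means the device is probably not committing
 * writes, and the pool should be flagged before more data is at risk.
 */
bool
verify_recent_uberblocks(int vdev, int newest_slot, int ring_size)
{
        for (int i = 0; i < UB_READBACK_DEPTH; i++) {
                int slot = (newest_slot - i + ring_size) % ring_size;
                struct uberblock_slot ub;

                if (!read_label_slot(vdev, slot, &ub) ||
                    compute_checksum(&ub) != ub.checksum)
                        return (false);
        }
        return (true);
}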

I wonder what the upper limit for this kind of write failure is going
to be.  I've seen 30 second delays mentioned in this thread.  How
often are uberblocks written?  Is there any guarantee that we'll
always have more than 30 seconds' worth of uberblocks on a drive?
Should ZFS keep either a given number of uberblocks, or 5 minutes'
worth, whichever is larger?
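
To make that policy concrete, here's a rough sketch of the retention
rule I mean.  Again this is only illustrative: txg_interval_sec is just
an assumed average gap between uberblock writes (one per transaction
group commit), and the constants aren't real ZFS tunables:

/*
 * Keep whichever is larger: a fixed minimum number of uberblocks, or
 * enough to cover a five-minute window of pool activity.
 */
#include <stdint.h>

#define MIN_UBERBLOCKS       32          /* fixed floor */
#define RETENTION_WINDOW_SEC (5 * 60)    /* five minutes */

uint64_t
uberblocks_to_keep(uint64_t txg_interval_sec)
{
        if (txg_interval_sec == 0)
                txg_interval_sec = 1;    /* guard against divide by zero */

        /* round up so a partial interval still counts */
        uint64_t window = (RETENTION_WINDOW_SEC + txg_interval_sec - 1) /
            txg_interval_sec;

        return (window > MIN_UBERBLOCKS ? window : MIN_UBERBLOCKS);
}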

Ross
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
