On Fri, Feb 13, 2009 at 8:24 PM, Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:
> On Fri, 13 Feb 2009, Ross Smith wrote:
>>
>> You have to consider that even with improperly working hardware, ZFS
>> has been checksumming data, so if that hardware has been working for
>> any length of time, you *know* that the data on it is good.
>
> You only know this if the data has previously been read.
>
> Assume that the device temporarily stops physically writing, but otherwise
> responds normally to ZFS. Then the device starts writing again (including a
> recent uberblock), but with a large gap in the writes. Then the system
> loses power, or crashes. What happens then?
Hey Bob,

Thinking about this a bit more, you've given me an idea: would it be worth ZFS occasionally reading back previous uberblocks from the pool, just to check that they are still there and readable? You could do this after every few uberblocks have been written. It seems like a good way of catching devices that aren't writing correctly early on, as well as a way of guaranteeing that previous uberblocks are available to roll back to should a write go wrong.

I also wonder what the upper limit for this kind of write failure is going to be. Thirty-second delays have been mentioned in this thread. How often are uberblocks written, and is there any guarantee that we'll always have more than 30 seconds' worth of uberblocks on a drive? Should ZFS be set so that it keeps either a given number of uberblocks or five minutes' worth, whichever is larger?

Ross
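P.S. To make this concrete, here's a rough sketch of the kind of read-back
check and retention arithmetic I'm imagining. This is purely hypothetical C
on my part, not actual ZFS source: the helper functions, the 128-slot ring
size, the sampling interval, and the roughly-5-second txg interval are all
my assumptions.

#include <stdint.h>
#include <stdbool.h>

#define UB_RING_SLOTS    128    /* assumed size of each label's uberblock ring */
#define VERIFY_EVERY_TXG 8      /* sample every 8th txg; arbitrary choice */
#define VDEV_LABELS      4      /* ZFS keeps four labels per device */

typedef struct uberblock {
        uint64_t ub_magic;
        uint64_t ub_txg;
        /* ... checksum, root block pointer, etc. ... */
} uberblock_t;

/* Assumed helpers: a synchronous label read and a checksum check. */
extern bool label_read_ub(int label, int slot, uberblock_t *ub);
extern bool ub_checksum_ok(const uberblock_t *ub);

/*
 * After syncing 'txg', occasionally read the uberblock we just wrote
 * back from every label. If it doesn't come back intact, the device
 * is probably dropping writes and could be faulted early, instead of
 * us finding out after a crash.
 */
bool
verify_recent_uberblock(uint64_t txg)
{
        if (txg % VERIFY_EVERY_TXG != 0)
                return (true);          /* only sample occasionally */

        int slot = (int)(txg % UB_RING_SLOTS);
        for (int label = 0; label < VDEV_LABELS; label++) {
                uberblock_t ub;
                if (!label_read_ub(label, slot, &ub) ||
                    !ub_checksum_ok(&ub) ||
                    ub.ub_txg != txg)
                        return (false); /* the write never hit the disk */
        }
        return (true);
}

/*
 * Retention: keep whichever is larger, a fixed floor of uberblocks or
 * enough to cover 'seconds' of history. Assuming one uberblock per
 * txg and a txg roughly every 5 seconds, five minutes needs about 60
 * slots, which would fit comfortably inside a 128-slot ring.
 */
#define TXG_INTERVAL_SEC 5              /* assumed sync interval */
#define RETAIN_FLOOR     32             /* arbitrary minimum count */

int
retain_slots(int seconds)
{
        int by_time = seconds / TXG_INTERVAL_SEC;
        return (by_time > RETAIN_FLOOR ? by_time : RETAIN_FLOOR);
}

By that arithmetic, even the 30-second delays mentioned in this thread would
only cost about half a dozen uberblocks, so the fixed floor matters more
than the time window.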