On Fri, Feb 13, 2009 at 10:29:05AM -0800, Frank Cusack wrote:
> On February 13, 2009 1:10:55 PM -0500 Miles Nordin <car...@ivy.net> wrote:
> > >>>>> "fc" == Frank Cusack <fcus...@fcusack.com> writes:
> >
> >   fc> If you're misordering writes
> >   fc> isn't that a completely different problem?
> >
> > no.  ignoring the flush cache command causes writes to be misordered.
>
> oh.  can you supply a reference or if you have the time, some more
> explanation?  (or can someone else confirm this.)
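In a nutshell, here's a sketch of the commit protocol every copy-on-write
filesystem depends on (the function names are made-up stand-ins, not any
real driver API; a well-behaved disk drains its write cache at each
flush, a buggy one just says "done"):

/*
 * Illustration only: hypothetical stand-ins, not a real driver API.
 * Shows the write ordering a copy-on-write commit depends on, and
 * why a disk that ignores the flush-cache command breaks it.
 */
#include <stdio.h>

static void disk_write(const char *what) { printf("write  %s\n", what); }

/*
 * A well-behaved disk drains its write cache here before returning.
 * A buggy disk returns success immediately, leaving earlier writes
 * free to reach the media after later ones, i.e., misordered.
 */
static void disk_flush_cache(void) { printf("flush  (barrier)\n"); }

int main(void)
{
    disk_write("new data + metadata blocks");   /* step 1 */
    disk_flush_cache();                         /* step 2: barrier */
    disk_write("uberblock (commit record)");    /* step 3 */
    disk_flush_cache();                         /* step 4: durable */
    /*
     * If the flushes are ignored and power is lost in the window,
     * the commit record can be on the platters while the blocks it
     * points to are not: the "atomic" commit is no longer atomic.
     */
    return 0;
}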
Ordering matters for atomic operations, and filesystems are full of
those.  Now, if ordering is broken but the writes all eventually hit the
disk, then no one will notice.  But if a power failure and/or a
partition intervenes (a cable gets pulled, a network partition cuts off
an iSCSI connection, ...), then bad things happen.

For ZFS the easiest way to ameliorate this is the txg fallback fix that
Jeff Bonwick has said is now a priority.  And if ZFS guarantees no block
re-use until N txgs pass after a block is freed, then the fallback can
go back up to N txgs, which gives you a decent chance of recovering your
pool in the face of buggy devices.  But for each discarded txg you lose
that transaction's writes, so you lose data incrementally.  (The larger
N is, the better your chance that the oldest of the last N txgs' writes
will all have hit the disk in spite of the disk's lousy cache behavior.)
A rough sketch of the fallback idea is at the end of this message.

The next question is how to do the fallback, UI-wise.  Should it ever be
automatic?  A pool option for that would be nice (I'd use it on all-USB
pools).  If/when it's not automatic, how should the user/admin be
informed of the failure to open the pool and of the option to fall back
on an older txg (with data loss)?  (For non-removable pools imported at
boot time the answer is that the service will fail, causing sulogin to
be invoked so you can fix the problem on the console.  For removable
pools there should be a GUI.)

Nico
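PS: purely as a sketch of the fallback idea (none of this is actual ZFS
code; uberblock_valid() and the demo data are made up for illustration),
here is roughly what choosing a txg to fall back on could look like.
The fallback is only safe because freed blocks aren't reused for N txgs,
so an older uberblock's tree is still intact on disk:

/*
 * Rough sketch only, not the actual ZFS code.  If the disk ignored
 * cache flushes before a power failure, the newest uberblock may
 * point at blocks that never left the write cache, so recovery scans
 * backward through up to N older uberblocks.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TXG_FALLBACK_MAX 3              /* N: txgs we may discard */

struct uberblock {
    uint64_t ub_txg;                /* txg this uberblock commits */
    bool     ub_tree_on_disk;       /* demo stand-in for validation */
};

/* Fake validation; real code would checksum and walk the block tree. */
static bool uberblock_valid(const struct uberblock *ub)
{
    return ub->ub_tree_on_disk;
}

/*
 * Scan candidate uberblocks, newest first, and return the newest one
 * that validates, discarding at most TXG_FALLBACK_MAX txgs.
 */
static const struct uberblock *
pick_uberblock(const struct uberblock *ring, int nring)
{
    for (int i = 0; i < nring && i <= TXG_FALLBACK_MAX; i++) {
        if (uberblock_valid(&ring[i])) {
            if (i > 0)
                fprintf(stderr, "discarding %d txg(s); writes "
                    "after txg %llu are lost\n",
                    i, (unsigned long long)ring[i].ub_txg);
            return &ring[i];
        }
    }
    return NULL;                /* unrecoverable within N txgs */
}

int main(void)
{
    /* txg 102's writes were still in the disk cache at power-off. */
    struct uberblock ring[] = {
        { 102, false }, { 101, true }, { 100, true },
    };
    const struct uberblock *ub = pick_uberblock(ring, 3);

    if (ub != NULL)
        printf("opening pool at txg %llu\n",
            (unsigned long long)ub->ub_txg);
    else
        printf("pool cannot be opened automatically\n");
    return 0;
}

Run against the demo data it discards the damaged txg 102 and opens the
pool at txg 101, which is exactly the "lose one transaction's writes,
keep the pool" trade-off described above.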