>>>>> "enh" == Edward Ned Harvey <solar...@nedharvey.com> writes:

   enh> If you have zpool less than version 19 (when ability to remove
   enh> log device was introduced) and you have a non-mirrored log
   enh> device that failed, you had better treat the situation as an
   enh> emergency.

Ed, the log device removal support is only good for adding a slog to
try it out, then changing your mind and removing it (which was not
possible before).  It doesn't change the reliability situation one bit:
pools with dead slogs are still not importable.  There've been threads
on this for a while.  It's well-discussed because it's an example of
what I consider a broken process (``obviously a critical requirement,
but not technically part of the original RFE, which is already late''),
as well as a dangerous pitfall for ZFS admins.  I imagine the process
works well in other cases, keeping work granular enough that it can be
prioritized effectively, but in this case it has left the slog feature
significantly incomplete for a couple of years and put many production
systems in a precarious spot, and the whole mess was predicted before
the slog feature was integrated.

     >> The on-disk log (slog or otherwise), if I understand right, can
     >> actually make the filesystem recover to a crash-INconsistent
     >> state 

   enh> You're speaking the opposite of common sense.  

Yeah, I'm doing that on purpose, to suggest that just guessing at how
you feel things ought to work, based on vague notions of economy, isn't
a good idea.

   enh> If disabling the ZIL makes the system faster *and* less prone
   enh> to data corruption, please explain why we don't all disable
   enh> the ZIL?

I said that complying with fsync() can make the system recover to a
state not equal to any state you might hypothetically have snapshotted
in the moments leading up to the crash.  Elsewhere I might've said that
disabling the ZIL does not make the system more prone to data
corruption, *iff* you are not an NFS server.

If you are, disabling the ZIL can lead to lost writes if an NFS server
reboots and an NFS client does not, which can definitely cause
app-level data corruption.

Disabling the ZIL breaks the D (durability) requirement of ACID
databases, which might screw up apps that replicate or that keep
databases on several separate servers in sync, and it might lead to
lost mail on an MTA.  But because, unlike on non-COW filesystems, it
costs ZFS nothing extra to preserve write ordering even without
fsync(), AIUI you will not get corrupted application-level data by
disabling the ZIL.  You just get missing data that the app has a right
to expect should be there.  The dire warnings written by kernel
developers in the wikis of ``don't EVER disable the ZIL'' are totally
ridiculous and inappropriate IMO.  I think they probably just worked
really hard on the ZIL piece of ZFS and don't want people telling their
brilliant code to fuck off just because it makes things a little
slower, so we get all this ``enterprise'' snobbery and so on.
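
To make the durability point concrete, here is a minimal sketch of the
pattern that depends on the D in ACID (the names are made up; this is
an illustration, not code from any real app): write a commit record,
fsync() it, and only then acknowledge it to anyone else.

    /* commit.c: illustrative only */
    #include <unistd.h>

    int commit_record(int log_fd, const char *rec, size_t len)
    {
        if (write(log_fd, rec, len) != (ssize_t)len)
            return -1;                 /* failed or short write */
        if (fsync(log_fd) != 0)        /* "it is on stable storage now" */
            return -1;
        /* Only after fsync() succeeds may the app acknowledge the
         * commit to a client or replica. */
        return 0;
    }

With the ZIL disabled, fsync() returns without the data necessarily
being on stable storage, so an acknowledged commit can evaporate in a
crash.  On ZFS the log file itself still won't come back torn or
garbled; it will just be missing the tail the app already promised to
someone else.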

``Crash consistent'' is a technical term, not a common-sense term, and
I may have used it incorrectly:

  http://oraclestorageguy.typepad.com/oraclestorageguy/2007/07/why-emc-technol.html

On a system that loses power while fsync() is in use, the files being
fsync()ed will probably recover to more recent versions than the rest
of the files, which means the recovered state achieved by yanking the
cord couldn't have been emulated by cloning a snapshot and not actually
losing power.  However, the app calling fsync() expects this, so it's
not supposed to lead to application-level inconsistency.
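
A tiny illustration (file names purely hypothetical): an app appends to
two files but only fsync()s one of them.

    /* skew.c: illustrative only */
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int a = open("journal", O_WRONLY | O_CREAT | O_APPEND, 0644);
        int b = open("scratch", O_WRONLY | O_CREAT | O_APPEND, 0644);

        write(b, "scratch update\n", 15);
        write(a, "journal entry\n", 14);
        fsync(a);       /* journal is forced out; scratch is not */
        /* ...power fails here... */
        return 0;
    }

After the crash, "journal" can come back holding a write that was
issued *after* the lost write to "scratch", a combination of file
contents no snapshot taken before the crash would ever have shown.  The
app asked for exactly that by calling fsync(), so it has to be prepared
for it.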

If you test your app's recovery ability in just that way, by cloning
snapshots of filesystems the app is actively writing to and then seeing
whether the app can recover from the clone, then unfortunately you're
not testing the app quite hard enough once fsync() is involved, so
yeah, I guess disabling the ZIL might in theory make incorrectly
written apps less prone to data corruption.  Likewise, no amount of
testing the app on ZFS will be aggressive enough to make it
powerfail-proof on a non-COW POSIX system, because ZFS preserves more
ordering than the API actually guarantees to the app.

I'm repeating myself, though.  I wish you'd just read my posts with at
least paragraph granularity instead of picking out individual sentences
and discarding everything that seems too complicated or too awkwardly
stated.

I'm basing this all on the ``common sense'' that, to do otherwise,
fsync() would have to completely ignore its file descriptor argument.
It would have to copy the entire in-memory ZIL to the slog and behave
the same as 'lockfs -fa', which I think would perform too badly
compared to non-ZFS filesystems' fsync()s.  It would also lead to
emphatic performance advice like ``segregate files that get lots of
fsync()s into separate ZFS datasets from files that get high write
bandwidth,'' and we don't see advice like that in the
blogs/lists/wikis, which makes me think such segregation isn't
beneficial (the benefit would be dramatic if fsync() really worked that
way!) and that fsync() works the way I think it does.  It's a slightly
more convoluted type of ``common sense'' than yours, but mine could
still be wrong.
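
That belief is at least testable, FWIW.  A rough probe (file names and
sizes made up, and this is a sketch, not a careful benchmark): dirty a
pile of data on one file descriptor, then time an fsync() of a few
bytes on another.  If fsync() honors its file descriptor, the small
fsync() stays cheap; if it behaved like 'lockfs -fa', it would have to
drag the big file's dirty data out with it.

    /* fsync_probe.c: illustrative only */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        static char buf[1 << 20];
        memset(buf, 'x', sizeof buf);

        int big = open("big.dat",  O_WRONLY | O_CREAT | O_TRUNC, 0644);
        int log = open("tiny.log", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        for (int i = 0; i < 512; i++)     /* ~512 MB dirty, never fsync()ed */
            write(big, buf, sizeof buf);
        write(log, "commit\n", 7);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fsync(log);                       /* should only wait on tiny.log */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("fsync(tiny.log) took %.3f ms\n",
               (t1.tv_sec - t0.tv_sec) * 1e3 +
               (t1.tv_nsec - t0.tv_nsec) / 1e6);
        return 0;
    }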
