Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

Edward Ned Harvey Thu, 01 Apr 2010 04:43:01 -0700

> >If you have an ungraceful shutdown in the middle of writing stuff, while the
> >ZIL is disabled, then you have corrupt data.  Could be files that are
> >partially written.  Could be wrong permissions or attributes on files.
> >Could be missing files or directories.  Or some other problem.
> >
> >Some changes from the last 1 second of operation before crash might be
> >written, while some changes from the last 4 seconds might be still
> >unwritten.  This is data corruption, which could be worse than losing a few
> >minutes of changes.  At least, if you rollback, you know the data is
> >consistent, and you know what you lost.  You won't continue having more
> >losses afterward caused by inconsistent data on disk.
> 
> How exactly is this different from "rolling back to some other point of
> time?".
> 
> I think you don't quite understand how ZFS works; all operations are
> grouped in transaction groups; all the transactions in a particular group
> are committed in one operation.  I don't know what partial ordering ZFS


Dude, don't be so arrogant, acting like you know what I'm talking about
better than I do.  Face it that you have something to learn here.

Yes, all the transactions in a transaction group are either committed
entirely to disk, or not at all.  But they're not necessarily committed to
disk in the same order that the user level applications requested.  Meaning:
If I have an application that writes to disk in "sync" mode intentionally
... perhaps because my internal file format would become inconsistent if
I wrote out of order ... If the sysadmin has disabled ZIL, my "sync" write
will not block, and I will happily issue more write operations.  As long as
the OS remains operational, no problem.  The OS keeps the filesystem
consistent in RAM, and correctly manages all the open file handles.  But if
the OS dies for some reason, some of my later writes may have been committed
to disk while some of my earlier writes, which were still being buffered in
system RAM for a later transaction group, are lost.
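
To make the "sync mode intentionally" case concrete, here is a minimal sketch
of the kind of application I mean.  The function name, the file layout and the
COMMIT marker are all made up for illustration; the only real calls are the
standard os.open / os.write / os.fsync.  The application assumes that once
os.fsync returns, the payload is on stable storage, so the marker can never
exist on disk without the payload before it.

```python
import os


def append_record(path, payload, marker=b"COMMIT\n"):
    """Append payload, then a commit marker, forcing each to disk in order.

    Hypothetical example: the on-disk format (payload followed by a marker)
    is an assumption, not any real database's layout.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Small writes only; a real application would loop on short writes.
        os.write(fd, payload)
        # The application blocks here until the payload is reported durable.
        os.fsync(fd)
        # Only now is it safe (from the app's point of view) to write the marker.
        os.write(fd, marker)
        os.fsync(fd)
    finally:
        os.close(fd)
```

If the fsync returns without the data actually being durable, the application
has no way to know, and its ordering assumption is silently broken.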

This is particularly likely to happen if my application issues a very small
sync write, followed by a larger async write, followed by a very small sync
write, and so on.  Then the OS will buffer my small sync writes and attempt
to aggregate them into a larger sequential block for the sake of accelerated
performance.  The end result is:  My larger async writes are sometimes
committed to disk before my small sync writes.  But the only reason I would
ever know or care about that would be if the ZIL were disabled, and the OS
crashed.  Afterward, my file has internal inconsistency.
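
The inconsistency I'm describing can be stated as a toy check.  This is only a
model of my argument, not of ZFS internals: the write names are invented, and
"persisted" simply stands for whatever happened to reach disk before the crash.
A file is fine if the surviving writes form a prefix of the order in which the
application issued them, and inconsistent if a later write survived while an
earlier one did not.

```python
def is_prefix_consistent(issued, persisted):
    """Return True if the persisted writes form a prefix of the issue order."""
    persisted = set(persisted)
    seen_gap = False
    for write in issued:
        if write in persisted:
            if seen_gap:
                # A later write survived although an earlier one was lost.
                return False
        else:
            seen_gap = True
    return True


# Hypothetical write sequence: small sync writes interleaved with large async ones.
issued = ["sync-1", "async-big-1", "sync-2", "async-big-2"]

print(is_prefix_consistent(issued, ["sync-1", "async-big-1"]))       # True
print(is_prefix_consistent(issued, ["async-big-1", "async-big-2"]))  # False
```

The second case is the one I'm worried about: the large async writes made it
to disk, the small sync writes did not.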

Perfect examples of applications behaving this way would be databases and
virtual machines.


> Why do you think that a "Snapshot" has a "better quality" than the last
> snapshot available?

If you roll back to a snapshot from several minutes ago, you can rest assured
all the transaction groups that belonged to that snapshot have been
committed.  So although you're losing the most recent few minutes of data,
you know you haven't got file corruption in any of the existing files.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
