>>>>> "jd" == Jim Dunham <james.dun...@sun.com> writes:

    jd> It is my understanding that the ZFS intent log (ZIL) satisfies
    jd> POSIX requirements for synchronous transactions, thus
    jd> filesystem consistency.

Maybe ``file consistency'' would be clearer.  When you say filesystem
consistency, people imagine their pools won't import, which I don't
think is what you're talking about.  Databases rely on the ZIL to keep
their data files internally consistent, and MTAs rely on it to keep
their queue directories consistent.  ``File consistency'' here means
the insides of a file must be consistent with the rest of the insides
of the same file, and they won't be without the ZIL.

So, for example, imagine a better world where virtual machine
software didn't break all kinds of sync and barrier rules and the ZIL
were the only issue.  Disabling the ZIL on the Host could then leave
the filesystems of virtual Guests inconsistent if the Host lost power,
so that they refuse to import or need drastic fsck.  The same could
happen to the Guests in the SNDR-replicated copy of the Host.  The
Host filesystem and its replica, though, would always stay clean and
mountable with or without the ZIL.

The ZIL is stored on the disk, never in RAM as your earlier message
suggested, so it should be replicated along with everything else,
shouldn't it?

Unless, that is, you are using a slog and leave the slog outside
replication.  But in that case it should be impossible to import the
pool on the secondary, because importing with missing slogs doesn't
work yet, so I'm not sure what's happening to you.

Are you actually observing violation of POSIX consistency
``suggestions'' w.r.t. fsync() or O_DSYNC on the secondary?  

Or are you talking about close-to-open?  That is, you close() a file,
wait for the close to return, break replication, and the file does not
appear on the secondary?

What's breaking exactly?

    jd> A simple test I performed to verify this, was to append to a
    jd> ZFS file (no synchronous filesystem options being set) a
    jd> series of blocks with a block order pattern contained
    jd> within. At some random point in this process, I took a ZFS
    jd> snapshot, immediately dropped SNDR into logging mode. When
    jd> importing the ZFS storage pool on the SNDR remote host, I
    jd> could see the ZFS snapshot just taken, but neither the
    jd> snapshot version of the file, or the file itself contained all
    jd> of the data previously written to it.

that's a really good test!  so SNDR is good for testing, too, it seems.

I'm glad you've done it.  If we'd just listened to the several people
speculating, ``just take a snapshot, it ought to imply a lockfs,'' we
could be having nasty surprises months from now.  I'm also not that
upset about the behavior, if it lets one take and destroy snapshots
really fast.  I could see the opposing argument that all snapshots
should commit to disk atomically, though, because you are saying the
snapshot _exists_ but doesn't have in it what it should.  Maybe in a
more ideal world a snapshot should either disappear after reboot, or
else, if it exists, contain exactly what it logically should.
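
For anyone who wants to repeat this kind of test, here is a rough
sketch of a pattern writer and checker.  It is not Jim's actual test
program: the block size, the 8-byte sequence header, and the function
names are all my assumptions.  Each block encodes its own position, so
on the secondary you can count how many blocks made it.  Passing
dsync=True opens the file with O_DSYNC (falling back to no flag on
platforms where Python does not expose it).

```python
import os
import struct

BLOCK_SIZE = 8192               # assumed; the original test does not state a size
HEADER = struct.Struct(">Q")    # assumed layout: 8-byte big-endian sequence number


def make_block(seq, size=BLOCK_SIZE):
    """Return one block whose content encodes its own position in the file."""
    return HEADER.pack(seq) * (size // HEADER.size)


def append_blocks(path, start, count, dsync=False):
    """Append `count` blocks numbered from `start`, optionally with O_DSYNC."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if dsync:
        # O_DSYNC is not exposed on every platform; 0 means "no extra flag".
        flags |= getattr(os, "O_DSYNC", 0)
    fd = os.open(path, flags, 0o644)
    try:
        for seq in range(start, start + count):
            block = make_block(seq)
            if os.write(fd, block) != len(block):
                raise OSError("short write while appending block %d" % seq)
    finally:
        os.close(fd)


def count_intact_blocks(path):
    """Count leading blocks that are complete and carry the expected number."""
    intact = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if block != make_block(intact):
                break           # EOF, a torn block, or an out-of-order block
            intact += 1
    return intact
```

Run append_blocks() on the primary, take the snapshot and drop SNDR
into logging mode at some point, then run count_intact_blocks() on both
the file and its snapshot copy on the secondary and compare against
what the writer had reported as written.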

    jd> I then retested, but opened the file with O_DSYNC, and when
    jd> following the same test steps above, both the snapshot version
    jd> of the file, and the file itself contained all of the data
    jd> previously written to it.

AIUI, in this test some of the file data may be written to the ZIL.
In the former test, the ZIL would not be used at all.

But the ZIL is just a separate area on the disk that's faster to write
to, since with O_DSYNC or fsync() you would like to return to the
application in a hurry.  ZFS scribbles down the change as quickly as
possible in the ZIL on the disk, then rewrites it in a more organized
way later.
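
From the application side, the other way to ask for this is an
ordinary write followed by fsync(), rather than opening with O_DSYNC.
A minimal sketch, with a made-up helper name, just to show where the
synchronous request happens:

```python
import os


def append_and_fsync(path, data):
    """Append `data` to `path`, returning only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        while view:                 # loop in case of short writes
            written = os.write(fd, view)
            view = view[written:]
        # This is the call that asks the filesystem to make the data
        # stable before returning to the application.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(data)
```

Without the os.fsync() call (and without O_DSYNC) the write can return
while the data is still only in memory, which is the situation in the
first test.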

