> guess the upshot is that if one were to daily rsync data to a zfs
> filesystem, the changes wrought there by rsync would be reflected in
> zfs snapshots, maybe timed to happen right after the rsync runs, as
> these new blocks covering only the deltas... I don't really know what
> deltas are... but I guess it would be only the changed parts.

I do this (roughly) for Linux backups.  My ZFS server exports a "backup"
dataset via NFS to a Linux machine.  Twice a day (4am and 4pm) Linux
rsyncs to the NFS mountpoint.  Once a day (at midnight) the ZFS server
snapshots the dataset.

> And I'm guessing further that one would be able to recover each change
> from the snapshots somehow.

Yes.  My ZFS backup dataset has snapdir=hidden, but it's still available
over the NFS mount.  My Linux users can do this kind of thing:
        cd /nfs/backup/.zfs/snapshot/auto-d20090312
        more somefile

to read "somefile" from the 12 March 2009 backup.

> In my OP, I mentioned rsync and rsnapshot backup system on linux as
> being in some way comparable.  I do understand how rsnapshot works but
> still not seeing exactly how the zfs snapshots work.
> 
> Maybe a concrete example would be a bit easier to understand if you
> can give one.  I'm still not really understanding COW.

Copy on write means that two objects (files) referring to identical data
get pointers to the data instead of duplicate copies.  As long as these
are only read, and not written, the pointer to the same data is fine.
When a write occurs, the data is copied and one of the referrers gets a
pointer to the new copy.  This prevents the write from affecting both
referring files.
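
A minimal sketch of that idea in Python, in case code reads more easily
than prose.  The CowRef class and its method names are made up for this
illustration; nothing here is a ZFS interface.  Two handles share one
buffer until one of them writes.

```python
class CowRef:
    """A handle to data that stays shared until this handle writes."""

    def __init__(self, data):
        self._data = data        # may be the same object another handle holds
        self._shared = False     # True once this buffer has a second referrer

    def clone(self):
        # No copy here: the new handle just points at the same buffer.
        other = CowRef(self._data)
        self._shared = other._shared = True
        return other

    def read(self):
        return bytes(self._data)

    def write(self, offset, payload):
        if self._shared:
            # Copy on write: take a private copy before modifying it.
            # (Simplification: the other handle keeps its flag set, so it
            # will also copy on its own first write, which is harmless.)
            self._data = bytearray(self._data)
            self._shared = False
        self._data[offset:offset + len(payload)] = payload


a = CowRef(bytearray(b"hello world"))
b = a.clone()          # a and b point at the same bytes
b.write(0, b"J")       # b gets its own copy; a is untouched
print(a.read(), b.read())
```

Reads never trigger a copy; only the first write through a shared handle
does.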

Strictly speaking, the name "copy on write" describes how the technique
works in virtual memory.  For disk storage, "copy" isn't necessarily
accurate: since the entire data block is rewritten anyway, a separate
"copy" step can be optimized away.

Here's a simple illustration of COW in action.  It's not necessarily
an accurate depiction of ZFS, but of the general concept in terms of a
filesystem.

  1. When a file (file A) is written to disk, blocks are allocated for
     the file and data is stored in those blocks.  The blocks each have
     a reference count, and ref counts are set to 1 because only one
     file refers to the blocks.

  2. I copy File A to File B.  The new file simply refers to all the
     same blocks.  The ref counts are raised to 2.

  3. I snapshot the filesystem.  This is essentially like copying every
     file in it, as in #2.  No blocks are copied because no new data was
     written, but ref counts are raised.

     I'm not sure about zfs's implementation, but in principle I guess
     an immutable snapshot should only need to raise ref ct by 1 in
     total, whereas a mutable snapshot (i.e., a clone) would increment
     once for every reference in the filesystem.

  4. I rsync to the file in step #1.  Let's suppose this leaves blocks
     1 and 2 alone, but updates block 3.  The new data for block 3 is
     written to a new block (call it 3bis), and block 3 is left on the
     disk as it is.  Block 3's ref count is decremented, and 3bis's ref
     count is set to 1.

     File A: blocks 1, 2, 3bis
     File B: blocks 1, 2, 3
     Block 1: ref ct 3 (file A, file B, snapshot)
     Block 2: ref ct 3 (file A, file B, snapshot)
     Block 3: ref ct 2 (file B, snapshot)
     Block 3bis: ref ct 1 (file A)

  5. I remove file B.  Ref counts for its blocks are decremented, but
     since all its blocks still have ref counts > 0, they persist.  No
     blocks are removed from the dataset.

     File A: blocks 1, 2, 3bis
     Block 1: ref ct 2 (file A, snapshot)
     Block 2: ref ct 2 (file A, snapshot)
     Block 3: ref ct 1 (snapshot)
     Block 3bis: ref ct 1 (file A)

  6. I remove file A.  Ref counts again decrement.

     Block 1: ref ct 1 (snapshot)
     Block 2: ref ct 1 (snapshot)
     Block 3: ref ct 1 (snapshot)
     Block 3bis: ref ct 0

     Since 3bis no longer has any referrers, it is deallocated.  Blocks
     1, 2, and 3 are still used by the snapshot, even though the original
     files A and B are no longer present.
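
The six steps above can be replayed with a toy model.  This is only a
sketch of the reference-counting idea as I described it, not of how ZFS
actually tracks blocks; the BlockStore and FileSystem classes are
invented for the example, and the snapshot is assumed to take one
reference per block.  Block 4 plays the role of "3bis".

```python
class BlockStore:
    """Allocates numbered blocks and frees them when unreferenced."""

    def __init__(self):
        self.blocks = {}    # block id -> data
        self.refs = {}      # block id -> reference count
        self.next_id = 1

    def alloc(self, data):
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = data
        self.refs[bid] = 1
        return bid

    def incref(self, bid):
        self.refs[bid] += 1

    def decref(self, bid):
        self.refs[bid] -= 1
        if self.refs[bid] == 0:      # no referrers left: deallocate
            del self.blocks[bid]
            del self.refs[bid]


class FileSystem:
    def __init__(self):
        self.store = BlockStore()
        self.files = {}       # file name -> list of block ids
        self.snapshots = {}   # snapshot name -> list of held block ids

    def write_file(self, name, chunks):                 # step 1
        self.files[name] = [self.store.alloc(c) for c in chunks]

    def copy_file(self, src, dst):                      # step 2
        self.files[dst] = list(self.files[src])
        for bid in self.files[dst]:
            self.store.incref(bid)

    def snapshot(self, snap):                           # step 3
        # One reference per distinct block, as in the listing above.
        held = sorted({b for ids in self.files.values() for b in ids})
        for bid in held:
            self.store.incref(bid)
        self.snapshots[snap] = held

    def update_block(self, name, index, data):          # step 4
        old = self.files[name][index]
        self.files[name][index] = self.store.alloc(data)  # new block
        self.store.decref(old)       # old block stays if still referenced

    def remove(self, name):                             # steps 5 and 6
        for bid in self.files.pop(name):
            self.store.decref(bid)


fs = FileSystem()
fs.write_file("A", ["one", "two", "three"])
fs.copy_file("A", "B")
fs.snapshot("snap")
fs.update_block("A", 2, "three, revised")
fs.remove("B")
fs.remove("A")
print(fs.store.refs)    # {1: 1, 2: 1, 3: 1}
```

After the last step only blocks 1, 2, and 3 remain, each held solely by
the snapshot, which matches the final listing.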

This is a pretty simplistic view.  In practice, not only does the COW
methodology apply to the files' data blocks; it also applies to their
metadata, the filesystem's directories, and so on.  This ensures that
directory information as well as files persist in snapshots.  It also
explains why snapshots are virtually instantaneous: you only make a new
set of pointers to all the existing data, but you don't replace any of
the existing data.

> So if I wanted to find a specific change in a file... that would be
> somewhere in the zfs snapshots... say to retrieve a certain
> formulation in some kind of `rc' file that worked better than a later
> formulation. How would I do that?

Using the .zfs/snapshot directory (see above) you can diff two different
generations of a file at the same path.
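
A plain diff(1) between the two paths does the job.  If you would rather
script it, here is a rough Python sketch using only the standard
library.  The diff_snapshots function is my own illustration, and it
assumes the dataset root exposes the .zfs/snapshot directory as
described earlier.

```python
import difflib
from pathlib import Path


def diff_snapshots(root, relpath, old_snap, new_snap):
    """Return a unified diff of one file as seen in two snapshots.

    root is the dataset mountpoint, relpath is the file's path relative
    to it, and old_snap/new_snap are snapshot names.
    """
    snapdir = Path(root) / ".zfs" / "snapshot"
    old = snapdir / old_snap / relpath
    new = snapdir / new_snap / relpath
    old_lines = old.read_text().splitlines(keepends=True)
    new_lines = new.read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=str(old), tofile=str(new)
        )
    )
```

For example, diff_snapshots("/nfs/backup", "somefile", "auto-d20090311",
"auto-d20090312") would show what changed between two consecutive
nightly snapshots.  The first snapshot name there is hypothetical; use
whatever names appear in your own .zfs/snapshot directory.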

-- 
 -D.    d...@uchicago.edu    NSIT    University of Chicago
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
