> guess the upshot is that if one were to daily rsync data to a zfs
> filesystem, the changes wrought there by rsync would be reflected in
> zfs snapshots, maybe timed to happen right after the rsync runs, as
> these new blocks covering only the deltas... I don't really know what
> deltas are... but I guess it would be only the changed parts.
I do this (roughly) for Linux backups. My ZFS server exports a "backup"
dataset via NFS to a Linux machine. Twice a day (4am and 4pm) the Linux
machine rsyncs to the NFS mountpoint. Once a day (at midnight) the ZFS
server snapshots the dataset.

> And I'm guessing further that one would be able to recover each change
> from the snapshots somehow.

Yes. My ZFS backup dataset has snapdir=hidden, but the .zfs directory is
still accessible over the NFS mount. My Linux users can do this kind of
thing:

    cd /nfs/backup/.zfs/snapshot/auto-d20090312
    more somefile

to read "somefile" from the 12 March 2009 backup.

> In my OP, I mentioned rsync and rsnapshot backup system on linux as
> being in some way comparable. I do understand how rsnapshot works but
> still not seeing exactly how the zfs snapshots work.
>
> Maybe a concrete example would be a bit easier to understand if you
> can give one. I'm still not really understanding COW.

Copy on write means that two objects (files) referring to identical data
get pointers to the data instead of duplicate copies. As long as these
are only read, and not written, the pointer to the same data is fine.
When a write occurs, the data is copied and one of the referrers gets a
pointer to the new copy. This prevents the write from affecting both
referring files.

The name "copy on write" comes from how the technique is used in virtual
memory. For disk storage, "copy" isn't necessarily accurate: since the
entire data block is rewritten anyway, a separate "copy" step can be
optimized away.

Here's a simple illustration of COW in action. It's not necessarily an
accurate depiction of ZFS, but it shows the general concept as applied
to a filesystem.

1. When a file (file A) is written to disk, blocks are allocated for the
   file and data is stored in those blocks. Each block has a reference
   count, and the ref counts are set to 1 because only one file refers
   to the blocks.

2. I copy file A to file B. The new file simply refers to all the same
   blocks. The ref counts are raised to 2.

3. I snapshot the filesystem. This is essentially like copying every
   file in it, as in #2. No blocks are copied because no new data was
   written, but ref counts are raised. I'm not sure about zfs's
   implementation, but in principle I guess an immutable snapshot should
   only need to raise each block's ref count by 1, whereas a mutable
   snapshot (i.e., a clone) would increment it once for every reference
   in the filesystem.

4. I rsync to file A (the file from step #1). Let's suppose this leaves
   blocks 1 and 2 alone, but updates block 3. The new data for block 3
   is written to a new block (call it 3bis), and block 3 is left on the
   disk as it is. Block 3's ref count is decremented, and 3bis's ref
   count is set to 1.

       File A:      blocks 1, 2, 3bis
       File B:      blocks 1, 2, 3
       Block 1:     ref ct 3 (file A, file B, snapshot)
       Block 2:     ref ct 3 (file A, file B, snapshot)
       Block 3:     ref ct 2 (file B, snapshot)
       Block 3bis:  ref ct 1 (file A)

5. I remove file B. Ref counts for its blocks are decremented, but since
   all its blocks still have ref counts > 0, they persist. No blocks are
   freed from the dataset.

       File A:      blocks 1, 2, 3bis
       Block 1:     ref ct 2 (file A, snapshot)
       Block 2:     ref ct 2 (file A, snapshot)
       Block 3:     ref ct 1 (snapshot)
       Block 3bis:  ref ct 1 (file A)

6. I remove file A. Ref counts again decrement.

       Block 1:     ref ct 1 (snapshot)
       Block 2:     ref ct 1 (snapshot)
       Block 3:     ref ct 1 (snapshot)
       Block 3bis:  ref ct 0

   Since 3bis no longer has any referrers, it is deallocated. Blocks 1,
   2, and 3 are still used by the snapshot, even though the original
   files A and B are no longer present.

This is a pretty simplistic view. In practice, the COW methodology
applies not only to the files' data blocks but also to their metadata,
the filesystem's directories, and so on. This ensures that directory
information as well as file contents persist in snapshots. It also
explains why snapshots are virtually instantaneous: you only create a
new set of pointers to the existing data; none of the existing data is
copied or replaced.
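For anyone who wants to watch the bookkeeping run, here is a toy Python
model of the six steps above. It is only a sketch under the same
simplifying assumptions as the walkthrough (a per-block reference count,
and an immutable snapshot that adds exactly one reference to each block
it pins); it is not how ZFS actually tracks block ownership on disk. The
class and method names (ToyCowFS, write_new_file, overwrite_block, etc.)
are hypothetical, made up for this illustration.

```python
"""Hypothetical toy model of copy-on-write block reference counting.

Assumption: blocks carry simple reference counts, as in the walkthrough.
This is NOT ZFS's real on-disk mechanism, just the general concept.
"""


class ToyCowFS:
    def __init__(self):
        self.refcount = {}   # block id -> number of referrers
        self.files = {}      # file name -> list of block ids
        self.snapshots = {}  # snapshot name -> set of block ids it pins
        self.freed = []      # blocks deallocated so far, in order

    def _incref(self, block):
        self.refcount[block] = self.refcount.get(block, 0) + 1

    def _decref(self, block):
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            # No referrers left, so the block can be deallocated.
            del self.refcount[block]
            self.freed.append(block)

    def write_new_file(self, name, blocks):
        # Step 1: freshly written blocks each start with ref count 1.
        self.files[name] = list(blocks)
        for b in blocks:
            self._incref(b)

    def copy_file(self, src, dst):
        # Step 2: the copy shares the source's blocks; no data is duplicated.
        self.files[dst] = list(self.files[src])
        for b in self.files[dst]:
            self._incref(b)

    def snapshot(self, snap):
        # Step 3 (assumed immutable snapshot): one extra reference per
        # distinct block, however many live files refer to that block.
        pinned = {b for blocks in self.files.values() for b in blocks}
        self.snapshots[snap] = pinned
        for b in pinned:
            self._incref(b)

    def overwrite_block(self, name, index, new_block):
        # Step 4: new data lands in a new block; the old block stays on
        # disk and merely loses this file's reference.
        old = self.files[name][index]
        self.files[name][index] = new_block
        self._incref(new_block)
        self._decref(old)

    def remove_file(self, name):
        # Steps 5 and 6: drop the file's references; only blocks whose
        # count reaches zero are freed.
        for b in self.files.pop(name):
            self._decref(b)


if __name__ == "__main__":
    fs = ToyCowFS()
    fs.write_new_file("A", ["1", "2", "3"])  # step 1
    fs.copy_file("A", "B")                   # step 2
    fs.snapshot("snap")                      # step 3
    fs.overwrite_block("A", 2, "3bis")       # step 4
    print(fs.refcount)  # {'1': 3, '2': 3, '3': 2, '3bis': 1}
    fs.remove_file("B")                      # step 5
    fs.remove_file("A")                      # step 6
    print(fs.refcount)  # {'1': 1, '2': 1, '3': 1}
    print(fs.freed)     # ['3bis']
```

The snapshot() method follows the guess in step 3 that an immutable
snapshot needs only one reference per block; a clone, by the same
reasoning, would call _incref once per file reference instead.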
> So if I wanted to find a specific change in a file... that would be
> somewhere in the zfs snapshots... say to retrieve a certain
> formulation in some kind of `rc' file that worked better than a later
> formulation. How would I do that?

Using the .zfs/snapshot directory (see above) you can diff two different
generations of a file at the same path.

--
-D.	d...@uchicago.edu	NSIT	University of Chicago

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss