Re: [zfs-discuss] zfs streams & data corruption

Toby Thain Wed, 25 Feb 2009 08:22:47 -0800


On 25-Feb-09, at 9:53 AM, Moore, Joe wrote:

Miles Nordin wrote:
that SQLite2 should be equally as tolerant of snapshot backups as it is of cord-yanking.

The special backup features of databases including ``performing a
checkpoint'' or whatever, are for systems incapable of snapshots,
which is most of them.  Snapshots are not writeable, so this ``in the
middle of a write'' stuff just does not happen.
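The point that a snapshot cannot be caught "in the middle of a write" follows from snapshots being read-only, point-in-time views. A minimal Python sketch of that idea, assuming a toy block device (the `Volume` class and its methods are hypothetical, not ZFS's API): taking a snapshot copies the block map (references to immutable data), not the data itself, and exposes it read-only, so later writes change only the live map.

```python
from types import MappingProxyType


class Volume:
    """Toy block device with point-in-time snapshots (hypothetical model)."""

    def __init__(self):
        self._live = {}    # block number -> immutable bytes
        self._snaps = {}   # snapshot name -> read-only block map

    def write(self, blkno, data):
        # New data replaces the entry in the live map only; bytes held by a
        # snapshot are never modified in place.
        self._live[blkno] = bytes(data)

    def snapshot(self, name):
        # In this single-threaded model the copy cannot interleave with a
        # write(). The map is wrapped read-only so nothing can write into it.
        self._snaps[name] = MappingProxyType(dict(self._live))
        return self._snaps[name]

    def read(self, blkno, snap=None):
        src = self._live if snap is None else self._snaps[snap]
        return src.get(blkno)


if __name__ == "__main__":
    vol = Volume()
    vol.write(0, b"old")
    vol.snapshot("s1")
    vol.write(0, b"new")
    print(vol.read(0), vol.read(0, "s1"))
```

Real ZFS gets the same visible behaviour through copy-on-write of its block tree rather than by copying a map; the toy only mirrors the semantics.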
This is correct. The general term for these sorts of point-in-time backups is "crash consistent". If the database can be recovered easily (and/or automatically) from pulling the plug (or a kill -9), then a snapshot is an instant backup of that database.
In-flight transactions (ones that have not been committed) at the database level are rolled back. Applications using the database will be confused by this in a recovery scenario, since transactions that were reported as committed are gone when the database comes back. But that's the case any time a database moves "backward" in time.
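Crash consistency plus rollback of in-flight transactions can be sketched with a toy write-ahead log. Assumptions (all hypothetical, not SQLite's or any database's actual format): the log is written strictly in order, so a crash or snapshot preserves some prefix of it, and recovery applies only transactions whose commit record is in that prefix. Every prefix then recovers to a state that some committed transaction left behind; restoring the prefix cut after transaction 1, once transaction 2 had already been acknowledged, is exactly the "backward in time" case.

```python
# Toy log records: ("begin", txid), ("set", txid, key, value), ("commit", txid)
LOG = [
    ("begin", 1), ("set", 1, "a", 1), ("set", 1, "b", 1), ("commit", 1),
    ("begin", 2), ("set", 2, "a", 2), ("commit", 2),
    ("begin", 3), ("set", 3, "b", 3),  # transaction 3 never commits
]


def recover(log):
    """Rebuild state from whatever prefix of the log survived a crash."""
    pending = {}  # txid -> buffered (key, value) updates awaiting commit
    state = {}
    for kind, txid, *args in log:
        if kind == "begin":
            pending[txid] = []
        elif kind == "set":
            pending.setdefault(txid, []).append(tuple(args))
        elif kind == "commit":
            for key, value in pending.pop(txid, []):
                state[key] = value
    # Whatever is still in `pending` was in flight: it is simply discarded,
    # which is the rollback.
    return state


if __name__ == "__main__":
    # Simulate a snapshot (or pulled plug) after every possible write.
    for n in range(len(LOG) + 1):
        print(n, recover(LOG[:n]))
```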
Of course Toby rightly pointed out this claim does not apply if you
take a host snapshot of a virtual disk, inside which a database is
running on the VM guest---that implicates several pieces of
untrustworthy stacked software. But for snapshotting SQLite2 to clone
the currently-running machine I think the claim does apply, no?
Snapshots of a virtual disk are also crash-consistent. If the VM has not written its transactionally-committed data to disk and is still holding it in volatile memory, that VM is not maintaining its ACID requirements, and that's a bug in either the database or in the OS running on the VM.
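One way a database avoids holding "committed" data only in volatile memory is to refuse to acknowledge a commit until the write has been forced to stable storage. A minimal Python sketch, assuming a hypothetical append-only journal file (`durable_append` is an illustrative name, not any database's API): `flush()` moves Python's user-space buffer into the kernel, and `os.fsync()` asks the kernel to push it to the device. This only helps if every layer underneath (guest OS, hypervisor, host disk cache) honors the flush; a real implementation would also fsync the containing directory when the file is newly created.

```python
import os
import tempfile


def durable_append(path, record: bytes) -> None:
    """Append a record; return only after the OS reports it is on stable storage."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # Python buffer -> kernel page cache
        os.fsync(f.fileno())  # kernel page cache -> device (if the device obeys)
    # Only at this point is it safe to tell the application "committed".


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        journal = os.path.join(d, "journal.log")
        durable_append(journal, b"tx1\n")
        durable_append(journal, b"tx2\n")
        with open(journal, "rb") as f:
            print(f.read())
```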

Or the virtual machine! I hate to dredge up the recent thread again, but if your virtual machine is not maintaining guest barrier semantics (write ordering) on the underlying host, then your snapshot may contain inconsistencies entirely unexpected to the virtualised transactional/journaled database or filesystem.[1]

I believe this can be reproduced by simply running VirtualBox with default settings (ignore flush), though I have been too busy lately to run tests which could prove this. (Maybe others would be interested in testing as well.) I infer this explanation from consistency failures in InnoDB and ext3fs that I have seen[2], which would not be expected on bare metal in pull-plug tests. My point is not about VB specifically, but just that in general, the consistency issue, already complex on bare metal, is tangled further as the software stack gets deeper.
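Why ignored flushes matter can be shown with a toy model of a host-side write cache. The guest writes a payload, issues a flush (barrier), then writes a commit record that refers to the payload. The sketch enumerates every disk image a crash or host snapshot could capture, assuming writes between flushes may land in any order and that "ignore flush" simply drops the barrier. This is a hypothetical model, not how VirtualBox's cache is actually implemented, and it does not test the VirtualBox hypothesis above; it only shows how a commit record can reach disk before the data it commits.

```python
from itertools import permutations

GUEST_OPS = [
    ("write", "data", "new"),    # transaction payload
    ("flush",),                  # barrier: payload must be durable first
    ("write", "commit", "tx1"),  # commit record that refers to the payload
]


def persisted_states(ops, honor_flush):
    """All disk images a crash/snapshot could capture (toy model).

    Assumes each block is written at most once between flushes.
    """
    # Group writes into epochs; an honored flush closes the current epoch.
    epochs, current = [], []
    for op in ops:
        if op[0] == "flush":
            if honor_flush:
                epochs.append(current)
                current = []
        else:
            current.append(op)
    epochs.append(current)

    states = set()
    durable = {}  # blocks from fully persisted earlier epochs
    for epoch in epochs:
        # Any order within an epoch, and a crash after any number of writes.
        for order in permutations(epoch):
            disk = dict(durable)
            states.add(frozenset(disk.items()))
            for _, block, value in order:
                disk[block] = value
                states.add(frozenset(disk.items()))
        for _, block, value in epoch:
            durable[block] = value
    return states


def consistent(state):
    disk = dict(state)
    # A commit record is only valid if the payload it refers to is on disk.
    return "commit" not in disk or disk.get("data") == "new"


if __name__ == "__main__":
    for honor in (True, False):
        bad = [dict(s) for s in persisted_states(GUEST_OPS, honor) if not consistent(s)]
        print(f"honor_flush={honor}: inconsistent images: {bad}")
```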


--Toby

[1] The SQLite web site has a good summary of related issues:
    http://sqlite.org/atomiccommit.html
[2] http://forums.virtualbox.org/viewtopic.php?t=13661

The snapshot represents the disk state as if the VM were instantly gone. If the VM or the database can't recover from pulling the virtual plug, the snapshot can't help that.
That said, it is a good idea to quiesce the software stack as much as possible to make the recovery from the crash-consistent image as painless as possible. For example, if you take a snapshot of a VM running on an EXT2 filesystem (or unlogged UFS for that matter), the recovery will require an fsck of that filesystem to ensure that the filesystem structure is consistent. Performing a "lockfs" on the filesystem while the snapshot is taken could mitigate that, but that's still out of the scope of the ZFS snapshot.
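Quiescing before the snapshot can be sketched in the same toy style. `ToyFS`, `snapshot()` and `needs_fsck()` are hypothetical stand-ins: the filesystem caches metadata updates in memory and keeps a clean/dirty flag on disk, loosely like the state flag an ext2 superblock carries. An unquiesced snapshot captures the dirty flag and would need an fsck on recovery; freezing the filesystem first (flush, then hold writers off, in the spirit of `lockfs`) yields a clean image. The freeze has to happen inside the guest, which is why it is outside what a host-side ZFS snapshot can do on its own.

```python
import threading
from contextlib import contextmanager


class ToyFS:
    """Hypothetical non-journaled FS that caches metadata updates in memory."""

    def __init__(self):
        self.disk = {"clean": True}    # on-disk image with a clean/dirty flag
        self._dirty = {}               # updates not yet written back
        self._lock = threading.Lock()  # taken by writers and by frozen()

    def write(self, key, value):
        with self._lock:
            self.disk["clean"] = False  # mark dirty on disk before caching
            self._dirty[key] = value

    def _sync(self):
        self.disk.update(self._dirty)
        self._dirty.clear()
        self.disk["clean"] = True

    @contextmanager
    def frozen(self):
        # Flush, then keep the lock so writes from other threads block until
        # the block exits. (Calling write() from inside would deadlock.)
        with self._lock:
            self._sync()
            yield


def snapshot(fs):
    return dict(fs.disk)  # stand-in for the host-side snapshot


def needs_fsck(image):
    return not image["clean"]


if __name__ == "__main__":
    fs = ToyFS()
    fs.write("inode7", "size=4096")
    print("unquiesced needs fsck:", needs_fsck(snapshot(fs)))
    with fs.frozen():
        image = snapshot(fs)
    print("quiesced needs fsck:", needs_fsck(image))
```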
--Joe

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
