Re: 'database filesystems' (was: backing up windows hosts to openbsd)

Brian Candler Mon, 08 Jan 2007 02:49:00 -0800

On Sun, Jan 07, 2007 at 01:11:57AM +0100, Joachim Schipper wrote:
> On Sat, Jan 06, 2007 at 11:37:32PM +0100, chefren wrote:
> > This problem has little to do with OpenBSD although I do hope with all 
> > "hate" that's in me that once in the future OpenBSD will be the first 
> > OS with a good database file system, that could solve the problem 
> > above (provided all programs will use it etcetera), if well designed 
> > the database managing program can provide proper backups on other 
> > disks itself.
> 
> Don't you mean something akin to Linux LVM's freeze feature, where you
> can use a 'frozen in time' version of the disk to make backups from?
> IIRC, the LVM stores the original blocks and reads those instead of the
> modified ones when used in this way. The programs can continue to run as
> always, and dump can do its thing.


ISTM the problem is deeper than that.

Consider a database application - mysql or oracle or whatever. At some point
in time, it decides to write some updates to tables 1, 2 and 3, and index
files X, Y and Z.

It cannot request all these changes atomically, so it performs a series of
separate writes to the filesystem (these could be writes to separate files,
or a series of writes to the same file at different offsets).

If you happen to take a snapshot of the filesystem at the point where some
of these writes have been requested but others have not, then the image you
restore will be in an inconsistent state.
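
To make that concrete, here is a toy sketch in Python. It is not how any real
database or filesystem works: the "filesystem" is just a dict, and the names
(fs, transfer_steps, is_consistent) are invented for illustration. It only
shows a snapshot landing between two related writes.

```python
import copy

# Toy "filesystem": file name -> contents. All names here are made up.
fs = {"table1": {"alice": 100, "bob": 0}, "index_total": 100}

def transfer_steps(fs, amount):
    """Perform one logical update as two separate writes, pausing after each."""
    fs["table1"]["alice"] -= amount
    yield "debited alice"
    fs["table1"]["bob"] += amount
    yield "credited bob"

def is_consistent(image):
    # Invariant the application relies on: balances add up to the total.
    return sum(image["table1"].values()) == image["index_total"]

steps = transfer_steps(fs, 30)
next(steps)                    # the first write has reached the filesystem
snapshot = copy.deepcopy(fs)   # snapshot taken between the two writes
for _ in steps:                # the remaining write completes
    pass

print(is_consistent(snapshot), is_consistent(fs))
```

The live data ends up consistent, but the snapshot has the debit without the
credit, and a restore from it would bring back exactly that state.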

So, is the OP proposing that the filesystem itself gain ACID transactions,
say with a two-phase commit? If so, then the problems I can see are:

(1) You won't see any benefit until *all* applications have been rewritten
to use these new semantics instead of traditional ones. That means new
versions of oracle, mysql etc.

(2) The filesystem itself will be able to report new types of errors, such
as conflicting transactions, which applications will have to be able to
handle properly.

(3) Depending on what level of transaction isolation you select, performance
of concurrent applications may be much worse.

(4) The filesystem will still need to store this data as on-disk blocks, and
at snapshot time you'll still need to ensure that only whole transactions
are backed up.

So arguably you're just pushing the problem down a layer - although I admit
it would be easier to have a single backup operation at the filesystem layer
(quiesce; snapshot; release) than to have to do this for each of the running
applications.
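
As a rough sketch of what (quiesce; snapshot; release) could look like, again
in toy Python: ToyStore, write_group and quiesced are hypothetical names, and
a single lock stands in for whatever mechanism a real filesystem would use.
Note that it only works because the writer marks its related writes as one
group, which is the application change described in (1).

```python
import copy
import threading
from contextlib import contextmanager

class ToyStore:
    """Hypothetical stand-in for a filesystem that can pause writers."""

    def __init__(self):
        self._lock = threading.Lock()
        self.files = {}

    def write_group(self, updates):
        # A group of related writes is applied while holding the lock,
        # so a quiesced snapshot can never land in the middle of it.
        with self._lock:
            for name, data in updates.items():
                self.files[name] = data

    @contextmanager
    def quiesced(self):
        # quiesce: block new write groups; release: let them continue.
        with self._lock:
            yield

    def snapshot(self):
        # Caller is expected to hold the store quiesced while copying.
        return copy.deepcopy(self.files)

store = ToyStore()
store.write_group({"table1": "v1", "indexX": "v1"})
with store.quiesced():
    image = store.snapshot()
store.write_group({"table1": "v2", "indexX": "v2"})
```

The image contains only whole write groups, and writers are blocked only for
the duration of the snapshot itself.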

Leaving applications such as databases to manage their own ACID requirements
does at least let them be tuned to their own specific needs, and lets them
run on a much wider range of platforms.

Maybe there's a partial solution, where you have a journalling layer on the
filesystem, and applications can opt to be aware of it (e.g. requesting a
series of writes as being part of the same journal-level transaction). But
this still requires changes to the applications, and I can see lots of
potential pitfalls.
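
A minimal sketch of the opt-in idea, assuming a made-up interface
(JournaledFiles, transaction, write are not any real API): aware applications
group their writes, which become visible together or not at all, while
unaware applications keep writing one file at a time.

```python
from contextlib import contextmanager

class JournaledFiles:
    """Toy model of a filesystem with an optional journal-level transaction."""

    def __init__(self):
        self.committed = {}   # what a snapshot would capture

    @contextmanager
    def transaction(self):
        pending = {}
        yield pending         # caller records its writes in the pending dict
        # Only reached if the block did not raise: apply all writes together.
        self.committed.update(pending)

    def write(self, name, data):
        # Traditional, unaware applications still write one file at a time.
        self.committed[name] = data

jf = JournaledFiles()
with jf.transaction() as tx:
    tx["table1"] = "row"
    tx["indexX"] = "entry"

try:
    with jf.transaction() as tx:
        tx["table2"] = "half-done"
        raise RuntimeError("application failed mid-update")
except RuntimeError:
    pass
```

The second group never reaches the committed state, so a snapshot taken at any
point sees either both writes of the first group or neither. It says nothing
about conflicts between concurrent transactions, which is where (2) and (3)
come in.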

> Of course, Bad Things Happen once you run out of space to store those
> original blocks on... (nothing too bad, but I do believe the 'frozen'
> volume is destroyed on the spot).

Filesystem snapshots are pretty old technology - NetApp have had them for
years. You reserve a maximum percentage of disk space for snapshots. If you
run out of snapshot space, then you just get a normal 'disk full' error, and
you can delete old or unwanted snapshots to free up space, or else alter the
snapshot percentage.

> Or did you mean something else entirely?

That's what I'm wondering.

Regards,

Brian.
