Re: OT Re: 'database filesystems'

Brian Candler Wed, 10 Jan 2007 06:59:30 -0800

On Wed, Jan 10, 2007 at 09:21:45AM +0900, Mathieu Sauve-Frankel wrote:
> Could you guys please take this completely useless discussion off-list ?
> It has absolutely zero value to anyone running or developing OpenBSD.


Well, maybe there is something useful that can be salvaged :-)

I think the issue here is the use of the word "database". Let's drop that
and look at the following problems:

(1) Keeping a second (remote) filesystem in sync with a first

That is, I make changes on disk1, and I want a remote copy on disk2 to
remain synchronised. I want this to happen in as near-real-time as possible,
but I don't want to lose local functionality if connectivity to the machine
hosting disk2 is unavailable for a while.

Solutions I know of in other environments:

- NetApp with filesystem snapshots and SnapMirror. Basically you snapshot
the filesystem, and copy the snapshot as a baseline. Then you take another
snapshot, and send the diffs between the first and the second as updates to
the remote site. Rinse and repeat.

(A similar solution might use a journalling filesystem and copy the logs
across to replicate changes, ideally coalescing the logs so that multiple
updates to the same block only take a single entry.)

- FreeBSD geom with ggated/ggatec and gmirror, which is basically software
RAID across a network. However this only really works in a LAN environment;
whilst the master can continue to work if the slave is not reachable, when
the slave comes back up I believe a full resync will be required. Not good
for a 750GB drive :-(

- iSCSI drives with iSCSI initiator and software RAID. Suffers the same
problems as above.
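To make the snapshot-diff and journal approaches above concrete, here is a minimal Python sketch. It is a toy model, not how NetApp, geom or any real filesystem does it: a "disk" is assumed to be a dict of block number -> bytes, and `snapshot_diff`, `coalesce` and `Replicator` are hypothetical names. Both techniques boil down to the same thing, "the set of changed blocks with their newest contents", and keeping that set locally while the link is down means only dirtied blocks need sending on reconnect, rather than a full resync.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: a disk is a dict of block number -> contents.
# Blocks are never removed, as on a fixed-size device.
Disk = Dict[int, bytes]


def snapshot_diff(old: Disk, new: Disk) -> Disk:
    """Blocks whose contents changed (or appeared) between two snapshots."""
    return {blk: data for blk, data in new.items() if old.get(blk) != data}


def coalesce(journal: List[Tuple[int, bytes]]) -> Disk:
    """Collapse a write log so each block appears once, with its latest data."""
    latest: Disk = {}
    for blk, data in journal:
        latest[blk] = data  # a later write to the same block replaces the earlier one
    return latest


class Replicator:
    """Writes hit the local disk at once; the remote copy catches up when reachable."""

    def __init__(self) -> None:
        self.local: Disk = {}
        self.remote: Disk = {}
        self.pending: Disk = {}  # coalesced changes not yet shipped
        self.link_up = True

    def write(self, blk: int, data: bytes) -> None:
        self.local[blk] = data    # local service never waits for the remote side
        self.pending[blk] = data  # only the newest contents of each dirty block are kept
        self.sync()

    def sync(self) -> int:
        """Ship pending changes if the link is up; return the number of blocks sent."""
        if not self.link_up or not self.pending:
            return 0
        sent = len(self.pending)
        self.remote.update(self.pending)
        self.pending.clear()
        return sent


snap1 = {0: b"boot", 1: b"etc", 2: b"home"}
snap2 = {0: b"boot", 1: b"etc2", 2: b"home"}
print(snapshot_diff(snap1, snap2))                  # {1: b'etc2'}
print(coalesce([(5, b"x"), (9, b"y"), (5, b"z")]))  # {5: b'z', 9: b'y'}

r = Replicator()
r.write(0, b"a")            # link up: shipped immediately
r.link_up = False           # remote machine unreachable for a while
for i in range(1000):
    r.write(7, b"v%d" % i)  # 1000 writes to the same block...
r.link_up = True
print(r.sync())             # 1  (...but only one block to send on reconnect)
print(r.remote == r.local)  # True
```

A real implementation would persist the pending set (or a dirty-block bitmap) across reboots and ship it in write order or atomically, so the remote image is never left half-updated.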

So are there any solutions for this in OpenBSD, either now or in the
pipeline?

Note that in this example, the remote disk image *cannot* be mounted, not
even read-only, because any server which mounted the image would find blocks
changing underneath it without any notification. So it only serves as a cold
filesystem backup, although that in itself is valuable (IMO).

(2) Further to the above: some form of shared filesystem where the remote
copy can be safely mounted and used read-only

(3) Further to the above: some form of shared filesystem where the remote
copy can be mounted read-write and changes propagate both ways. This can
lead to problems when conflicting off-line updates are made by both
sides.
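One common way to spot such conflicts is a three-way comparison of each file against the state both sides agreed on at the last sync. The Python sketch below illustrates that idea under simplifying assumptions (whole-file granularity, trees as dicts of path -> contents, a hypothetical `reconcile` function); it is not a claim about how AFS or Coda actually do it.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a tree maps path -> contents; a missing path means "no such file".
Tree = Dict[str, bytes]


def reconcile(base: Tree, a: Tree, b: Tree) -> Tuple[Tree, List[str]]:
    """Three-way merge of two replicas against their last common state.

    Returns the merged tree (non-conflicting paths only) and the list of paths
    changed differently on both sides; resolving those is left to the caller.
    """
    merged: Tree = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(a) | set(b)):
        old: Optional[bytes] = base.get(path)
        x, y = a.get(path), b.get(path)
        if x == y:        # both agree: unchanged, or the same edit on both sides
            result = x
        elif x == old:    # only b changed (or deleted) it
            result = y
        elif y == old:    # only a changed (or deleted) it
            result = x
        else:             # both changed it, differently
            conflicts.append(path)
            continue
        if result is not None:  # None means the file is deleted in the merge
            merged[path] = result
    return merged, conflicts


base = {"a.txt": b"1", "b.txt": b"1", "c.txt": b"1"}
left = {"a.txt": b"2", "b.txt": b"1", "c.txt": b"L"}
right = {"a.txt": b"1", "b.txt": b"3", "c.txt": b"R", "d.txt": b"new"}
merged, conflicts = reconcile(base, left, right)
print(merged)     # {'a.txt': b'2', 'b.txt': b'3', 'd.txt': b'new'}
print(conflicts)  # ['c.txt']
```

Edits made on only one side merge cleanly; only files touched differently on both sides while disconnected need a human (or a policy such as "newest wins") to decide.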

AFS and Coda are the only things I know of which fall into those
categories, and I don't have any experience of using either. I do note the
following comment in /usr/src/usr.sbin/afs/src/README:

| 6. What do I need to run arla?
| 
| If you have one of the systems listed above you will be able to mount
| afs as a file system (and probably to panic your kernel as well).

which doesn't instill much confidence.

But it seems to me that something like this is perhaps what the OP is
looking for: not a 'database' as such (possibly implying SQL and/or ACID
transactions), but filesystems with versioning and replication. These can be
transparent to applications, which continue to open(), read() and write()
files as usual(*). And personally I'd find it useful to hear about what
options are available in OpenBSD for this.

Regards,

Brian.

(*) If an application wants to get hold of an older version of a file, it
can find the snapshot mounted read-only under a different subdirectory - at
least, this is how NetApp deals with it. However this model does assume that
the entire filesystem is snapshotted at particular times, and doesn't give
you version control at the file level.

In any case, databases don't give you this option either; that is, there's
no SQL statement to say "show me the value that was in this column 6 hours
ago".
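The limitation in the footnote can be illustrated with a small Python sketch: given whole-filesystem snapshots taken at known times, "the file as it was at time T" is simply the newest snapshot taken at or before T. The snapshot list and the `as_of` helper are hypothetical (this is not NetApp's interface); the point is that resolution is bounded by the snapshot schedule, so writes between two snapshots are never visible.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical model: (time taken, frozen view of the whole filesystem), sorted by time.
Snapshot = Tuple[float, Dict[str, bytes]]


def as_of(snapshots: List[Snapshot], path: str, when: float) -> Optional[bytes]:
    """Contents of `path` in the newest snapshot taken at or before `when`.

    Returns None if no snapshot is that old or the file did not exist then.
    """
    times = [t for t, _ in snapshots]
    i = bisect.bisect_right(times, when)  # number of snapshots taken at or before `when`
    if i == 0:
        return None
    return snapshots[i - 1][1].get(path)


HOUR = 3600.0
snaps = [(0 * HOUR, {"motd": b"v1"}),
         (1 * HOUR, {"motd": b"v2"}),
         (2 * HOUR, {"motd": b"v3", "new.txt": b"hi"})]
now = 8 * HOUR
print(as_of(snaps, "motd", now - 6 * HOUR))  # b'v3'  ("6 hours ago")
print(as_of(snaps, "motd", 1.5 * HOUR))      # b'v2'
print(as_of(snaps, "new.txt", 1.5 * HOUR))   # None
print(as_of(snaps, "motd", -1 * HOUR))       # None  (older than any snapshot)
```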
