Scott Long wrote:
/me jumps up and down and waves his hands
The problem with journalling at the block layer is that you are pretty much
forced to journal both metadata and data, since the block layer really
doesn't know the distinction, and definitely not in a
filesystem-independent way (yes, UFS does evil things to the buffer
cache by representing metadata with negative block numbers, but that is
just UFS). Full journalling has many drawbacks from the viewpoint of
speed and complexity, of course. So you really want to be able to do
just metadata journalling.
Another hard part of distinguishing between metadata and data is that
filesystems have a habit of migrating disk blocks from holding metadata
to holding data, and vice versa (think indirect pointer blocks, not
inode blocks). If you are only replaying metadata, you want to make
sure that you don't smash data blocks with old metadata.
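
To make the hazard concrete, here is a minimal Python sketch of a metadata-only replay. It is not the UFS journalling design; it assumes one common workaround, a "revoke" record written to the journal when a block stops holding metadata. The names Write, Revoke and replay are hypothetical, and the disk is modelled as a plain dict.

```python
from dataclasses import dataclass


@dataclass
class Write:
    """A journaled metadata write (hypothetical record format)."""
    seq: int
    block: int
    payload: bytes


@dataclass
class Revoke:
    """Marks that `block` stopped holding metadata at sequence `seq`."""
    seq: int
    block: int


def replay(journal, disk):
    """Replay metadata writes, skipping any write superseded by a later revoke."""
    # Pass 1: find the newest revoke sequence number for each block.
    revoked = {}
    for rec in journal:
        if isinstance(rec, Revoke):
            revoked[rec.block] = max(rec.seq, revoked.get(rec.block, -1))

    # Pass 2: apply only the writes that are newer than any revoke of the
    # same block, so stale metadata never lands on top of file data.
    for rec in journal:
        if isinstance(rec, Write) and rec.seq > revoked.get(rec.block, -1):
            disk[rec.block] = rec.payload
    return disk
```

If block 7 was journaled as an indirect block, then freed and reused for file data, a Revoke for block 7 keeps the replay from overwriting that data, while unrelated metadata writes still go through. A write to block 7 journaled after the revoke is applied as usual.
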
Coming up with a filesystem-independent way to represent all of this for
the block layer is not easy. Filesystems would have to be modified to
provide proper metadata vs. data hints to the block layer.
And if you're going to do that, then why not just make it a library in
VFS, like what Darwin does?
The UFS Journalling work is already well underway, and I expect it to
follow the path of being a VFS library. Note that I'm saying 'library'
here, not 'layer'. There really is no way to make journalling work with
an arbitrary filesystem 'for free', whether as a VFS layer or a GEOM
transform, since journalling is 100% dependent on the filesystem working
with the buffer cache to do sane operations in a defined order.
An alternate SoC project that would be very useful is block-level
snapshots. I'm not sure if I'll be able to retain the filesystem
snapshot functionality in UFS with journalling enabled, so moving to
doing the snapshots in the block layer would be a good way to make up
for this. Beware that while the GEOM transform would be pretty
straightforward to write, the real trick comes from being able to make
the consumer of a block device (a filesystem, maybe) flush itself to a
consistent state while the snapshot is being taken. The infrastructure
for this is the part that is very interesting, but also the most work.
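
As a rough illustration of the two halves described above, the Python sketch below pairs a copy-on-write snapshot over a block store with a callback that asks the consumer to flush first. It is a toy model, not GEOM code: SnapshotDevice, ToyFilesystem and the quiesce callback are all hypothetical names, and the device is a dict of block number to bytes.

```python
class SnapshotDevice:
    """Copy-on-write snapshot over a dict-backed block store (toy model)."""

    def __init__(self, blocks):
        self.blocks = blocks   # live device: block number -> bytes
        self.saved = None      # old contents of blocks written since the snapshot

    def take_snapshot(self, quiesce):
        # Ask the consumer to flush itself to a consistent state before the
        # snapshot point is fixed. A real implementation would also have to
        # hold off new writes while this runs.
        quiesce()
        self.saved = {}

    def write(self, block, data):
        if self.saved is not None and block not in self.saved:
            # First write to this block since the snapshot: keep the old
            # contents (None means the block did not exist yet).
            self.saved[block] = self.blocks.get(block)
        self.blocks[block] = data

    def read_snapshot(self, block):
        if self.saved is None:
            raise RuntimeError("no snapshot taken")
        if block in self.saved:
            return self.saved[block]
        return self.blocks.get(block)


class ToyFilesystem:
    """Stand-in consumer that buffers writes until told to flush."""

    def __init__(self, dev):
        self.dev = dev
        self.dirty = {}

    def update(self, block, data):
        self.dirty[block] = data

    def flush(self):
        for block, data in self.dirty.items():
            self.dev.write(block, data)
        self.dirty.clear()
```

Calling dev.take_snapshot(fs.flush) pushes the consumer's buffered writes to the device before the snapshot point, so the snapshot sees them; later writes change only the live device. The copy-on-write bookkeeping is the easy part here. The quiesce step is a single callback in this sketch, which is exactly where the real infrastructure work would be.
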
Scott
Scott,
Have you looked at the journaling layer that Matt has been adding to
DragonFly BSD? What you are talking about appears very similar. Or am I
misunderstanding something?
Richard Coleman
[EMAIL PROTECTED]
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers