On Sat, Mar 11, 2006 at 08:25:08PM -0800, Elliott Mitchell wrote:
> But those are unlikely to move around, and hence unlikely to change major
> number often. Devices that do move around are likely to carry their
> journal with them, so having the hint contain only the minor number would
> be sufficient. This could even be handled through a hook in hotplug.

Nope.  Consider the use case where the data filesystem is on a SCSI
or RAID disk, and the journal is on a battery-backed memory disk
which looks, feels, and smells like an IDE disk.  In that case, the
major number of the data partition is not the same as the major
number of the journal device.  So as you can see, just storing the
minor number in the hint will not save you.
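
To make that concrete, here is a rough sketch of what a minor-only
hint would resolve to.  It assumes the conventional Linux major
numbers (8 for SCSI disks, 3 for the first IDE channel); the helper
resolve_minor_only_hint() is hypothetical and only models the
proposal, it is not anything that exists in e2fsprogs or the kernel.

```python
import os

# Conventional Linux major numbers: 8 = SCSI disk (sd), 3 = first IDE channel (hd).
SCSI_DISK_MAJOR = 8
IDE0_MAJOR = 3

data_dev = os.makedev(SCSI_DISK_MAJOR, 1)   # e.g. /dev/sda1 (data filesystem)
journal_dev = os.makedev(IDE0_MAJOR, 1)     # e.g. /dev/hda1 (external journal)

def resolve_minor_only_hint(fs_dev, journal_minor):
    """Hypothetical: rebuild the journal's device number from a minor-only
    hint by borrowing the major number of the filesystem's own device."""
    return os.makedev(os.major(fs_dev), journal_minor)

hint = os.minor(journal_dev)                 # all that a minor-only hint would store
guess = resolve_minor_only_hint(data_dev, hint)

print((os.major(guess), os.minor(guess)))    # (8, 1)
print(guess == journal_dev)                  # False
print(guess == data_dev)                     # True: the hint resolves to the data partition
```

In this particular layout the minor-only hint does not merely miss
the journal, it resolves to the data partition itself.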

Or consider the case where you are using SCSI id #5 for the data disk,
but in order to get the faster performance, you have the external
journal on a separate spindle, which is SCSI id #6.  Now the system
administrator does a clean shutdown of the system, and removes SCSI id
#4.  *Poof* the SCSI minor device numbers get renumbered, so what used
to be /dev/sdc1 and /dev/sdd1 now become /dev/sdb1 and /dev/sdc1, and
any hints based on major/minor device numbers will be invalidated.
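
A small simulation of that renumbering, as a sketch only.  It assumes
the sd driver's usual layout of 16 minors per disk under major 8, and
it models probe order as simply ascending SCSI id, which is a
simplification.  The disk at SCSI id 0 is made up so that ids 4, 5
and 6 start out as sdb, sdc and sdd.

```python
import os

SCSI_DISK_MAJOR = 8      # conventional Linux major for sd disks
MINORS_PER_DISK = 16     # sd layout: whole disk plus 15 partitions

def sd_name(index, part):
    """Name of partition `part` on the index-th probed disk (0 -> sda)."""
    return "/dev/sd%s%d" % (chr(ord("a") + index), part)

def probe(scsi_ids):
    """Simplified model: disks get names and minors in ascending SCSI id
    order.  Returns {scsi_id: (name, devnum)} for partition 1 of each disk."""
    table = {}
    for index, scsi_id in enumerate(sorted(scsi_ids)):
        devnum = os.makedev(SCSI_DISK_MAJOR, index * MINORS_PER_DISK + 1)
        table[scsi_id] = (sd_name(index, 1), devnum)
    return table

before = probe([0, 4, 5, 6])   # id 0 is a hypothetical boot disk
after = probe([0, 5, 6])       # SCSI id 4 has been removed

hint = before[6][1]            # journal's device number recorded before the change
live = {devnum for _, devnum in after.values()}

print(before[5][0], before[6][0])    # /dev/sdc1 /dev/sdd1
print(after[5][0], after[6][0])      # /dev/sdb1 /dev/sdc1
print(hint in live)                  # False: the hint names no existing device
print(before[5][1] == after[6][1])   # True: data disk's old number is now the journal's
```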

If the system uses blkid to do mount-by-label, mount has no problem
finding the data disk on /dev/sdb1, but the hint in the external
journal is now incorrect.  Since the entire system was shut down
cleanly, there is no reason why the system administrator should need
to force an fsck just to update the hint; that's just inelegant.  The
solution is that the mount program needs to be able to use the blkid
library to find the new location of the external journal as well.
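
Roughly, the lookup would treat the hint as an optimization and the
UUID as the truth.  The sketch below is not the blkid API; the `scan`
dictionary stands in for whatever a userspace probe of the block
devices returns, and the UUID strings and the find_journal() helper
are invented for illustration.  It assumes the filesystem records the
UUID of its external journal.

```python
def find_journal(scan, journal_uuid, hinted_dev=None):
    """Return the device that currently holds `journal_uuid`, or None.

    `scan` maps device name -> UUID (a stand-in for a userspace probe).
    The hint is tried first, but it is only accepted if the UUID found
    there actually matches."""
    if hinted_dev is not None and scan.get(hinted_dev) == journal_uuid:
        return hinted_dev
    for dev, uuid in sorted(scan.items()):
        if uuid == journal_uuid:
            return dev
    return None

# Made-up UUIDs; device names follow the renumbering example in the text.
DATA_UUID = "data-uuid"
JOURNAL_UUID = "journal-uuid"

scan_after_removal = {
    "/dev/sda1": "root-uuid",
    "/dev/sdb1": DATA_UUID,
    "/dev/sdc1": JOURNAL_UUID,
}

# The stale hint still says /dev/sdd1, which no longer exists.
print(find_journal(scan_after_removal, JOURNAL_UUID, hinted_dev="/dev/sdd1"))
# /dev/sdc1
```

No fsck is needed in this model: the stale hint is simply ignored
once the UUID check fails.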

> > Nah, it's too hard, especially when you consider what might happen
> > with iSCSI and Fibre Channel.  Searching for filesystems by UUID
> > really does belong in userspace.  But mount does need to know how to
> > specify the external journal to the filesystem, just as today it
> > passes block device for the filesystem itself to the kernel.
> 
> Strikes me as inelegant to not be able to directly call mount().  :-(

You can't directly call mount() if you are (a) mounting an NFS
partition, at least not without doing a lot of NFS-specific DNS name
resolution, etc., or (b) doing any kind of mount-by-label or
mount-by-uuid.  And this is because putting DNS resolution or
find-block-device-by-UUID into the kernel is insane.  This is another
example of needing to pass *all* of the parameters of the mount
command into the kernel, and the location of the external journal is
just one of the mount parameters, just as the IP address of the NFS
server or where to find the data partition is one of the mount
parameters.
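
As a sketch of what "pass all of the parameters" means for a
userspace mount helper: every symbolic name gets resolved before the
kernel is involved, and the journal location travels along as one
more option.  build_mount_args() is hypothetical, the `scan` table
again stands in for a userspace device probe, and the
"journal_device=<path>" value syntax is an assumption; only the
option's name comes from this thread.

```python
def build_mount_args(spec, target, journal_uuid, scan):
    """Resolve symbolic parameters in userspace and return the concrete
    (source, target, fstype, options) a mount(2) wrapper would hand on.

    `scan` maps device name -> UUID.  Raises KeyError if a UUID is not found."""
    by_uuid = {uuid: dev for dev, uuid in scan.items()}
    if spec.startswith("UUID="):
        source = by_uuid[spec[len("UUID="):]]
    else:
        source = spec
    options = []
    if journal_uuid is not None:
        # Option name as used in this thread; the "=<path>" form is assumed.
        options.append("journal_device=" + by_uuid[journal_uuid])
    return source, target, "ext3", ",".join(options)

# Made-up UUIDs and mount point, for illustration only.
scan = {"/dev/sdb1": "data-uuid", "/dev/sdc1": "journal-uuid"}

print(build_mount_args("UUID=data-uuid", "/data", "journal-uuid", scan))
# ('/dev/sdb1', '/data', 'ext3', 'journal_device=/dev/sdc1')
```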

The hint was a convenience to system administrators for simple cases,
but you can make the argument that we should never have implemented
the hint, since it left us in a position where we didn't have all of
the pieces (i.e., the journal_device mount option should have been
implemented a long time ago), and made people lazy and complacent
about finishing the userspace support for external journals.

                                                - Ted

