On Mon, Jan 28, 2013 at 5:40 PM, Theodore Ts'o <ty...@mit.edu> wrote:
> On Mon, Jan 28, 2013 at 04:20:11PM -0800, Darrick J. Wong wrote:
>> On Mon, Jan 28, 2013 at 03:27:38PM -0800, David Lang wrote:
>> > The situation I'm thinking of is when dealing with VMs, you make a
>> > filesystem image once and clone it multiple times. Won't that end up
>> > with the same UUID in the superblock?
>>
>> Yes, but one ought to be able to change the UUID a la tune2fs -U.  Even
>> still... so long as the VM images have a different UUID than the fs that they
>> live on, it ought to be fine.
>
> ... and this is something most system administrators should be
> familiar with.  For example, it's one of those things that Norton
> Ghost does when it makes file system image copies (the equivalent of
> "tune2fs -U random /dev/XXX")

Hmm, maybe I missed something but it does not seem like a good idea
to use the volume UUID itself to generate unique-per-volume metadata
hashes, if users expect to be able to change it. All the metadata hashes
would need to be changed.
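
To make the concern concrete, here is a minimal sketch. It assumes a
hypothetical scheme in which the volume UUID is simply mixed into each
metadata block hash; metadata_hash() and stale_blocks() are illustrative
names, not anything from Tux3 or Ext4, and SHA-256 is only a stand-in for
whatever hash would really be used.

```python
import hashlib
import uuid


def metadata_hash(volume_uuid: uuid.UUID, block: bytes) -> bytes:
    """Hypothetical per-volume hash: the UUID is mixed into every block hash."""
    return hashlib.sha256(volume_uuid.bytes + block).digest()[:8]


def stale_blocks(blocks, stored_hashes, current_uuid):
    """Return the indexes of blocks whose stored hash no longer verifies."""
    return [
        i
        for i, (block, stored) in enumerate(zip(blocks, stored_hashes))
        if metadata_hash(current_uuid, block) != stored
    ]


# Made-up UUIDs and block contents, for illustration only.
old_uuid = uuid.UUID("11111111-2222-3333-4444-555555555555")
new_uuid = uuid.UUID("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")
blocks = [b"inode table", b"dirent block", b"extent leaf"]
stored = [metadata_hash(old_uuid, b) for b in blocks]

print(stale_blocks(blocks, stored, old_uuid))  # nothing is stale yet
print(stale_blocks(blocks, stored, new_uuid))  # every block is stale
```

Under that assumption, changing the UUID invalidates every stored hash at
once, so the operation would have to rewrite all of the hashed metadata.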

Anyway, our primary line of attack on this problem is not unique hashes,
but actually knowing which blocks are in files and which are not. Before
(a hypothetical) Tux3 fsck repair would be so bold as to reattach some lost
metadata to the place it thinks it belongs, all of the following would need
to be satisfied:

   * The lost metadata subtree is completely detached from the filesystem
     tree. In other words, it cannot possibly be the contents of some valid
     file already belonging to the filesystem. I believe this addresses the
     concern of David Lang at the head of this thread.

   * The filesystem tree is incomplete. Somewhere in it Tux3 fsck has
     discovered a hole that needs to be filled.

   * The lost metadata subtree is complete and consistent, except for not
     being attached to the filesystem tree.

   * The lost metadata subtree that was found matches a hole where
     metadata is missing, according to its "uptags", which specify at
     least the low order bits of the inode the metadata belongs to and
     the offset at which it belongs.

   * Tux3 fsck has asked the user whether this lost metadata (described in
     some reasonable way) should be attached to some particular filesystem
     object that appears to be incomplete. Alternatively, the lost subtree
     may be attached to the traditional "lost+found" directory, though we
     are able to be somewhat more specific about where the subtree
     might originally have belonged, and can name the lost+found object
     accordingly.
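
The checks above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not actual Tux3 code: the types
Hole and LostSubtree, the function reattach_target(), and the 16-bit width
of the inode uptag are all hypothetical. Requiring exactly one matching
hole is an extra precaution of this sketch rather than one of the
conditions listed above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

INODE_TAG_BITS = 16  # assumption: how many low-order inode bits an uptag keeps
INODE_TAG_MASK = (1 << INODE_TAG_BITS) - 1


@dataclass(frozen=True)
class Hole:
    """A place in the filesystem tree where fsck found metadata missing."""
    inode: int
    offset: int


@dataclass(frozen=True)
class LostSubtree:
    """A metadata subtree found on disk, as summarized by earlier fsck passes."""
    detached: bool          # not reachable from the filesystem tree
    consistent: bool        # complete and consistent apart from being detached
    uptag_inode_bits: int   # low-order bits of the owning inode number
    uptag_offset: int       # offset at which the metadata belongs


def reattach_target(
    subtree: LostSubtree,
    holes: Sequence[Hole],
    confirm: Callable[[LostSubtree, Hole], bool],
) -> Optional[Hole]:
    """Return the hole to fill, or None if any precondition fails."""
    if not subtree.detached or not subtree.consistent:
        return None
    matches = [
        hole
        for hole in holes
        if (hole.inode & INODE_TAG_MASK) == subtree.uptag_inode_bits
        and hole.offset == subtree.uptag_offset
    ]
    if len(matches) != 1:
        return None  # no hole to fill, or an ambiguous match
    hole = matches[0]
    return hole if confirm(subtree, hole) else None
```

A None result would leave the caller free to fall back to lost+found,
naming the object after the uptags.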

Additionally, Tux3 fsck might consider the following:

  * If the allocation bitmaps appear to be undamaged, but some or all
    of a lost subtree is marked as free space, then the subtree is
    most likely free space and no attempt should be made to attach it to
    anything.
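
A minimal sketch of that extra check, assuming a hypothetical bitmap layout
of one bit per block, least significant bit first, with a set bit meaning
"allocated". The function names are illustrative only, and deciding whether
the bitmaps are undamaged is left to the caller.

```python
from typing import Iterable


def block_is_allocated(bitmap: bytes, block: int) -> bool:
    """Test one bit of the allocation bitmap (assumed LSB-first layout)."""
    return bool((bitmap[block >> 3] >> (block & 7)) & 1)


def looks_like_free_space(
    subtree_blocks: Iterable[int], bitmap: bytes, bitmap_undamaged: bool
) -> bool:
    """True if a trusted bitmap marks some or all of the subtree as free."""
    if not bitmap_undamaged:
        return False  # a damaged bitmap proves nothing either way
    return any(not block_is_allocated(bitmap, b) for b in subtree_blocks)
```

When this returns True, fsck would skip the subtree rather than try to
attach it anywhere.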

Thanks for your comments. I look forward to further review as things progress.

One thing to consider: this all gets much more interesting when versioning
arrives. For shared tree snapshotting filesystem designs, this must get very
interesting indeed, to the point where even contemplating the corner cases
makes me shudder. But even with versioning, Tux3 still upholds the
single-reference rule, so our fsck problem will continue to look a lot more
like Ext4 than like Btrfs or ZFS, which suggests some great opportunities for
unabashed imitation.

Regards,

Daniel
