On Tue, 6 Sep 2005, Igor Shmukler wrote:

You are correct about the Unix file system organization, but does it
mean reliable vnode to fullname conversion is not possible?

Yes.  Get over it.

Well, I do not think it is a Yes. I very much think it is a No. You should have continued reading my email 'til the middle or even farther.

There are various tricks that can be played to increase the chances of finding a name in the name cache, but those tricks run out quickly on systems such as NFS servers, where files can be accessed without having been looked up since the last boot, or on systems running background fsck. This is a fundamental property of the UNIX file system design, and while it offers some quite powerful capabilities, nothing changes the fact that names are second-class citizens in the file system and VFS design.

The main tricks that can be played are:

- Don't purge intermediate but unused nodes from the name cache.  A
  specific design choice in FreeBSD has been to allow cache entries for
  unused nodes to be removed so that the nodes can be reused.  On systems
  that rapidly consume vnodes, this allows more vnodes to be recycled,
  which means more memory is available.  However, it also makes it less
  likely that a name can be reconstructed from the name cache.

- Maintain references to cache entries instead of vnodes when accessing
  leaf files.  This is actually somewhat the approach taken by Linux --
  typically the hardest name to "identify" is the last segment to reach a
  file, since files can have hard links (and directories typically don't).
  That name can rapidly be invalidated due to renaming, unlinking,
  linking, and so on, and hence can be quite stale, but if you assume the
  name space is static, this will help out with the "files don't have
  parents" problem.

- With a minor redesign of UFS, eliminating hard links, it is possible to
  add a directory back-pointer to the parent of a file.  In this case,
  there is an authoritative reference to the parent.  Mind you, this comes
  with many down-sides: Apple attempted to ship a UNIX system without
  support for hard links, and had to rapidly hack support for them back into
  the file system.

- Maintain a parent back-pointer for files in the vnode, reflecting the
  last directory used to reach the file, so that you can search that
  directory to find a possible name.  This requires different reference
  management behavior, prevents directories from falling out of the cache
  if a file reached via the directory is in use, and will also require
  walking directories, which can be very expensive.
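To make the trade-off in the first and last tricks concrete, here is a toy model of a name cache with parent back-pointers.  It is only a sketch: the NameCache class, its methods, and the vnode numbers are hypothetical and are not FreeBSD's actual name cache structures.  It shows that a path can be rebuilt only while every intermediate entry is still cached.

```python
# Toy model only: names and structure are assumptions for illustration,
# not the kernel's real data structures.

class NameCache:
    def __init__(self):
        # child vnode id -> (parent vnode id, name component)
        self.entries = {}

    def enter(self, parent, name, child):
        """Record that `child` was reached as `name` inside `parent`."""
        self.entries[child] = (parent, name)

    def purge(self, vnode):
        """Model recycling an unused node: its cache entry disappears."""
        self.entries.pop(vnode, None)

    def fullpath(self, vnode, root):
        """Walk back-pointers toward root; None if the chain is broken."""
        parts = []
        while vnode != root:
            entry = self.entries.get(vnode)
            if entry is None:
                return None  # an intermediate entry was purged
            vnode, name = entry
            parts.append(name)
        return "/" + "/".join(reversed(parts))


cache = NameCache()
ROOT, USR, SRC, FILE = 1, 2, 3, 4
cache.enter(ROOT, "usr", USR)
cache.enter(USR, "src", SRC)
cache.enter(SRC, "Makefile", FILE)

before = cache.fullpath(FILE, ROOT)  # full chain is cached
cache.purge(USR)                     # intermediate directory recycled
after = cache.fullpath(FILE, ROOT)   # chain is now broken
```

Keeping the entry for the intermediate directory pinned would preserve the path, at the cost of memory that could otherwise be recycled.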

At heart, though, fundamental issues remain: files can have no names, or they can be looked up using a name that is removed, yet still have another name. They can have several names. They can be accessed without any lookup. The same name can refer to several files due to mountpoint covering. Throughout the design, names are assumed to be only fleetingly valid (during the lookup), and of secondary importance after that.
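The "several names" and "no names" cases are easy to observe from userland.  The following is a minimal sketch, assuming a local POSIX file system that supports hard links; the name_states function and the file names "a" and "b" are made up for the example.

```python
import os
import tempfile


def name_states():
    """Return the link count of one open file as names come and go."""
    counts = []
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "a")
        second = os.path.join(d, "b")
        fd = os.open(first, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            counts.append(os.fstat(fd).st_nlink)  # one name: "a"
            os.link(first, second)
            counts.append(os.fstat(fd).st_nlink)  # two names: "a" and "b"
            os.unlink(first)                      # the name used to open it is gone
            counts.append(os.fstat(fd).st_nlink)  # yet it still has another name
            os.unlink(second)
            counts.append(os.fstat(fd).st_nlink)  # no names at all
            os.write(fd, b"still usable")         # the open file keeps working
        finally:
            os.close(fd)
    return counts
```

On a typical local file system this should return [1, 2, 1, 0].  Network file systems may behave differently; NFS clients, for example, commonly rename an open file rather than removing it.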

Most systems I've looked at try to work around a lack of names in one of a few ways:

(1) They treat the name as something valid only at time of lookup.  For
    example, the Solaris audit system captures a name used to look up a
    node, and after that it is the responsibility of the consumer of the
    audit trail to identify any name operations that might affect the name
    of an object in use, if names are important.  Typically they have to
    handle three names during lookup: path to process root, path from
    process root to cwd, and path from cwd to file.

(2) Apple has an underlying file system, HFS+, that actually maintains a
    fairly strong notion of directory hierarchy, via its catalog, so you
    can look up parent nodes.  They maintain a vnode backpointer from
    children to parents in VFS, set up during lookup.  However, this
    breaks for several reasons: volfs, which allows access to files by
    device + inode number, NFS, which allows access to files not by path,
    and their hacks to re-add hard links using a special directory, which
    can result in no sensible name being returned at all.  This is why if
    you look at Darwin/Mac OS X audit trails, you'll often just see lists
    of inode numbers and device numbers instead of names.

(3) They attempt to strengthen the name cache, either lowering the ability
    to recycle system memory for intermediate directories, or accepting
    more stale data.  Either way, the approaches fall down in the face of
    the fundamental design choice to deprioritize names: NFS, direct inode
    access, hard links, mount point grafting.

(4) They maintain parallel data structures, such as those used by HADB, to
    construct "directory trees", and fall back on expensive disk searching
    algorithms to handle edge cases, rename, NFS access, and so on.
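As a rough userland illustration of the "expensive disk searching" fallback in (4), the sketch below walks a tree and reports every name whose device and inode numbers match a target file.  The find_names and demo functions are hypothetical helpers written for this example, not part of any existing tool, and the cost is a stat of every entry under the root.

```python
import os
import tempfile


def find_names(root, target):
    """Return every path under root that refers to the same file as target."""
    want = os.stat(target)
    key = (want.st_dev, want.st_ino)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished while we were walking
            if (st.st_dev, st.st_ino) == key:
                found.append(path)
    return sorted(found)


def demo():
    """Build a small tree with one hard-linked file and search it."""
    with tempfile.TemporaryDirectory() as d:
        os.mkdir(os.path.join(d, "sub"))
        x = os.path.join(d, "x")
        with open(x, "w") as f:
            f.write("data")
        os.link(x, os.path.join(d, "sub", "y"))  # second name, same file
        with open(os.path.join(d, "z"), "w") as f:
            f.write("unrelated")
        return [os.path.relpath(p, d) for p in find_names(d, x)]
```

The result is only a snapshot: any of the names found can be renamed or unlinked as soon as the walk has passed them, and a file with no remaining names yields an empty list.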


Robert N M Watson
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers