On Wed, Aug 15, 2007 at 10:30:19AM -0700, Marc Perkel wrote: > --- Kyle Moffett <[EMAIL PROTECTED]> wrote: > > Except they do, and without directories the > > performance of your average filesystem is going to suck. > > Actually you would get a speed improvement. You hash > the full name and get the file number. You don't have > to break up the name into sections except for > evaluating name permissions. > > The important concept here is that files and name > aren't stored by levels of directories. The name > points to the file number. Directory levels are > emulated based on name separation characters or any > other algorithm that you want to use. > > One could create a file system and permission system > that gets rid of the concept of directories entirely > if one chooses to.
I would like to add support for Kyle's assertion. The model described by Marc is exactly the method used by the current version of the NCAR Mass Storage Service (MSS), which is data archive of 4+ petabytes contained in 40+ million files. To the user's point of view, it looks somewhat like a POSIX file system with both some extensions and deficiencies. The MSS was designed in the mid-1980s, in an era where the costs of the supercomputers (Cray-1s at that time) were paramount. This lead to some MSS design decisions to minimize the need for users to rerun jobs on the expensive supercomputer just because they messed up their MSS file creation statements. Files names are a maximum of 128 bytes, with a dynamically managed directory structure indicated by '/' characters in the name. The file name is hashed, and the hash table provides the internal file number (the address in the Master File Directory (MFD)). Any parent directories are created automatically by the system upon file creation, and are automatically deleted if empty upon file deletion. Directories also have a self pointer, and both files and directories are chained together to allow the user to list (or otherwise manipulate) the contents of a directory. The biggest problem with this model is that to manipulate the a directory itself, you have to simulate the operation on all of the files contained within it. For example to rename a directory with 'n' descendants, you must perform: n+1 hash table removals n+1 hash table insertions (with collision detection) n+1 MFD record updates 1 directory chain removal 1 directory chain insertion This is, needless to say, very painful when n is large. Since users must use directory trees to efficiently manage their data holdings, efficient directory manipulation is essential. Contrast this with the number of operations required for a directory rename if files do not record their complete pathname: 1 directory chain removal 1 directory chain insertion Fortunately we are currently working to change from using a model like Marc describes to one Kyle describes. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/