> > :I know it's not the answer, it's just related question: do you know
> > :perhaps of any initiatives (except XFS) that could significantly shorten
> > :time it takes fsck to check big filesystems, let's say 64GB? As it is now,
> > :it's almost unbearable. I naively thought softupdates would (almost)
> > :eliminate the need to do fsck...
> > :
> > :Andrzej Bialecki
> >
> > Eventually Kirk is planning for softupdates to allow you to run a special
> > version of fsck in the background to clean up the block bitmap on a live
> > filesystem. The time frame for this project is not known.
> >
> > Another possibility would be to mark individual cylinders clean/dirty
> > to reduce the amount of work fsck must do on a normal filesystem. It
> > would be a pretty hefty project for someone to take on, though.
>
> Hmm.. If I understand you correctly:
>
> * the ffs code would have to be modified to mark cylinder groups "dirty"
>   when there are writes to that CG.
>
> * on unmount, after the buffers are flushed they would be marked
>   clean.
>
> * on mount all "clean" flags in CGs would have to be checked (instead of
>   the single bit)
>
> * fsck would have to be modified to recognize CG "clean" flag and prune
>   only those CGs.
>
> Overall, doesn't sound _that_ complicated... but most probably I'm missing
> something.

When a system crashes, the FS dirty bit is left set. Because the bit
is set, you can't trust the FS contents enough to distinguish between
a crash that resulted from a software failure and one that resulted
from a hardware failure. You must therefore assume a hardware failure
and run a full check.

In the case of a software failure, the cylinder group bitmaps may have
bits indicating that things which are not truly allocated have been
allocated. Traversing the bitmaps (locking each CG as you do so) to
clear the bits for things that were never truly allocated is the
"fsck in the background" operation, which is permissible following a
software failure that leaves the dirty bit set for the FS.

There are two rational methods for getting around this problem. The
first was suggested by Ganger and Patt, Matt Day, Mark Muhlestein,
myself, and others: "soft read-only". A "soft read-only"
implementation was done (by Kirk) for BSDI.

The basic idea is to mark the in-core superblock read-only once there
are no dirty buffers left associated with an FS, and then mark the
on-disk structure clean. When a write (or a read, since you must obey
POSIX atime semantics) occurs, you must mark the FS dirty and _be
certain this write has been committed to disk_ before clearing the
"soft read-only" flag and allowing the dirtying operation to complete.

An implementation of this is pretty trivial on a normal system, and
Matt, Mark, and I implemented such a beast for our Windows 95 port of
the Heidemann framework and the BSD FFS (and the Ganger/Patt soft
updates code).

This gives you a sort of "statistical protection", which is most
useful for a single-user desktop box (e.g. Windows 95), where the
box's disks are frequently idle for large stretches of time, and are
therefore in the state "clean, soft-read-only".

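To make the ordering in the "soft read-only" scheme concrete, here is
a minimal toy model in Python. It is only a sketch of the state
machine described above, not the BSDI or Windows 95 code: the class
name, the method names, and the event log are all hypothetical, and
the "synchronous superblock write" is simulated by a plain assignment.

```python
class SoftReadOnlyFS:
    """Toy model of "soft read-only"; every name here is hypothetical."""

    def __init__(self):
        self.disk_clean = True   # clean flag as stored in the on-disk superblock
        self.soft_ro = True      # in-core "soft read-only" flag
        self.dirty_buffers = 0   # buffers modified but not yet written back
        self.events = []         # order of simulated events, for inspection

    def _write_superblock_sync(self, clean):
        # Stand-in for a superblock write that is known to have reached the disk.
        self.disk_clean = clean
        self.events.append("sb:clean" if clean else "sb:dirty")

    def modify(self, what):
        # Any dirtying operation: a write, or a read that updates atime.
        if self.soft_ro:
            # The dirty mark must be on disk before the operation may proceed.
            self._write_superblock_sync(clean=False)
            self.soft_ro = False
        self.dirty_buffers += 1
        self.events.append("buf:" + what)

    def sync(self):
        # Flush all dirty buffers; with none left, go soft read-only,
        # then mark the on-disk structure clean.
        self.dirty_buffers = 0
        self.events.append("flush")
        if not self.soft_ro:
            self.soft_ro = True
            self._write_superblock_sync(clean=True)

    def needs_fsck_after_crash(self):
        # A crash preserves only the on-disk state.
        return not self.disk_clean
```

In this model a crash while the FS is idle finds the on-disk flag
clean, so no check is needed. A crash at any point between the first
`modify()` and the next completed `sync()` finds it dirty.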
For FreeBSD, the problem is complicated by the FS metadata's dirty
buffers being hung off the device vnode, rather than being truly
separate data. This means that you must sync out that data as well
before you can mark the FS clean (and you must resync out similar
data to be sure the dirty bit has been correctly set before proceeding
with other writes). For the Windows 95 port of the code, there was no
unified VM and buffer cache to have to worry about in this regard.

Apart from "soft read-only", you can obtain, at the cylinder group
level, separate "clean bits" on a per cylinder group basis.

For this to work in the face of a true hardware failure, you must
engage in a two-stage commit process, in which you mark the entire
FS dirty, modify the state of the cylinder group clean bit, and then
mark the FS clean.

This works in the face of software failures for cylinder group
operations. To make it robust in the face of hardware failures, you
must have a separate "dirty-but-ok" bit for the cylinder group, which
is similarly protected, and which is reset (after resetting the FS
dirty bit, after resetting the CG dirty bit) during updates to non-CG
bitmap data. Failure to support this leaves you unable to verify the
state of the non-bitmap data in the CG bitmap, particularly for files
whose block pointers span cylinder groups.

Processing of cleanup is further complicated by the fact that any file
that spans a "dirty, dirty" CG after a software failure must be
treated as if it had been involved in a hardware failure. With a large
number of files, the benefits gained by this approach are small.

Aside: I was under the impression from the Usenix reports that Kirk's
checkpointing mechanism was a reference to the ability to stall an
image of the FS as an exposed "snapshot", to allow for backups to
occur on running FS's (and if the backups were "taking too long", that
regular soft-updates operations would eventually stall as a result).

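Returning to the per cylinder group clean bits: the two-stage commit
described above can be sketched as a small Python model. This is an
illustration under stated assumptions, not FFS code. The class, its
methods, and the `crash_after_step` parameter are hypothetical, the
"dirty-but-ok" bit is deliberately left out, and the fsck policy in
`groups_to_check()` is my reading of the text: if the FS-wide bit is
dirty, no per-CG bit can be trusted.

```python
class CGCleanBits:
    """Toy model of per-CG clean bits guarded by the FS-wide dirty bit."""

    def __init__(self, ncg):
        self.fs_dirty = False          # FS-wide dirty bit (on disk)
        self.cg_clean = [True] * ncg   # one clean bit per cylinder group

    def set_cg_state(self, cg, clean, crash_after_step=0):
        # crash_after_step is a hypothetical test hook: stop after that
        # step to simulate a crash in the middle of the commit.
        self.fs_dirty = True           # step 1: mark the entire FS dirty
        if crash_after_step == 1:
            return
        self.cg_clean[cg] = clean      # step 2: change the CG clean bit
        if crash_after_step == 2:
            return
        self.fs_dirty = False          # step 3: mark the FS clean again

    def groups_to_check(self):
        # Assumed policy: a dirty FS bit means a full check of every CG;
        # otherwise only the CGs whose clean bit is not set.
        if self.fs_dirty:
            return list(range(len(self.cg_clean)))
        return [i for i, c in enumerate(self.cg_clean) if not c]
```

For example, after `set_cg_state(2, False)` completes on a four-group
FS, only group 2 needs checking. If a later update crashes after step
1 or step 2, the FS-wide bit is still dirty and every group is
checked, which is the full-fsck case this scheme is trying to avoid.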
					Terry Lambert
					te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.