On Fri, Dec 9, 2011 at 4:33 PM, Brian Buhrow <buh...@lothlorien.nfbcal.org> wrote:
> Hello. Just for your edification, it is possible to break out of fsck
> mid-way and reinvoke it with fsck -y to get it to do the cleaning on its
> own.

This whole discussion, interesting though it may be, may have occurred simply because of my unfamiliarity with NetBSD, and probably because I made the mistake of not checking the fsck man page for something like the -y option once continuing to feed 'y's to fsck after the original crash began to seem like a losing battle. Had I thought about -y, I'd have used it, since I had nothing to lose at that point. (I know that fscks typically have such an option, but in my experience it's one of the possible answers to fsck's questions, as in OpenBSD's; for whatever reason, I didn't think of it.) But it's possible you have put your finger on the real truth of what happened here. Read on.

You suggested trying the experiment I did with OpenBSD with NetBSD, and so I did. Twice.

I installed NetBSD with separate filesystems for /, /usr, /var, /tmp, and /home, a la OpenBSD's default setup. All except /home and /tmp were mounted softdep,noatime. /home was mounted async, and /tmp is an in-memory filesystem.

The first time, I untarred the OpenBSD ports.tar.gz (I used it because it was what I used in the OpenBSD test, it's big, and I had it lying around) into a temporary directory in my home directory. With the battery removed from the laptop, I did an rm -rf ports, and while that was happening, I pulled the power connector. On restart, fsck found a bunch of things it didn't like about the /home filesystem, but managed to fix things up to its satisfaction and declare the filesystem clean. My home directory survived this and, as with OpenBSD, a fair amount of the ports directory was still present.

I then removed it and re-did the untar. While the untar was happening, I again pulled the plug. This time, the automatic fsck got unhappy enough to drop me into single-user mode, and I ran fsck there manually. I again encountered a seemingly never-ending sequence of requests to fix this and that. So I aborted and used the -y option. It charged through a bunch of trouble spots and completed.
On reboot, I found the same situation as the first one -- home directory intact and some of the ports directory present.

I have some thoughts about this:

1. Had I run fsck -y at the time of the first crash, I might well have found what I found today -- a repaired filesystem that was usable. So my assertion that the filesystem was lost may well have simply reflected my lack of skill as a NetBSD sys-admin.

2. Today's experiment shows that a NetBSD ffs filesystem mounted async, together with its fsck, *is* capable of surviving even a pretty brutal improper shutdown -- loss of power while a lot of writing was happening. Obviously I still don't have enough data to know if the probability of survival is comparable to Linux ext2, but what I found today is at least encouraging.

I did one more experiment, and that was to untar the ports tarball and then wait about a minute. I then did a sync. The disk light blinked just for a brief moment. This is a *big* tar file, but it appears from this easy little test that there was not a huge amount of dirty stuff sitting in the buffer cache. This is obviously not definitive, but it does suggest that NetBSD is migrating stuff from the buffer cache back to the disk for async-mounted filesystems in a timely fashion. A look at the code is probably the final arbiter here. I also note that there are sysctl items, such as vfs.sync.metadelay, that I would like to understand.

/Don Allen
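
For anyone who wants to repeat the last experiment with something a little more quantitative than watching the disk light, here is a minimal sketch in Python. It is not what was run above; the function name, file counts, and sizes are all made up for illustration. It writes a batch of files without fsync, waits, and then times a sync(). Note that on some systems sync() may return before the writes have actually completed, so the number is only a rough indicator of how much dirty data was still in the cache.

```python
import os
import tempfile
import time


def time_sync_after_writes(directory, n_files=200, size=64 * 1024,
                           settle_seconds=0.0):
    """Write n_files files of `size` bytes, wait, then time a sync().

    Hypothetical helper: returns the seconds os.sync() took to return.
    """
    payload = b"x" * size
    for i in range(n_files):
        # Plain buffered writes, no fsync: data may linger in the cache.
        path = os.path.join(directory, f"f{i:05d}")
        with open(path, "wb") as fh:
            fh.write(payload)

    # Give the kernel's periodic syncer a chance to flush on its own.
    time.sleep(settle_seconds)

    start = time.monotonic()
    os.sync()  # Unix only; asks the kernel to write out dirty buffers
    return time.monotonic() - start
```

Calling it once with settle_seconds=0 and once with settle_seconds=60, in a scratch directory on the async-mounted filesystem, and comparing the two timings would give a crude idea of whether dirty buffers are being written back during the wait.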