I recently installed NetBSD 5.1 on an old Thinkpad T41 that I use for experimentation. I installed it with a single, monolithic filesystem, which I mounted async,noatime. Yes, I'm fully aware that's dangerous and was aware of it at the time. But I have a long history of running Linux systems with ext2 filesystems and now, journal-less ext4 filesystems, and in all the years of running those systems, where no particular care is taken to write filesystem metadata in ordered fashion, I have never lost a filesystem. Linux crashes are extremely rare, my systems are either laptops or on UPSes, and I never do something as stupid as just whacking the power button to shut them down. On the rare occasions when a filesystem has suffered an improper shutdown, fsck has always been able to recover with little or no damage. (I should perhaps mention that I'm retired now, having had a long career in software development, with a lot of OS development experience -- IBM CP/67, Tenex, TOPS20, Unix (Mach), and a LOT of Linux sys-admin experience; less with the BSDs, but not zero.)
The T41 has built-in Aironet Wireless Communications MPI350 wireless hardware. The GENERIC 5.1 kernel did not see this device at boot time, so no wireless. To fix this, I stuck an Atheros-based PCMCIA card in the machine, which did work.

I was attempting to build Gnucash via pkgsrc on the T41 and had left the machine grinding away overnight (webkit is one of Gnucash's dependencies, and it's huge). It had finished the build when I got up the following morning and I installed gnucash and then did a bunch of cleaning-up in /usr/pkgsrc. I then tried to use firefox and found that my network connection was dead. So I did a /etc/rc.d/network restart and the system froze, completely dead.

Upon restart, the automatic fsck gave up and requested a manual fsck. I tried that, but there were just too many things broken, a consequence, I'm sure, of running async and having this crash occur just after having done a lot of filesystem writing. The situation was so bad, I had to abandon this install.

There are two issues here:

1. It looks like there's a bug in the Atheros driver.
2. I'm a little bit surprised that the filesystem was as much of a mess as it was.

I mentioned all this to old friend Christos Zoulas and he suggested that I post this message.

It is certainly true that I had done a lot of writing to the filesystem (as a result of my pkgsrc cleanup) and that had occurred within, say, 10 minutes of the crash, maybe less. So it wasn't hours. But it also wasn't seconds. My Linux experience, and this is strictly gut feel -- I have no hard evidence to back this up -- tells me that if this had happened on a Linux system with an async, unjournaled filesystem, the filesystem would have survived.
In suggesting that I post this, Christos mentioned that he's seen situations where a lot of writing happened in a session (e.g., a kernel build) and then the sync at shutdown time took a long time, which has made him somewhat suspicious that there might be a problem with the trickle sync that the kernel is supposed to be doing.

So my purpose in posting this is to ask: after doing 'make clean's of perhaps 15 or 20 packages and their dependencies, what is your estimate of the maximum time before everything gets safely written out of the buffer cache? Is it seconds? Tens of seconds? Minutes? (This machine has a 1.6 GHz Pentium M, 2 GB of memory, and a 7200 rpm 60 GB PATA disk -- yes, not a normal configuration for a T41; I stuck in the memory and disk taken from another, dead Thinkpad I have.) If the answer is small compared with the minutes that passed between my writing and the crash, then I would suggest that a kernel wizard have a look at the trickle sync stuff.

I made the point to Christos that I'm probably one of a very small number, maybe one, who would mount the whole world async (and please, no lectures; I knew the risk going in; this was an experiment and I knew it could end badly; I did not have 10 years' worth of un-backed-up financial data on this machine :-), and it is almost certainly true that if the filesystem had been mounted sync or softdep, it would have survived the crash. So if there's a problem with trickle sync, it would only have catastrophic consequences in the very rare case of someone doing what I did (mounting async, doing a lot of writing followed by a system crash). I'm trying to make the argument that there could be a problem that is benign in 99.99% of NetBSD setups, and so is one you haven't heard about.

/Don Allen