A couple of things to note here. Well, many things actually.

* Turning off write caching, assuming the drive even looks at the bit, will destroy write performance for any driver which does not support command queueing. So, for example, SCSI typically has command queueing (as long as the underlying drive firmware actually implements it properly), and 3Ware cards have it (the underlying drives, if SATA, may not, but 3Ware's firmware itself might do the right thing). The FreeBSD ATA driver does not, not even in AHCI mode. The RAID code does not as far as I can tell. You don't want to turn this off.

* Filesystems like ZFS and HAMMER make no assumptions about the ordering of completed write I/O versus future write I/O on disk; they use BIO_FLUSH to enforce ordering on the media. These filesystems are able to queue up large numbers of parallel writes in between each BIO_FLUSH, so the flush operation has only a very small effect on actual performance (a sketch of the pattern follows this list). Numerous Linux filesystems also use the flush command and do not make assumptions about BIO-completion/future-BIO ordering.

* UFS + softupdates assumes write ordering between completed BIOs and future BIOs. This doesn't hold true on a modern drive (with write caching turned on). Unfortunately, it is ALSO not really the cause of most of the inconsistency reports. UFS was *never* designed to deal with disk flushing. Softupdates was never designed with a BIO_FLUSH command in mind. They were designed for formally ordered I/O (bowrite), which fell out of favor about a decade ago and has since been removed from most operating systems.

* Don't get stuck in a rut and blame DMA/drive/firmware for all the troubles. It just doesn't happen often enough to come even close to being responsible for the number of bug reports.
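To make the ZFS/HAMMER point above concrete, here is a minimal sketch of the write-group-then-flush pattern. It assumes FreeBSD's GEOM consumer API (g_io_request(), biowait(), g_io_flush()); write_group_then_flush() itself is a made-up name, the bios are assumed to have been built with bio_done left NULL so biowait() works, and locking and error handling are trimmed:

    #include <sys/param.h>
    #include <sys/bio.h>
    #include <geom/geom.h>

    /*
     * Sketch only: enforce on-disk ordering with BIO_FLUSH instead of
     * assuming that a completed write BIO is already on the media.
     */
    static int
    write_group_then_flush(struct g_consumer *cp, struct bio **bios, int n)
    {
            int i;

            /* Issue the whole group of writes in parallel. */
            for (i = 0; i < n; i++)
                    g_io_request(bios[i], cp);

            /* Wait for completion; the data may still sit in the drive cache. */
            for (i = 0; i < n; i++)
                    biowait(bios[i], "wgrp");

            /*
             * A single BIO_FLUSH forces everything written so far onto
             * the media before any future write can pass it.
             */
            return (g_io_flush(cp));
    }

The point is that correctness comes from the one flush barrier rather than from per-write ordering, which is why the cost amortizes so well across large write groups.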
With some work UFS can be modified to do it, but performance will probably degrade considerably, because the only way to do it is to hold the completed write BIOs (not biodone() them) until something gets stuck or enough build up, then issue a BIO_FLUSH and, after it returns, finish completing the BIOs (call biodone()) for the prior write I/Os. This will make softupdates work properly, since softupdates orders I/Os based on BIO completions. Another option would be to complete the BIOs but do major surgery on softupdates itself to mark the dependencies as waiting for a flush, then flush proactively and re-sync.
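Roughly, the first option would look like the sketch below. The held-BIO queue, the HELD_BIO_LIMIT threshold and the issue_bio_flush_and_wait() helper are all invented for illustration, and locking is omitted; a real patch would hook this in below UFS, in the disk driver or a GEOM layer:

    #include <sys/param.h>
    #include <sys/queue.h>
    #include <sys/bio.h>

    /*
     * Sketch only: hold completed write BIOs until a BIO_FLUSH confirms
     * they are on the media, and only then biodone() them, so that the
     * completion order softupdates sees actually holds on-disk.
     */
    static void flush_and_release(void);
    static void issue_bio_flush_and_wait(void);     /* hypothetical helper */

    static TAILQ_HEAD(, bio) held_bios = TAILQ_HEAD_INITIALIZER(held_bios);
    static int held_count;
    #define HELD_BIO_LIMIT 128              /* arbitrary batching threshold */

    /* Called when the drive reports a write as complete. */
    static void
    hold_completed_write(struct bio *bp)
    {
            /* Reuses struct bio's bio_queue linkage for the hold queue. */
            TAILQ_INSERT_TAIL(&held_bios, bp, bio_queue);
            if (++held_count >= HELD_BIO_LIMIT)
                    flush_and_release();
    }

    static void
    flush_and_release(void)
    {
            struct bio *bp;

            /* Push everything in the drive's write cache onto the media. */
            issue_bio_flush_and_wait();

            /* Now the held completions are safe to report upward. */
            while ((bp = TAILQ_FIRST(&held_bios)) != NULL) {
                    TAILQ_REMOVE(&held_bios, bp, bio_queue);
                    held_count--;
                    biodone(bp);
            }
    }

The "something gets stuck" case would additionally need a timeout or a kick from the syncer; the sketch only shows the size-triggered path.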
Unfortunately, this will not solve the whole problem. IF THE DRIVE DOESN'T LOSE POWER, IT WILL FLUSH THE BIOs IT SAID WERE COMPLETED. In other words, unless you have an actual power failure, the assumptions softupdates makes will hold. A kernel crash does NOT prevent the actual drive from flushing the I/Os in its cache; the disk can wind up with unexpected softupdates inconsistencies on reboot anyway. Thus the source of most of the inconsistency reports will not be fixed by adding this feature, so more work is needed on top of that.

Nearly ALL of the unexpected softupdates inconsistencies you see *ARE* for the case where the drive DOES in fact get all the BIO data it returned as completed onto the disk media. This has happened to me many, many times with UFS. I'm repeating this: short of an actual power failure, any I/Os sent to and acknowledged by the drive are flushed to the media before the drive resets. A FreeBSD crash does not magically prevent the drive from flushing out its internal queues. This means there are bugs in softupdates and the kernel which can result in unexpected inconsistencies on reboot. Nobody has ever life-tested softupdates to try to locate and fix the issues. Though I do occasionally see commits that try to fix various issues, they tend to be more for live-side non-crash cases than for crash cases.

Some easy areas which can be worked on:

* Don't flush the buffer cache on a crash. Some of you already do this for other reasons (it makes it more likely that you can get a crash dump). The kernel's flushing of the buffer cache is likely the cause of a good chunk of the inconsistency reports from fsck, because unless someone has worked on the buffer flushing code it likely bypasses softupdates. I know that when working on HAMMER I had to add a bioop explicitly to allow the kernel's flush-buffers-on-crash code to query whether it was actually OK to flush a dirty buffer or not. Until I did that, DragonFly was flushing HAMMER buffers on crash which it had absolutely no business flushing.

* Implement active dependency flushing in softupdates. Instead of just adjusting the dependencies for later flushes, softupdates needs to actively initiate I/O for the dependencies as they are resolved. Doing this will require implementing a flush queue; you can't just recurse (you will blow out the kernel stack). If you don't do this, then you have to sync about a dozen times, with short delays between each sync, to ensure that all the dependencies are flushed. The only time this is done automatically is during a normal umount at shutdown.

* Once the above two are fixed, start testing within virtual environments by virtually pulling the plug and virtually crashing the kernel, then fscking to determine whether an unexpected softupdates inconsistency occurred. There are probably numerous cases that remain.

Of course, what you guys decide to do with your background fsck is up to you, but it seems to have been a thorn in the side of FreeBSD from the day it was introduced, along with snapshots. I very explicitly avoided porting both the background fsck and the softupdates snapshot code to DFly due to their lack of stability.

The simple fact of the matter is that UFS just does not recover well on a large disk. Anything over 30-40 million inodes and you risk not being able to fsck the drive at all, not even in 64-bit mode (you will run out of swap). You get one inconsistency and the filesystem is broken forever. Anything over 200GB and your background fsck can wind up taking hours, seriously degrading the performance of the system in the process. It can take 6 hours to fsck a full 1TB HD. It can take over a day to fsck larger setups. Putting in a few sleeps here and there just makes the run time even longer and perpetuates the pain.

My recommendation? Default UFS back to a synchronous fsck, and stop treating ZFS (your only real alternative) as being so ultra-alpha that it shouldn't be used. Start recommending it for any filesystem larger than 200GB. Clean up the various UI issues that can lead to self-immolation and foot stomping. Fix the defaults so they don't blow out kernel malloc areas, etc. etc. Fix whatever bugs pop up. UFS is already unsuitable for 'common' 1TB consumer drives even WITH the background fsck. ZFS is ALREADY far safer to use than UFS for large disks, given reasonable constraints on feature selection.

-Matt