[...]

RC> The idea is that the database vendor knows their data storage
RC> better than the OS can guess it, and that knowledge allows
RC> them to implement better caching algorithms than the OS can
RC> use. The fact that benchmark results show that raw partition
RC> access is slower indicates that the databases aren't written
RC> as well as they are supposed to be.
I am not convinced that this conclusion is warranted, though I admit I have not seen those benchmarks. The DB vendor's raw disk driver might be doing things like synchronous writes to maintain its own invariants, while a [non-journalling] file system will, at best, care about its own meta-data consistency (see the sketch at the end of this post). While it is possible that the general-purpose file system, with more man-hours behind it, is better written, the benchmarks might be omitting crucial criteria such as crash protection. Do you guys have references to the benchmarking data?

RC> ... One of
RC> which was someone who did tests with IBM's HPFS386 file system
RC> for server versions of OS/2. He tried using 2M of cache with
RC> HPFS386 and 16M of physical cache in a caching hard drive
RC> controller and using 18M of HPFS386 cache with no cache on the
RC> controller. The results were surprisingly close on real-world
RC> tests such as compiling large projects. It seemed that 2M of
RC> cache was enough to cache directory entries and other
RC> file-system meta-data and cache apart from that worked on a
RC> LRU basis anyway.

This I would buy; as you point out, the controller and the FS code are doing the same thing (provided they give the same write guarantees).

BM
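P.S. To make the synchronous-write point concrete, here is a rough sketch of the difference I mean, assuming a POSIX-ish system. The device name, file name and block size are made up, and a real engine would add alignment, O_DIRECT where available, and proper error handling; this only illustrates the durability guarantees, it is not a real implementation.

/* Sketch only: a "database-style" synchronous write to a raw device
 * versus an ordinary buffered write through the file system. */
#define _XOPEN_SOURCE 700   /* for pwrite() on some systems */

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char page[4096];
    memset(page, 0x42, sizeof page);

    /* DB-engine style: write straight to the raw partition with O_SYNC,
     * so the call does not return until the block is on stable storage
     * and the engine can rely on its own ordering invariants.
     * "/dev/rsd0a" is a made-up device name -- writing to a real raw
     * disk would of course destroy whatever is on it. */
    int raw = open("/dev/rsd0a", O_WRONLY | O_SYNC);
    if (raw >= 0) {
        if (pwrite(raw, page, sizeof page, 0) != (ssize_t)sizeof page)
            perror("raw pwrite");
        close(raw);
    } else {
        perror("open raw device");
    }

    /* File-system style: the write lands in the OS buffer cache and is
     * flushed whenever the kernel gets around to it; only an explicit
     * fsync() gives a durability guarantee comparable to O_SYNC. */
    int fsfd = open("dbfile.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fsfd >= 0) {
        if (pwrite(fsfd, page, sizeof page, 0) != (ssize_t)sizeof page)
            perror("fs pwrite");
        if (fsync(fsfd) != 0)   /* drop this and durability is gone */
            perror("fsync");
        close(fsfd);
    } else {
        perror("open dbfile.dat");
    }
    return 0;
}

The point being: the raw-device path pays for a synchronous write on every call, while the file-system path is only as durable as its last fsync(), so a benchmark that ignores crash protection is not comparing like with like.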