I've been looking into the efficiency of the caching in dump(8) and it's abysmal: after instrumenting dump(8) and looking at its access patterns, it turns out that it typically reads roughly three times as much data into its cache as it should, whilst using five times as much RAM as requested.
This poor efficiency is related to the way dump(8) works: a master process scans through the inodes to be dumped, while a number of slave processes (three by default) actually read the disk blocks and write the data to tape. An additional process is spawned for each tape written (to simplify checkpointing), so a dump to a single tape uses five processes in total. The current cache mechanism associates a separate private cache with each process, so you typically have five caches, each of the requested size.

The re-reading is partly caused by the way read requests are distributed across the slave processes: read requests for adjacent blocks of data are likely to be handled by different slave processes, and therefore by different caches.

I've checked the behaviour in the various *BSDs, with the following results:
- DragonFly copied FreeBSD
- NetBSD uses a single shared cache
- OpenBSD doesn't support caching

I've tried modelling a unified cache along NetBSD's lines, and there appears to be a massive improvement in cache performance. It's unclear how much of an improvement this will give in overall performance, but not physically reading data from disk must be faster than reading it. I believe it would be worthwhile creating a TODO item to investigate this more thoroughly.

-- 
Peter Jeremy
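To make the re-reading effect concrete, here's a rough simulation of the two cache layouts. This is not dump(8) code; the round-robin assignment of requests to slaves, the LRU replacement policy, and the cache/request sizes are all my assumptions, chosen just to show why scattering adjacent reads across private caches forces extra physical reads:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache of disk block numbers; counts misses."""
    def __init__(self, size):
        self.size = size
        self.data = OrderedDict()
        self.misses = 0  # each miss models a physical disk read

    def read(self, block):
        if block in self.data:
            self.data.move_to_end(block)  # mark most recently used
        else:
            self.misses += 1
            self.data[block] = True
            if len(self.data) > self.size:
                self.data.popitem(last=False)  # evict least recently used

def simulate(nslaves, cache_blocks, requests):
    # Current scheme: each slave has a private cache of the full
    # requested size, and adjacent requests land on different slaves
    # (modelled here as simple round robin).
    private = [LRUCache(cache_blocks) for _ in range(nslaves)]
    for i, blk in enumerate(requests):
        private[i % nslaves].read(blk)
    # NetBSD-style scheme: one shared cache of the same size serves
    # every request, regardless of which slave issued it.
    shared = LRUCache(cache_blocks)
    for blk in requests:
        shared.read(blk)
    return sum(c.misses for c in private), shared.misses

# Synthetic access pattern: overlapping sequential passes, so each
# block is touched a second time shortly after the first -- a crude
# stand-in for adjacent slaves re-reading nearby data.
requests = []
for base in range(0, 1000, 10):
    requests.extend(range(base, base + 20))

priv_misses, shared_misses = simulate(3, 64, requests)
print("private caches:", priv_misses, "misses")
print("shared cache:  ", shared_misses, "misses")
```

With this pattern the second touch of a block always lands on a different slave than the first, so the private caches never get a hit and every request becomes a physical read, while the shared cache only misses once per distinct block, roughly halving the reads. The exact ratio depends on the access pattern, but the direction matches what I measured.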