[EMAIL PROTECTED] (Tim Kientzle) wrote in message news:<b128ta$b0n$[EMAIL PROTECTED]>...
> Personally, I think there's a lot of merit to _trying_
There's even more merit in only pretending to try...

Here are the results of a quick simulation of a cache using random replacement. I've also included a scheme for a caching algorithm that might solve the problem.

I have simulated multiple sequential reads of a file. There are two parameters: the ratio of the file size to the cache size, and the number of times the file is read. For each combination of parameters, the simulation is averaged over 50 runs. The size of the cache was 100 blocks; if you have a faster machine than mine you can turn that up, but I'm pretty sure it'll make very little difference. Since we are talking about files being read more than once, the first reading of the file (which will give only cache misses) is not included in the statistics.

What does each column mean?

ratio: how big the file is compared to the cache size

lock: the cache hit % for a cache that just locks the first blocks of the file into the cache until it is full and doesn't try to cache anything else after that

1, 2, 3, 4, 5: the cache hit % for the random replacement cache after 1, 2, 3, 4, 5 full reads of the file (not counting the initial read to warm up the cache)

l:r: how much better the locking cache is compared to the random cache (the locking hit rate as a percentage of the random one)

ratio  lock     1     2     3     4     5    l:r
  1.0   100   100   100   100   100   100    100
  1.1    90    84    83    83    82    82    107
  1.2    83    72    69    69    69    69    115
  1.3    76    61    59    58    58    58    124
  1.4    71    52    51    50    49    49    136
  1.5    66    45    43    42    42    42    146
  1.6    62    39    38    36    36    36    158
  1.7    58    33    32    31    31    31    173
  1.8    55    29    28    27    27    27    187
  1.9    52    26    24    24    24    23    198
  2.0    50    22    21    20    20    20    219
  2.1    47    19    19    18    18    18    238
  2.2    45    17    16    16    16    15    253
  2.3    43    15    15    14    14    14    280
  2.4    41    13    12    12    12    12    298
  2.5    40    11    11    11    10    10    336
  2.6    38    10     9    10     9     9    352
  2.7    37     9     8     8     8     8    394
  2.8    35     8     8     8     7     7    434
  2.9    34     7     6     6     6     6    472
  3.0    33     6     6     6     6     6    477
  3.1    32     5     5     5     5     5    556
  3.2    31     5     5     5     4     4    577
  3.3    30     4     4     4     4     4    645
  3.4    29     4     3     3     3     3    700
  3.5    28     3     3     3     3     3    714
  3.6    27     3     3     3     3     3    869
  3.7    27     3     2     2     2     2    886
  3.8    26     2     2     2     2     2    948
  3.9    25     2     2     2     2     2   1035
  4.0    25     2     2     2     2     2   1094

As you can see, the locking cache is always better than the random one, and the file doesn't have to be much bigger than the cache for it to be hugely better.

Maybe this is well known already, or maybe it is not compatible with the current system, but here's an idea for how you might implement the locking cache.

When a program hints that a file will be read in this way, the cached data for that file is treated differently. Blocks to be uncached are selected in the usual LRU way. Say block A is selected to be evicted, but block A is part of one of these special files. Instead of evicting block A, we look for another block in the file; call it block B. Block B should be the block furthest from the head of the file. We evict block B and we give block B's timestamp to block A.

To see this in action, imagine that blocks 1 to 20 of a file are in the cache. The LRU cache decides to evict block 1; we see that it's part of a special file, so instead we evict block 20 and give block 20's timestamp to block 1. Soon after, the LRU cache decides to evict block 2, but we evict block 19 instead and give its timestamp to block 2. Next our program reads block 21, which is then cached. Soon after that, the cache tries to evict block 3, but instead we evict block 21 and give its timestamp to block 3.

Taking the file as a whole, the cache neither prefers nor penalises it compared to the current algorithm. However, within the file, the blocks near the head are given preference over the ones near the tail.
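To make that concrete, here's a rough userland sketch of the rule in Perl. It's only a toy model: the function names, the "file:block" string keys and the little demo at the bottom are made up for illustration, and a real implementation would keep proper per-file lists rather than sorting hashes. It's a plain LRU cache, except that when the LRU victim belongs to a hinted file we evict that file's block furthest from the head instead and hand its timestamp to the original victim.

#!/usr/bin/env perl
use strict;
use warnings;

my $CACHE_SIZE = 100;    # cache size in blocks
my $clock      = 0;      # logical clock used for LRU timestamps
my %cache;               # "file:block" -> timestamp of last access
my %special;             # files hinted as "sequential, read repeatedly"

# Hint that $file should get the head-preserving treatment.
sub hint_special { $special{ $_[0] } = 1 }

# Access block $blk of $file; returns 1 on a cache hit, 0 on a miss.
sub access_block {
    my ($file, $blk) = @_;
    my $key = "$file:$blk";
    my $hit = exists $cache{$key} ? 1 : 0;
    evict_one() if !$hit && keys(%cache) >= $CACHE_SIZE;
    $cache{$key} = ++$clock;
    return $hit;
}

sub evict_one {
    # Normal LRU choice: the entry with the oldest timestamp (block A).
    my ($victim) = sort { $cache{$a} <=> $cache{$b} } keys %cache;
    my ($file)   = split /:/, $victim;

    if ($special{$file}) {
        # Special file: find its cached block furthest from the head
        # (block B), evict that instead, and give B's timestamp to A
        # so A stops looking like the LRU entry.
        my ($tail) = sort { (split /:/, $b)[1] <=> (split /:/, $a)[1] }
                     grep { (split /:/, $_)[0] eq $file } keys %cache;
        if ($tail ne $victim) {
            $cache{$victim} = $cache{$tail};
            $victim = $tail;
        }
    }
    delete $cache{$victim};
    # A real implementation would keep a per-file list ordered by
    # offset rather than re-sorting the whole cache here.
}

# Demo: a 150-block file (ratio 1.5) read three times.
hint_special('bigfile');
for my $pass (1 .. 3) {
    my $hits = 0;
    $hits += access_block('bigfile', $_) for 0 .. 149;
    printf "pass %d: %.0f%% hits\n", $pass, 100 * $hits / 150;
}

With the demo as written (a 150-block file read three times through the 100-block cache), the first pass misses everything and the later passes settle at about 66% hits, which is essentially what the lock column above shows for ratio 1.5.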
Suppose that after one run through the file the cache has only kept 40% of the file. Under the current scheme this would be the final 40% of the file, which is no use for the next run. Under the new scheme it's got the initial 40%, which is as good as we could hope for.

There is a downside: the cache now has to keep an ordered list of the blocks contained in these special files. However, there is no extra overhead for other files. One other possible problem is that the blocks at the head of the file will tend to have the newer timestamps, and so when the file is read a second time the newer timestamps will be overwritten before the older ones. It might be better to manage the timestamps as a FIFO queue but, on second thought, I think that would give this file an unfair advantage.

It should be straightforward to extend this to groups of files by keeping a single ordered list of all the blocks in several files.

The Perl for the benchmark is available at
http://www.fergaldaly.com/computer/randcache/bench
(a stripped-down sketch of the same setup is appended at the end of this mail).

F
--
Do you need someone with lots of Unix sysadmin and/or lots of OO software development experience? Go on, giz a job.

My CV - http://www.fergaldaly.com/cv.html
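A stripped-down re-creation in Perl of the simulation described above, not the script at the URL: it lumps reads 1 to 5 together and steps the ratio more coarsely than the table, so expect ballpark numbers rather than an exact match.

#!/usr/bin/env perl
use strict;
use warnings;

my $CACHE = 100;   # cache size in blocks
my $READS = 5;     # full reads counted after the warm-up read
my $RUNS  = 50;    # the random policy is averaged over this many runs

printf "%-6s %5s %7s\n", 'ratio', 'lock', 'random';
for (my $ratio = 1.0; $ratio <= 4.0; $ratio += 0.5) {
    my $blocks = int($CACHE * $ratio + 0.5);   # file size in blocks

    # Locking cache: the first min($CACHE, $blocks) blocks always hit.
    my $lock_pct = 100 * ($blocks <= $CACHE ? 1 : $CACHE / $blocks);

    # Random replacement, averaged over $RUNS independent runs.
    my $total = 0;
    for my $run (1 .. $RUNS) {
        my %cached;
        my ($hits, $accesses) = (0, 0);
        for my $read (0 .. $READS) {            # read 0 is the warm-up
            for my $blk (0 .. $blocks - 1) {
                if ($cached{$blk}) {
                    $hits++ if $read > 0;
                }
                else {
                    if (keys(%cached) >= $CACHE) {
                        # evict a cached block chosen uniformly at random
                        my @in = keys %cached;
                        delete $cached{ $in[ int rand @in ] };
                    }
                    $cached{$blk} = 1;
                }
                $accesses++ if $read > 0;
            }
        }
        $total += $hits / $accesses;
    }
    printf "%-6.1f %5.0f %7.1f\n", $ratio, $lock_pct, 100 * $total / $RUNS;
}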