[EMAIL PROTECTED] (Tim Kientzle) wrote in message 
news:<b128ta$b0n$[EMAIL PROTECTED]>...
> Personally, I think there's a lot of merit to _trying_

There's even more merit in only pretending to try... 

Here are the results of a quick simulation of a cache using random replacement. 
I've also included a scheme for a caching algorithm that might solve the 
problem.

I have simulated multiple sequential reads of a file. There are two 
parameters: the ratio of the file size to the cache size, and the number of 
times the file is read.

For each combination of parameters, the simulation is averaged over 50 runs.

The size of the cache was 100 blocks; if you have a faster machine than mine, 
you can turn that up, but I'm pretty sure it will make very little difference.

Since we are talking about files being read more than once, the first reading 
of the file (which will give only cache misses) is not included in the 
statistics.
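
For the curious, here's roughly what one run of the simulation does. This is 
a stripped-down sketch in the spirit of the real benchmark (linked at the 
end), not the actual script; the names are mine and it does a single run 
rather than averaging over 50:

    #!/usr/bin/perl
    # Sketch of a random-replacement cache simulation (single run).
    use strict;
    use warnings;

    my $cache_size = 100;           # blocks in the cache
    my $ratio      = 1.5;           # file size / cache size
    my $reads      = 3;             # full reads after the warm-up read
    my $file_size  = int($cache_size * $ratio);

    my %cache;                      # block number => 1 if currently cached
    my ($hits, $misses) = (0, 0);

    for my $pass (0 .. $reads) {    # pass 0 warms the cache, not counted
        for my $block (0 .. $file_size - 1) {
            if ($cache{$block}) {
                $hits++ if $pass > 0;
                next;
            }
            $misses++ if $pass > 0;
            if (keys %cache >= $cache_size) {
                # random replacement: evict a uniformly chosen cached block
                my @cached = keys %cache;
                delete $cache{ $cached[rand @cached] };
            }
            $cache{$block} = 1;
        }
    }

    printf "hit rate: %.1f%%\n", 100 * $hits / ($hits + $misses);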

What does each column mean?

ratio: how big the file is compared to the cache size

lock: the cache hit % for a cache that just locks the first x blocks of the 
file into the cache and doesn't try to cache anything else once the cache is 
full. (A quick sanity check of this column follows this list.)

1, 2, 3, 4, 5: the cache hit % for the random replacement cache with 1, 2, 3, 
4, 5 full reads of the file (not counting the initial read to warm up the 
cache)

l:r: how much better the locking cache is compared to the random cache, i.e. 
the locking cache's hit rate as a percentage of the random cache's
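
The lock column is actually deterministic: once the first C blocks of an 
F-block file are pinned, every repeated read hits exactly those C blocks, so 
the hit rate is simply C/F, i.e. 100/ratio. A couple of lines of Perl (mine, 
not from the benchmark) reproduce the column:

    # Expected hit % for the locking policy: the first $cache_size blocks
    # always hit, the rest always miss.
    my $cache_size = 100;
    for my $ratio (1.0, 1.5, 2.0, 4.0) {
        printf "ratio %.1f: lock = %d%%\n",
            $ratio, 100 * $cache_size / ($cache_size * $ratio);
    }
    # Prints 100, 66, 50 and 25, matching the lock column in the table below.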

ratio   lock    1       2       3       4       5       l:r
1.0     100     100     100     100     100     100     100
1.1      90      84      83      83      82      82     107
1.2      83      72      69      69      69      69     115
1.3      76      61      59      58      58      58     124
1.4      71      52      51      50      49      49     136
1.5      66      45      43      42      42      42     146
1.6      62      39      38      36      36      36     158
1.7      58      33      32      31      31      31     173
1.8      55      29      28      27      27      27     187
1.9      52      26      24      24      24      23     198
2.0      50      22      21      20      20      20     219
2.1      47      19      19      18      18      18     238
2.2      45      17      16      16      16      15     253
2.3      43      15      15      14      14      14     280
2.4      41      13      12      12      12      12     298
2.5      40      11      11      11      10      10     336
2.6      38      10       9      10       9       9     352
2.7      37       9       8       8       8       8     394
2.8      35       8       8       8       7       7     434
2.9      34       7       6       6       6       6     472
3.0      33       6       6       6       6       6     477
3.1      32       5       5       5       5       5     556
3.2      31       5       5       5       4       4     577
3.3      30       4       4       4       4       4     645
3.4      29       4       3       3       3       3     700
3.5      28       3       3       3       3       3     714
3.6      27       3       3       3       3       3     869
3.7      27       3       2       2       2       2     886
3.8      26       2       2       2       2       2     948
3.9      25       2       2       2       2       2     1035
4.0      25       2       2       2       2       2     1094

As you can see, the locking cache is always better than the random one, and 
the file doesn't have to be much bigger than the cache for it to be hugely 
better.

Maybe this is well known already, or maybe it is not compatible with the 
current system, but here's an idea for how you might implement the locking 
cache.

When a program hints that a file will be read in this way, the cached data for 
that file is treated differently. Blocks to be evicted are selected in the 
usual LRU way.

Say block A is selected to be evicted, but block A is part of one of these 
special files. Instead of evicting block A, we look for another block of the 
same file; call it block B. Block B should be the cached block furthest from 
the head of the file. We evict block B and give block B's timestamp to block A.
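
In code terms the eviction step might look something like the sketch below. 
This assumes each cached block records its offset from the head of the file 
and its LRU timestamp; the structure and names are mine, not anything from 
the existing buffer cache:

    # Given the LRU's chosen victim (block A) and the cached blocks of
    # A's file, decide which block actually gets evicted.
    sub pick_victim {
        my ($victim, @file_blocks) = @_;
        # Block B: the cached block of this file furthest from the head.
        my ($tail) = sort { $b->{offset} <=> $a->{offset} } @file_blocks;
        return $victim if $tail->{offset} <= $victim->{offset};  # A is the tail
        # Evict B instead and hand B's timestamp to A, so that the file
        # as a whole is neither preferred nor penalised.
        $victim->{timestamp} = $tail->{timestamp};
        return $tail;
    }

With the ordered per-file list described below, finding block B is a 
constant-time lookup rather than a sort.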

To see this in action, imagine that blocks 1 to 20 of a file are in the cache. 
The LRU cache decides to evict block 1; we see that it's part of a special 
file, so instead we evict block 20 and give block 20's timestamp to block 1. 
Soon after, the LRU cache decides to evict block 2, but we evict block 19 
instead and give its timestamp to block 2. Next our program reads block 21, 
which is then cached. Soon after that the cache tries to evict block 3, but 
instead we evict block 21 and give its timestamp to block 3.

Taking the file as a whole, the cache neither prefers nor penalises it 
compared to the current algorithm. However within the file, the blocks near 
the head are given preference over the ones near the tail.

Suppose that after one run through the file the cache has kept only 40% of 
the file. Under the current scheme this would be the final 40% of the file, 
which is no use for the next run. Under the new scheme it's the initial 40%, 
which is as good as we could hope for.

There is a downside: the cache now has to keep an ordered list of the blocks 
contained in these special files. However, there is no extra overhead for 
other files.

One other possible problem is that the blocks at the head of the file will 
tend to have the newer timestamps, so when the file is read a second time 
the newer timestamps will be overwritten before the older ones. It might be 
better to manage the timestamps as a FIFO queue but, on second thought, I 
think that would give this file an unfair advantage.

It should be straightforward to extend this to groups of files by keeping a 
single ordered list of all the blocks in several files.

The Perl for the benchmark is available at

http://www.fergaldaly.com/computer/randcache/bench

F

-- 
Do you need someone with lots of Unix sysadmin and/or lots of OO software 
development experience? Go on, giz a job.
My CV - http://www.fergaldaly.com/cv.html

