On Thursday 30 January 2003 05:22 pm, Matthew Dillon wrote:
|     Well, here's a counterpoint.  Lets say you have an FTP
|     server with 1G of ram full of, say, pirated CDs at 600MB a
|     pop.
|
|     Now lets say someone puts up a new madonna CD and suddenly
|     you have thousands of people from all over the world trying
|     to download a single 600MB file.
|
|     Lets try another one.  Lets say you have an FTP server with
|     1G of ram full of hundreds of MPEG encoded pirated CDs at
|     50MB a pop and you have thousands of people from all over the
|     world trying to download a core set of 25 CDs, which exceeds
|     the available ram you have to cache all of them.
|
|     What I'm trying to illustrate here is the impossibility of
|     what you are asking.  Your idea of 'sequential' access cache
|     restriction only works if there is just one process doing the
|     accessing.  But if you have 25 processes accessing 25 different
| files sequentially it doesn't work, and how is the system supposed to
| detect the difference between 25 processes accessing 25 50MB files on
| a 1G machine (which doesn't fit in the cache) versus 300 processes
| accessing 15 50MB files on a 1G machine (which does fit)?
| Furthermore, how do you differentiate between 30 processes all
| downloading the same 600MB CD versus 30 processes downloading two
| different 600MB CD's, on a machine with 1G of cache?
|
|     You can't.  That's the problem.  There is no magic number between
|     0 and the amount of memory you have where you can say "I am going
|     to stop caching this sequential file" that covers even the more
|     common situations that come up.  There is no algorithm that can
|     detect the above situations before the fact or on the fly.  You
|     can analyze the situation after the fact, but by then it is too
| late, and the situation may change from minute to minute.  One minute
| you have 300 people trying to download one CD, the next minute you
| have 20 people trying to download 10 different CD's.
|
|                                               -Matt


You are absolutely right, and thank you for fulfilling my request to 
pillory me for that bit:

| :Of course the trick here is waving my hands and saying "assume that
| : you know how the file will be accessed in the future."  You ought
| : to pillory me for *that* bit.  Even with hinting there are problems
| : with this whole idea.  Still with some hinting the algorithm could
| : probably be a little more clever.
| :
| :(Actually, Terry Lambert *did* pillory me for that bit, just a bit,
| : when he pointed out the impossibility of knowing whether the file
| : is being used in the same way by other processes.)

Though actually that was awfully gentle for a pillory.

All that said, there are *always* pathological cases for any algorithm; 
my intuition is that giving cache retention "priority" to processes 
that access files randomly, over those that read them sequentially, 
would likely be an overall win.  Not that I'm planning to put my money 
where my mouth is and write some code to test this contention.
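To make the intuition concrete, here is a minimal sketch (in Python, purely illustrative -- the names `AccessTracker`, `record_read`, and `retention_priority` are all hypothetical, not anything in FreeBSD): a reader is treated as sequential when each read starts where the previous one ended, and sequential files get the lowest retention priority so their pages are evicted first.

```python
class AccessTracker:
    """Hypothetical per-file heuristic for cache-retention priority.

    A file is flagged 'sequential' when the latest read begins exactly
    where the previous read ended; sequential files are evicted first.
    """

    def __init__(self):
        self.last_end = {}    # file_id -> offset where the last read ended
        self.sequential = {}  # file_id -> True if last access was sequential

    def record_read(self, file_id, offset, length):
        prev_end = self.last_end.get(file_id)
        # Sequential iff this read picks up exactly where the last stopped.
        self.sequential[file_id] = (prev_end is not None and offset == prev_end)
        self.last_end[file_id] = offset + length

    def retention_priority(self, file_id):
        # 0 = evict first (sequential reader), 1 = keep longer (random reader)
        return 0 if self.sequential.get(file_id, False) else 1
```

As Matt's examples show, this per-process view breaks down as soon as many processes share the same file, so at best it is one input to a larger policy, not a policy by itself.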

If you wanted to get *really* fancy you could add some code that 
tracked hits for given blocks over time and increased priority for 
blocks that were brought in often "recently" -- sort of an extended LRU 
list that wouldn't keep the block data itself, but would tell you what 
you'd tossed out lately, so you could give priority to blocks that had 
recently fallen out of this "list cache" (or whatever we would call 
this secondary list of recently-used blocks).

Um, in re-reading that paragraph it seems clear as mud . . .
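Maybe a sketch says it better than that paragraph did.  The following is a toy Python illustration (the class name `GhostLRU` and its fields are my own invention, not any real cache): an ordinary LRU cache plus a "ghost" list that remembers the IDs of recently evicted blocks but not their data, so a block re-requested while still on the ghost list can be recognized and treated as higher priority.

```python
from collections import OrderedDict

class GhostLRU:
    """Toy LRU cache with a 'ghost' list of recently evicted block IDs.

    The ghost list stores IDs only (no data), so it is cheap; a hit on
    it means the cache evicted something it should have kept, which a
    real policy could use to protect that block from early eviction.
    """

    def __init__(self, capacity, ghost_capacity):
        self.capacity = capacity
        self.ghost_capacity = ghost_capacity
        self.cache = OrderedDict()  # block_id -> data, LRU order
        self.ghost = OrderedDict()  # block_id -> None, IDs only
        self.ghost_hits = 0

    def get(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # normal LRU touch
            return self.cache[block_id]
        return None

    def put(self, block_id, data):
        if block_id in self.ghost:
            # Recently evicted and wanted again: note the ghost hit.
            self.ghost_hits += 1
            del self.ghost[block_id]
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.capacity:
            evicted_id, _ = self.cache.popitem(last=False)
            self.ghost[evicted_id] = None  # remember the ID, not the data
            while len(self.ghost) > self.ghost_capacity:
                self.ghost.popitem(last=False)
```

The extra bookkeeping is exactly the overhead I worry about below, of course.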

Whether any of these schemes would be of any practical benefit in any 
common situations is highly dubious; they would all increase overhead 
for one thing.  I think that the definitive answer, really, is the one 
you gave earlier:  If you have a specialized application or server 
which you know will make intense use of the file system in a 
predictable way, then you should customize the way it accesses the 
files because no general-purpose algorithm can ever be optimal for all 
cases.


