:Not necessarily.  I suspect that there is
:a strong tendency to access particular files
:in particular ways.  E.g., in your example of
:a download server, those files are always
:read sequentially.  You can make similar assertions
:about a lot of files: manpages, gzip files,
:C source code files, etc, are "always" read
:sequentially.
:
:If a file's access history were stored as a "hint"
:associated with the file, then it would
:be possible to make better up-front decisions about
:how to allocate cache space.  The ideal would be to
This has been tried.  It works up to a point, but not to the extent
that you want it to.  The basic problem is that past history does not
necessarily predict future behavior.  With the web server example,
different client loads will result in different access behaviors.  They
might all still be sequential, but the combination of multiple users
will change the behavior enough that you could not use the history as a
reliable metric to control the cache.

There is also the issue of how to store the 'history'.  It isn't a
simple matter of storing when a block was last accessed.  Analysis of
the access history is just as important, and a lot of the kind of
analysis we humans do is intuitive and cannot simply be replicated by a
computer.  Basically it all devolves down to this: if you know exactly
how something is going to be accessed, or you need caching to work a
certain way in order to guarantee a certain behavior, the foreknowledge
you have of the access patterns will let you cache the information
manually far better than the system could cache it heuristically.

:store such hints on disk (maybe as an extended
:attribute?), but it might also be useful to cache
:them in memory somewhere.  That would allow the
:cache-management code to make much earlier decisions
:about how to handle a file.  For example, if a process
:started to read a 10GB file that has historically been
:accessed sequentially, you could immediately decide
:to enable read-ahead for performance, but also mark
:those pages to be released as soon as they were read by the
:process.
:
:FWIW, a web search for "randomized caching" yields
:some interesting reading.  Apparently, there are
:a few randomized cache-management algorithms for
:which the mathematics work out reasonably well,
:despite Terry's protestations to the contrary. ;-)
:I haven't yet found any papers describing experiences
:with real implementations, though.
:
:If only I had the time to spend poring over FreeBSD's
:cache-management code to see how these ideas might
:actually be implemented... <sigh>
:
:Tim Kientzle

It should be noted that we already implement most of the heuristics
you talk about.  We have a heuristic that detects sequential access
patterns, for example, and enables clustered read-ahead.  The problem
isn't detection, the problem is scale.  These heuristics work
wonderfully at a small scale (i.e. let's read 64K ahead versus trying
to cache 64MB worth of the file).  Just knowing something is sequential
does not tell you how much memory you should set aside to cache that
object, for example.  Automatically depressing the priority of pages
read sequentially after they've been used can have as terrible a
performance impact as it can a positive one, depending on the size of
the object, the number of distinct objects being accessed in that
manner, the perceived latency by end users, the number of end users,
the speed of their connections (some objects may be accessed more
slowly than others depending on the client's network bandwidth), and
so forth.

						-Matt
						Matthew Dillon
						<[EMAIL PROTECTED]>

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message