On Tue, Apr 12, 2016 at 02:49:39PM +0100, nets...@avisoft.f9.co.uk wrote:
> There was much discussion about a year ago about the cache performance on
> RISC OS, and there were some code changes, but I would like to add the
> results of some investigations of the Netsurf v3.4 cache on my Iyonix,
> running RISC OS 5.23 (11 Oct 2015).

I am beginning to think this feature should never have been enabled for
hopelessly legacy operating systems such as RISC OS.

I will go over how this feature works once again. This is in response to
this message but is, as usual, aimed at all users.

To be clear, a cache in any computer program trades one resource for
another. Generally a web browser will have numerous caches for different
uses. In NetSurf we have three main caches:

 1. A cache held in RAM for decoded images.

    This cache spends memory (to hold the decoded images) to save
    processor time (used to decode images from compressed source formats
    like JPEG). Without this cache scrolling a page would be glacial:
    every time an image needed to be plotted, even if only a single pixel
    of it, we would need to decode the entire source image.

 2. A cache for source objects (the stuff downloaded from web servers)
    held in memory.

    This cache spends memory to save the network bandwidth used
    downloading source objects. Without it, every page navigation within
    a website would download all the CSS, images, JavaScript etc. again
    even though they did not change, which would quickly make browsing
    unusable.

 3. A cache of source objects held on disc.

    This cache spends disc space and bandwidth to save memory. Although
    not immediately obvious, the memory saved is that of the previous
    cache, so indirectly the saving can be seen as network bandwidth.
    This is known as a cache hierarchy, where one cache backs another.

The memory cache size setting deals with the first two of these caches
and the disc cache settings with the third.

>
> In the past I had problems with the cache taking large amounts of disc
> space, and the resulting long backup times for !Boot, so my current
> settings are 10MB space, expiring after 2 days.

The average web page is now well over 2 megabytes [1] and is growing
rapidly all the time. You would be much better served by having no
persistent storage (disc) cache at all, by setting its size to zero, than
by a small one like this.
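
To make the hierarchy described above a little more concrete, here is a
minimal sketch in Python of a memory cache backed by a disc cache. It is
illustrative only and is not NetSurf's code: the class name, the method
names and the use of a plain dictionary to stand in for files on disc are
all assumptions made for the example.

```python
from typing import Callable, Dict


class TwoLevelCache:
    """Hypothetical source-object cache: memory backed by disc."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self.memory: Dict[str, bytes] = {}
        # Assumption: a dict stands in for the on-disc store.
        self.disc: Dict[str, bytes] = {}
        self.fetch = fetch          # stands in for a network download
        self.network_fetches = 0

    def get(self, url: str) -> bytes:
        # Cheapest first: memory, then disc, then the network.
        if url in self.memory:
            return self.memory[url]
        data = self.disc.get(url)
        if data is None:
            data = self.fetch(url)
            self.network_fetches += 1
            self.disc[url] = data   # write back so the disc can serve it later
        self.memory[url] = data     # promote into the memory cache
        return data

    def drop_memory(self) -> None:
        # Simulates memory pressure or a restart of the browser.
        self.memory.clear()
```

After drop_memory() a second get() for the same URL is served from the
disc level and network_fetches does not increase, which is the sense in
which the disc cache saves memory and, indirectly, network bandwidth.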

As I keep emphasising, the cache is a trade of one resource for another
in an attempt to reduce the overall time taken to visit a web page. If
you are not able to make that trade a net profitable transaction you are
better off not doing it at all. The RAM, CPU and disc overhead of
enabling the cache greatly exceeds what your settings can recover; they
probably need to be a minimum of a few hundred megabytes and several
weeks to make the trade worthwhile on RISC OS.

In fact I will add a feature request to the tracker to have a minimum
viable size for the cache size options.

I fear your expectations around sizes of resources are a little out of
date. The default cache sizes on PC platforms are 128 megabytes of memory
and a gigabyte of disc. Even these are pretty restrained; for example, my
desktop has a recently started copy of Chrome with a handful of tabs open
and that is reporting well over a gigabyte of memory used and several
gigabytes of disc.

It is not uncommon for standard PCs to have 8 gigabytes of memory and a
terabyte of hard drive space accessed at rates measured in hundreds of
megabytes a second. I know RISC OS has no hope of getting anywhere near
such resources but it must be understood that the modern web is
orientated around systems of this magnitude of capability.

[1] http://www.soasta.com/blog/page-bloat-average-web-page-2-mb/

>
> However, the actual space usage was 45MB (as measured by Filer Count),
> and it contained 210 files. What was more difficult to find was that
> there were 7,298 directories with 8 levels, which occupied another 14MB,
> of which 6,412 contained no files at any lower level. So only 886
> directories actually contained the 210 files of cached data. Enumeration
> of the cache took about 2 minutes.
>

It is possible you had a cache left over from an earlier version of
NetSurf where small files were stored in separate files.
Cache improvements merge all smaller files into a few large index files
and only use the directories for larger files.

Regardless, the directories are not accounted for, as on most OSes they
are a very low cost resource and are never enumerated. There is a well
known "cache" indicator file created, which on most systems is used to
indicate to other software that the contents of the directory are not at
all "valuable", may be discarded at will and should not be enumerated.

> I decided to delete all 6,412 directories that contained no data, saving
> about 12MB of disc space. More importantly, counting or enumerating all
> the cache now takes about 7 seconds. There are still the same number of
> files and cached bytes.

You can always delete the entire cache (without NetSurf running) at any
time without any impact at all, except that all source files will need to
be retrieved from the network again.

>
> Netsurf itself still seems to work, but I have not noticed any change in
> performance.
>

As already stated: with those settings any possible long term benefit you
might gain is being lost to overhead, as the RISC OS disc system is
generally poor.

> So, some questions:
>
> - When are cached files deleted to meet the configured size & expiry?

The cache is pruned only when adding a new entry causes the overall cache
usage to exceed the set level. At that point the least "valuable" objects
are discarded until the usage drops below the desired size. This process
is subject to 10% hysteresis to avoid excessive thrashing.

No account is taken of overheads like directory size or block sizes in
the usage calculations, on the assumption that they will be small
compared to the cached data and that overheads are computationally
expensive to determine for little gain.

> - Are directories included in the space used?

No.

> - Are directories ever deleted? If so, when?

No.

> - Will deletion of empty directories cause any problems for Netsurf?

No.

>
> I have looked at the help ...
> but that says that files are not deleted by
> Netsurf, and makes no mention of directories. It also refers to a
> 'Perform maintenance' button which can be used to delete redundant files
> ... but this is nowhere to be seen!

The help and manual are out of date and refer to long since removed
functionality.

It might be useful to include a "purge" cache function for security
reasons, and perhaps to ensure integrity when the cache size values are
changed. Again, a feature request to cover this will be created.

>
> Martin
>
>

-- 
Regards Vincent
http://www.kyllikki.org/
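
P.S. For anyone who finds the pruning answer above easier to follow as
code, here is a rough sketch in Python. It is not the NetSurf
implementation. In particular it assumes that "least valuable" means
least recently used, and that the 10% hysteresis means pruning continues
until usage is 10% below the configured limit; the function and parameter
names are invented for the example.

```python
from typing import Dict, List, Tuple

# Each entry maps a key to (size in bytes, last-used timestamp).
Cache = Dict[str, Tuple[int, int]]


def add_entry(cache: Cache, key: str, size: int, last_used: int,
              limit: int, hysteresis_percent: int = 10) -> List[str]:
    """Add an entry, pruning only if the addition exceeds the limit.

    Returns the keys that were evicted, oldest first.
    """
    cache[key] = (size, last_used)
    used = sum(entry_size for entry_size, _ in cache.values())
    evicted: List[str] = []
    if used <= limit:
        return evicted              # under the limit: no pruning at all

    # Assumption: prune down to the limit minus the hysteresis margin so
    # that the very next addition does not immediately trigger pruning.
    target = limit - (limit * hysteresis_percent) // 100

    # Assumption: "value" is approximated by recency of use.
    for victim in sorted(cache, key=lambda k: cache[k][1]):
        if used <= target:
            break
        used -= cache[victim][0]
        del cache[victim]
        evicted.append(victim)
    return evicted
```

As in the description above, only the sizes of the cached objects are
counted; directory and block size overheads are ignored.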