> It's not that making the cache bigger is inefficient, it's that the cache
> is not used the way you are thinking.
Ok, I think I've got it now. The missing piece of the puzzle was the existence of the Linux buffer cache. So that's what the effective_cache_size value is for(!)

I read Shridhar Daithankar and Josh Berkus's 'Tuning Postgresql for performance' document, and some things strike me (though only with respect to my "RAM much bigger than database" scenario). I think I'm a bit more depressed than I was earlier today!

1. Postgresql uses a two-tiered cache mechanism. The first tier - the postgresql shared buffer cache - sits on the second, larger tier, the Linux buffer cache. So bits of the same data end up being in memory... twice, and two cache mechanisms operate at the same time. (That's how I understand it.)

2. Even if the Linux buffer cache contains all the data required for an execution of a plan, there is still a load of memory copying to do between these two tiers. Though memory copying is faster than disk access, it is still an overhead, and isn't there the real problem of thrashing between these two tiers if the plan can't fit all the data into the top tier, even if the thrashing is restricted to the memory system?

3. The OS will implement a 'least recently used' cache replacement strategy on the data in its cache. This isn't the optimal replacement strategy for the database. If postgresql were in charge of all its RAM-resident data, it could make more intelligent decisions about which stuff isn't needed once used.

This still leads me to think that the best thing to do in my "much bigger RAM than database size" scenario would be for postgresql to allocate a shared buffer cache big enough for all the data + a bit. By doing this, there would be some performance-enhancing gains, including:

1. If the planner *knew* (rather than guessed) that all the data was effectively 'there' in RAM, in the right place (i.e. its workspace), wouldn't it make choosing the optimal plan easier?
(I see that effective_cache_size is a stab in that direction, but just because postgresql guesses the data will have been cached by Linux, it doesn't mean it actually is - surely an important distinction.)

2. You'd avoid a whole layer of caching, along with the not-necessarily-aligned memory copies and other overheads that it introduces. Even in the optimal case where all the data needed does reside in RAM, it's not in the right bit of RAM. (I may have misunderstood this relationship between the shared buffer cache and the Linux buffer cache - if no memory copying actually occurs, then I'll go away for a bit!)

Two interesting things I dug up today:

www.linuxjournal.com/article.php?sid=5482
(to do with a high-performance DB living in an OS-controlled environment)

and

http://www.ixora.com.au/tips/avoid_buffered_io.htm
(to do with Oracle tuning)

The way things are, I see that postgresql will 'gracefully degrade' in the very common scenario where the database shares a machine with limited resources and lots of processes, all of them hammering the machine. But in my 'optimal RAM-optimised db server' scenario, the scheme seems actually to reduce the potential for a blistering system.

So given all of that, I can't help swinging back to my original question about whether it's a bad thing to make the shared buffer cache huge, so long as you have much more RAM than the size you choose. It'll still require all that nasty Linux cache <-> postgresql cache memory copying, but it will at least reduce postgresql's exposure to the 'not particularly clever' least-recently-used cache replacement strategy employed by the OS.

Am I getting any closer to understanding the way things are?

Thanks for your tolerance,

Andy

p.s. It would be great if you could just load the data directly into the shared buffer cache with direct IO, since that OS file buffer -> shared buffer cache transfer is just a waste of effort in this scenario. Direct IO also allows spectacular IO gains when you're working with certain raid configurations connected with a suitably fat wire.

Ok - I'll shut up now.

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match