2011/2/25 Robert Haas <robertmh...@gmail.com>:
> 2011/2/25 Cédric Villemain <cedric.villemain.deb...@gmail.com>:
>>> All that having been said, I think that while Josh is thinking fuzzily
>>> about the mathematics of his proposal, the basic idea is pretty
>>> sensible. It is not easy - likely not possible - for the system to
>>> have a good idea which things will be in some kind of cache at the
>>> time the query is executed; it could even change mid-query. The
>>> execution of one part of the query could evict from the cache data
>>> which some other part of the plan assumed would be cached. But DBAs
>>> frequently have a very good idea of which stuff is in cache - they can
>>> make observations over a period of time and then adjust settings and
>>> then observe some more and adjust some more.
>>
>> I believe we can maintain a small map of the areas of a relation which
>> are in the OS buffer cache (shared buffers move more), or at least a
>> percentage of the relation in OS cache. Having the autovacuum daemon
>> able to update those maps/counters might be enough, and easy to do; it
>> is really close to what auto-analyze does. My observation is that the
>> percentage in cache is stable on a production workload after the few
>> tens of minutes needed to warm the server.
>
> I don't think we can assume that will be true in all workloads.
> Imagine a server doing batch processing. People submit large batches
> of work that take, say, an hour to complete. Not all batches use the
> same set of tables - maybe they even run in different databases.
> After a big batch process finishes crunching numbers in database A,
> very little of database B will be cached. But it's not necessarily
> right to assume that nothing is cached when we start queries for a new
> batch in database B, although that assumption is more likely to be
> right for large tables (which will take a long time to get cached
> meaningfully, if they ever do) than small ones. Also, it could lead to
> strange issues where batches run much faster or slower depending on
> which batch immediately preceded them. If we're going to do something
> a lot of times, it'd be better to bite the bullet and read it all in
> rather than doing more work elsewhere, but if we're only going to
> touch it once, then not so much.
>
> You might also have this issue on systems that run OLTP workloads all
> day and then do some batch processing at night to get ready for the
> next business day. Kevin Grittner wrote previously about those jobs
> needing some different settings in his environment (I don't remember
> which settings at the moment). Suppose that the batch process is going
> to issue a query that can be planned in one of two possible ways. One
> way involves reading 10% of a relation, and the other way involves
> reading the whole thing. The first plan takes 200 s to execute if the
> relation is not cached, and 180 s if the relevant portion is cached.
> The second plan takes 300 s to execute if the relation is not cached,
> and 100 s if it is cached. At the start of the batch run, the relation
> won't be cached, because it's used *only* by the overnight job and not
> by the daily OLTP traffic. Which way should we execute the query?
>
> The answer is that if the batch job only needs to execute that query
> *once*, we should do it the first way. But if it needs to execute it
> three or more times, the second way is better, but only if we use the
> second plan every time.
> If we start out with the first plan, we're always better off sticking
> with it *unless* we know that we're going to repeat the query at least
> twice more after the iteration we're currently planning. To make the
> right decision, the query planner needs a crystal ball. Or, a little
> help from the DBA.
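(Your numbers check out - here is a throwaway sketch, using only the
hypothetical timings above, that prints the cumulative cost of each
plan over n executions:

#include <stdio.h>

/* The hypothetical timings from the example above, in seconds. */
#define PLAN1_COLD 200		/* 10% scan, relation not cached */
#define PLAN1_WARM 180		/* 10% scan, relevant portion cached */
#define PLAN2_COLD 300		/* full scan, relation not cached */
#define PLAN2_WARM 100		/* full scan, relation cached */

int
main(void)
{
	int			n;

	for (n = 1; n <= 5; n++)
	{
		/* First execution runs cold, every later one runs warm. */
		int			plan1 = PLAN1_COLD + (n - 1) * PLAN1_WARM;
		int			plan2 = PLAN2_COLD + (n - 1) * PLAN2_WARM;

		printf("n=%d: plan1=%4d s  plan2=%4d s  -> %s\n", n, plan1, plan2,
			   plan1 < plan2 ? "plan1" : plan2 < plan1 ? "plan2" : "tie");
	}
	return 0;
}

Plan 1 wins at n=1 (200 s vs 300 s) and still narrowly at n=2 (380 s
vs 400 s); from n=3 on (560 s vs 500 s) plan 2 wins - hence "three or
more times".)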
Yes, we are talking about improving part of the model. Some workloads
are extreme and need special customization, it is true. Still, there is
a path for improvement here, and there will probably remain a path for
improvement after the current model is updated. I am not proposing
something that solves all the issues, but it is IMHO far more
interesting than just letting the DBA say: 'this table is XX% in
cache'. Btw, pgfincore already solves the use case you describe, by
helping the DBA prepare the batch processing, so in some sense I am
familiar with what you describe (take the second plan, let pgfincore
preload the relation in the background, and your query runs in 100 s
from the first execution).

>> What should really help here is to have hooks in the cost functions
>> to test those ideas without the need to patch postgresql-core a lot.
>> Will it be ok to have hooks, or will it add too much CPU consumption
>> in a sensitive part of the code?
>
> Depends on where you put them, I guess. Hooks are pretty cheap, but
> they're also pretty hard to use.

Yes, it will be easier to make an extension, have people test it, and
validate (or not) the 'new' model.
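To be clear about what I have in mind, a rough sketch of such a hook -
purely hypothetical, nothing like it exists in core today. It just
copies the convention of the existing hooks (ExecutorStart_hook and
friends): a global function pointer tested at the call site, a
standard_ function keeping the unchanged logic, and an extension that
saves and chains to the previous value. The three-argument
cost_seqscan() signature mirrors the current one; all the names here
are illustrative only.

/* --- core side, src/backend/optimizer/path/costsize.c --- */

typedef void (*cost_seqscan_hook_type) (Path *path,
										PlannerInfo *root,
										RelOptInfo *baserel);

cost_seqscan_hook_type cost_seqscan_hook = NULL;

void
cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel)
{
	if (cost_seqscan_hook)
		(*cost_seqscan_hook) (path, root, baserel);
	else
		standard_cost_seqscan(path, root, baserel);
}

void
standard_cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel)
{
	/* ... the existing cost computation, moved here unchanged ... */
}

/* --- extension side --- */

static cost_seqscan_hook_type prev_cost_seqscan_hook = NULL;

static void
my_cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel)
{
	/* Let the standard model (or another extension) run first. */
	if (prev_cost_seqscan_hook)
		(*prev_cost_seqscan_hook) (path, root, baserel);
	else
		standard_cost_seqscan(path, root, baserel);

	/* ... then discount the I/O cost by the observed fraction of the
	 * relation found in cache, per the map/counters discussed above. */
}

void
_PG_init(void)
{
	prev_cost_seqscan_hook = cost_seqscan_hook;
	cost_seqscan_hook = my_cost_seqscan;
}

The dispatch is a single pointer test in the uncached case, so the CPU
cost should be negligible; the hard part, as you say, is using the hook
well.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support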