2011/8/16 Anssi Kääriäinen <anssi.kaariai...@thl.fi>: > On 08/14/2011 12:31 AM, Heikki Linnakangas wrote: >>> >>> The same idea could of course be used to calculate the effective cache >>> hit ratio for each table. Cache hit ratio would have the problem of feedback >>> loops, though. >> >> Yeah, I'm not excited about making the planner and statistics more >> dynamic. Feedback loops and plan instability are not fun. > > I might be a little out of my league here... But I was thinking about the > cache hit ratio and feedback loops. I understand automatic tuning would be > hard. But making automatic tuning easier (by using pg_tune for example) > would be a big plus for most use cases. > > To make it easier to tune the page read costs automatically, it would be > nice if there would be four variables instead of the current two: > - random_page_cost is the cost of reading a random page from storage. > Currently it is not, it is the cost of accessing a random page, taking in > account it might be in memory. > - seq_page_cost is the cost of reading pages sequentially from storage > - memory_page_cost is the cost of reading a page in memory > - cache_hit_ratio is the expected cache hit ratio > > memory_page_cost would be server global, random and seq page costs > tablespace specific, and cache_hit_ratio relation specific. You would get > the current behavior by tuning *_page_costs realistically, and setting > cache_hit_ratio globally so that the expected random_page_cost / > seq_page_cost stays the same as now. > > The biggest advantage of this would be that the correct values are much > easier to detect automatically compared to current situation. This can be > done using pg_statio_* views and IO speed testing. They should not be tuned > automatically by PostgreSQL, at least not the cache_hit_ratio, as that leads > to the possibility of feedback loops and plan instability. The variables > would also be much easier to understand. > > There is the question if one should be allowed to tune the *_page_costs at > all. If I am not missing something, it is possible to detect the correct > values programmatically and they do not change if you do not change the > hardware. Cache hit ratio is the real reason why they are currently so > important for tuning. > > An example why the current random_page_cost and seq_page_cost tuning is not > adequate is that you can only set random_page_cost per tablespace. That > makes perfect sense if random_page_cost would be the cost of accessing a > page in storage. But it is not, it is a combination of that and caching > effects, so that it actually varies per relation (and over time). How do you > set it correctly for a query where one relation is fully cached and another > one not? > > Another problem is that if you use random_page_cost == seq_page_cost, you > are effectively saying that everything is in cache. But if everything is in > cache, the cost of page access relative to cpu_*_costs is way off. The more > random_page_cost and seq_page_cost are different, the more they mean the > storage access costs. When they are the same, they mean the memory page > cost. There can be an order of magnitude in difference of a storage page > cost and a memory page cost. So it is hard to tune the cpu_*_costs > realistically for cases where sometimes data is in cache and sometimes not. > > Ok, enough hand waving for one post :) Sorry if this all is obvious / > discussed before. My googling didn't turn out anything directly related, > although these have some similarity: > - Per-table random_page_cost for tables that we know are always cached > [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01503.php] > - Script to compute random page cost > [http://archives.postgresql.org/pgsql-hackers/2002-09/msg00503.php] > - The science of optimization in practical terms? > [http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php], getting > really interesting starting from here: > [http://archives.postgresql.org/pgsql-hackers/2009-02/msg00787.php]
late reply. You can add this link to your list: http://archives.postgresql.org/pgsql-hackers/2011-06/msg01140.php -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers