All,

One of our clients is having query plan issues with a table that has a continuously advancing timestamp column (i.e. one with default now()). The newest rows, which are the ones most in demand, are always estimated to be fewer than they actually are, or even non-existent. As a result, the user has to analyze the table every hour ... and it's a very large table.
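For illustration, something along these lines (table name and row counts are made up) reproduces the pattern:

    -- hypothetical history table; created_at is normally filled by the default
    CREATE TABLE events (
        id          serial PRIMARY KEY,
        created_at  timestamptz NOT NULL DEFAULT now()
    );
    -- simulate a day of existing history ending an hour ago, then analyze
    INSERT INTO events (created_at)
         SELECT now() - interval '1 hour' - interval '1 day' * random()
           FROM generate_series(1, 1000000);
    ANALYZE events;   -- histogram high bound is frozen at roughly now() - 1 hour
    -- rows keep arriving through the default after the ANALYZE
    INSERT INTO events (created_at)
         SELECT now() FROM generate_series(1, 50000);
    -- the hot query asks for the newest rows
    EXPLAIN SELECT * FROM events
             WHERE created_at > now() - interval '10 minutes';
    -- the constant is above the stored histogram's high bound, so the row
    -- estimate is a drastic underestimate even though ~50,000 rows qualify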
I've seen this in a lot of other databases, both with timestamp columns and with SERIALs -- both of which are very common table structures. From my reading of the planner code, the problem seems to be the histogram bounds ... if a requested value is above the high bound, it's assumed to be extremely uncommon or nonexistent. This leads to bad plans if analyze hasn't been run very recently.

My thoughts on dealing with this intelligently, without a major change to statistics gathering, went along these lines:

1. Add columns to pg_statistic to hold estimates of upper and lower bound growth between analyzes.

2. Every time analyze is run, populate these columns with 1/2 of the proportion of values above or below the previously stored bounds, averaged with the existing value for the new columns (a tiny worked example is below my sig).

3. Use this factor instead of the existing algorithm to calculate the row estimate for out-of-bounds values.

This is obviously a very rough idea, but I wanted to get feedback on the general problem and my approach before going further with it.

Thanks!

--Josh Berkus
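P.S. To make step 2 a bit more concrete, here's the arithmetic I'm picturing, with made-up numbers (nothing like these columns exists in pg_statistic today). Suppose the previously stored growth estimate for the high bound is 0.02, and this analyze finds 10% of the sampled values above the previously stored high bound:

    -- half the out-of-bounds proportion, averaged with the stored estimate
    SELECT (0.02 + 0.10 / 2) / 2 AS new_high_bound_growth;   -- => 0.035
    -- a later query with a constant above the stored high bound would then be
    -- estimated using this 0.035 factor, instead of the near-zero figure the
    -- histogram-bound logic produces today (step 3)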