[EMAIL PROTECTED] wrote:

In this case, the behavior observed could be changed by altering the sample size for a table. I submit that an arbitrary fixed sample size is not a good basis for the analyzer, but that the sample size should be based on the size of the table or some calculation of its deviation.

I can see your point; however, I wonder if the issue is that the default
stats target of '10' (3000 sampled rows, 10 histogram buckets) is simply
too low, and maybe we should consider making a higher value (say '100')
the default.
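
For what it's worth, that is easy enough to try today on a per-column or
per-session basis (the table and column names below are just placeholders):

    -- Raise the per-column statistics target from the default of 10 to 100
    -- (more sampled rows, more histogram buckets), then re-gather stats.
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
    ANALYZE orders;

    -- Or for everything analyzed in the current session:
    SET default_statistics_target = 100;
    ANALYZE;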

There is no reason why old stats can't be used to create more accurate
stats. Using successive analyze operations, we could build better
statistics for the planner. We could increase the sample size based on the
table size. We could, I suppose, also calculate some sort of deviation
statistic so that "n_distinct" can be estimated more accurately from a
smaller sample set.
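
To make that concrete, one rough way to see how far off the current
n_distinct estimate is for a given column (placeholder names again) is to
compare it against the exact count; that gap is what a deviation-aware
sample size would be trying to shrink:

    -- Stored estimate vs. exact distinct count for one column.
    SELECT s.n_distinct,
           (SELECT count(DISTINCT customer_id) FROM orders) AS actual_distinct
      FROM pg_stats s
     WHERE s.tablename = 'orders'
       AND s.attname = 'customer_id';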

The idea of either automatically increasing the sample size for large tables or doing a few more samplings with different sizes and examining the stability of the estimates is rather nice, provided we can keep the runtime of ANALYZE within reasonable limits, I guess :-)
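
The stability check could be as simple as analyzing the same column at two
different targets and seeing whether the estimate moves much (placeholder
names again, and this is only a sketch of the idea):

    -- Analyze at a small target and note the estimate ...
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 10;
    ANALYZE orders (customer_id);
    SELECT n_distinct FROM pg_stats
     WHERE tablename = 'orders' AND attname = 'customer_id';

    -- ... then at a larger one; a big jump suggests the smaller sample
    -- was not adequate for this column.
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
    ANALYZE orders (customer_id);
    SELECT n_distinct FROM pg_stats
     WHERE tablename = 'orders' AND attname = 'customer_id';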

The basic problem, though, is that PostgreSQL chose a bad plan for a simple query after indexes were created and ANALYZE was run. Yes, it can be corrected, and that is what led me to my conclusions, but shouldn't we try to devise a better system in the future so that PostgreSQL does not need this sort of tuning?
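
For anyone trying to reproduce this sort of thing, EXPLAIN ANALYZE is the
usual way to check whether the planner's row estimates match reality; the
query below is just a stand-in for the one in question:

    -- Estimated vs. actual row counts per plan node; a large mismatch
    -- usually points back at the statistics rather than the planner itself.
    EXPLAIN ANALYZE
    SELECT * FROM orders WHERE customer_id = 42;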

Thanks for clarifying.

best wishes

Mark


