Steve Atkins wrote:
> Would it be possible to look at a much larger number of samples
> during analyze, then look at the variation in those to generate a
> reasonable number of pg_statistic "samples" to represent our
> estimate of the actual distribution? More datapoints for tables
> where the planner might benefit from it, fewer where it wouldn't.
You could definitely try to measure the variance of the statistics
(using, say, bootstrap resampling), and raise the target until you
get a "good" tradeoff between small sample size and adequate
representation of the distribution. Unfortunately, I think the
definition of "good" depends strongly on the kinds of queries that
get run. Basically, you want the statistics target to be just big
enough that more stats wouldn't change the plans for common queries.
Remember, too, that this is not just one number: it would be
different for each column (perhaps zero for most).
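Here's a minimal sketch of that bootstrap idea in Python/numpy,
purely for illustration; the sample sizes, bucket edges, and variance
threshold are hypothetical stand-ins, not anything ANALYZE actually
exposes. It resamples a column sample with replacement, measures how
much the per-bucket fractions (the raw material for selectivity
estimates) wobble, and grows the sample until they stabilize:

    import numpy as np

    def bucket_fraction_variance(values, bucket_edges, n_boot=200, rng=None):
        # Bootstrap-resample the column sample and return the variance
        # of the estimated per-bucket fractions across resamples.
        rng = rng or np.random.default_rng()
        values = np.asarray(values)
        fractions = np.empty((n_boot, len(bucket_edges) - 1))
        for i in range(n_boot):
            resample = rng.choice(values, size=len(values), replace=True)
            counts, _ = np.histogram(resample, bins=bucket_edges)
            fractions[i] = counts / len(resample)
        return fractions.var(axis=0)

    def smallest_stable_sample(column, bucket_edges, tol=1e-4):
        # Grow the sample until every bucket fraction is stable enough
        # that selectivity estimates would not move appreciably.
        rng = np.random.default_rng(0)
        for n in (300, 1000, 3000, 10000):
            n = min(n, len(column))
            sample = rng.choice(column, size=n, replace=False)
            if bucket_fraction_variance(sample, bucket_edges).max() < tol:
                return n
        return len(column)

The tol threshold is exactly the fuzzy part: as noted above, the
right cutoff depends on which queries you actually run.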
I could imagine hill-climbing the stats targets by storing common
queries and then replaying them while varying the sample size; a
rough sketch of that loop follows.
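Something like the following, using psycopg2. The orders table, the
workload query, and the ladder of targets are made up for the
example, and this does a simple linear scan over the ladder rather
than a proper hill climb, to keep the sketch short; a real version
would probably compare estimated costs or EXPLAIN ANALYZE timings
instead of just diffing plan text:

    import psycopg2

    # Hypothetical workload and column -- substitute your own.
    WORKLOAD = ["SELECT count(*) FROM orders WHERE customer_id = 42"]
    TABLE, COLUMN = "orders", "customer_id"

    def plan_texts(cur):
        # Collect the EXPLAIN output for each stored query.
        out = []
        for q in WORKLOAD:
            cur.execute("EXPLAIN " + q)
            out.append("\n".join(row[0] for row in cur.fetchall()))
        return out

    def smallest_adequate_target(conn, targets=(10, 30, 100, 300, 1000)):
        cur = conn.cursor()
        set_stats = "ALTER TABLE %s ALTER COLUMN %s SET STATISTICS %d"
        # Plans at the largest target serve as the "fully informed"
        # baseline to compare the cheaper targets against.
        cur.execute(set_stats % (TABLE, COLUMN, targets[-1]))
        cur.execute("ANALYZE %s" % TABLE)
        baseline = plan_texts(cur)
        for t in targets:
            cur.execute(set_stats % (TABLE, COLUMN, t))
            cur.execute("ANALYZE %s" % TABLE)
            if plan_texts(cur) == baseline:
                # Smallest target whose plans match the baseline.
                return t
        return targets[-1]

    conn = psycopg2.connect("dbname=test")
    print(smallest_adequate_target(conn))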
There was a discussion last year related to all of this, see:
http://archives.postgresql.org/pgsql-general/2006-10/msg00526.php
- John D. Burger
MITRE