Sorry for the naive question, but is there a problem with ANALYZE doing full table scans? ANALYZE will not lock anything, will it?

Peter

Greg Stark wrote:
Tom Lane <[EMAIL PROTECTED]> writes:

"Ed L." <[EMAIL PROTECTED]> writes:
So, does this sound like we just happened to get repeatedly horribly unrepresentative random samples with stats target at 10? Are we at the mercy of randomness here? Or is there a better preventive procedure we can follow to systematically identify this kind of situation?
I think the real issue is that stats target 10 is too small for large
tables: the samples are just not large enough to support a decent
numdistinct estimate, which is the critical stat for cases such as this
(i.e., estimating the number of hits on a value that's not in the
most-common-values list).
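
(As a concrete illustration of raising the target for one column and
re-sampling -- "mytable" and "mycol" are just placeholder names here:

    -- raise the per-column statistics target above the old default of 10
    ALTER TABLE mytable ALTER COLUMN mycol SET STATISTICS 100;
    -- re-sample the table so the new target takes effect
    ANALYZE mytable;
)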

There's been some discussion on -hackers about this area. Sadly the idea of
using samples to calculate numdistinct estimates is fundamentally on pretty
shaky ground.

Whereas a fixed sample size works fine for estimating the distribution of
values, to get consistent precision for numdistinct estimates the sample
has to be a constant fraction of the table -- and unfortunately a pretty
large fraction at that.
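
(One way to see how shaky the estimate is on a given table is to compare the
sampled figure with the real one; placeholder names again, and note that the
second query is a full table scan:

    -- what ANALYZE's sample produced (negative values mean a fraction of the row count)
    SELECT n_distinct FROM pg_stats
     WHERE tablename = 'mytable' AND attname = 'mycol';
    -- the true number of distinct values, at full-scan cost
    SELECT count(DISTINCT mycol) FROM mytable;
)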

So sadly I think "at the mercy of randomness" is pretty accurate. You'll have
to raise the statistics target as the table grows, and I expect you'll
eventually run into some downsides of large stats targets.
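
(If raising it column by column gets tedious, the default can be bumped as
well, e.g. per session before a manual ANALYZE; 100 here is just an
arbitrary illustrative value:

    -- use a larger default target for this session's ANALYZE runs
    SET default_statistics_target = 100;
    ANALYZE mytable;
)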

Some better algorithms were posted, but they would require full table scans
during analyze, not just samples.
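
(For comparison, the exact figures such an algorithm would compute can
already be obtained by hand, just at the cost of scanning the whole table;
table and column names are placeholders:

    -- exact per-column distinct counts in a single full scan
    SELECT count(DISTINCT col_a) AS col_a_ndistinct,
           count(DISTINCT col_b) AS col_b_ndistinct
      FROM mytable;
)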

