On Mon, Jan 28, 2008 at 11:14:05PM +0000, Christopher Browne wrote: > On Dec 6, 2007 6:28 PM, Decibel! <[EMAIL PROTECTED]> wrote: > > FWIW, I've never seen anything but a performance increase or no change > > when going from 10 to 100. In most cases there's a noticeable > > improvement since it's common to have over 100k rows in a table, and > > there's just no way to capture any kind of a real picture of that with > > only 10 buckets. > > I'd be more inclined to try to do something that was at least somewhat > data aware. > > The "interesting theory" that I'd like to verify if I had a chance > would be to run through a by-column tuning using a set of heuristics. > My "first order approximation" would be: > > - If a column defines a unique key, then we know there will be no > clustering of values, so no need to increase the count... > > - If a column contains a datestamp, then the distribution of values is > likely to be temporal, so no need to increase the count... > > - If a column has a highly constricted set of values (e.g. - boolean), > then we might *decrease* the count. > > - We might run a query that runs across the table, looking at > frequencies of values, and if it finds a lot of repeated values, we'd > increase the count. > > That's a bit "hand-wavy," but that could lead to both increases and > decreases in the histogram sizes. Given that, we can expect the > overall stat sizes to not forcibly need to grow *enormously*, because > we can hope for there to be cases of shrinkage.
I think that before doing any of that you'd be much better off investigating how much performance penalty there is for maxing out default_statistict_target. If, as I suspect, it's essentially 0 on modern hardware, then I don't think it's worth any more effort. BTW, that investigation wouldn't just be academic either; if we could convince ourselves that there normally wasn't any cost associated with a high default_statistics_target, we could increase the default, which would reduce the amount of traffic we'd see on -performance about bad query plans. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
pgpJmmXmUl3KN.pgp
Description: PGP signature