On Dec 6, 2007 6:28 PM, Decibel! <[EMAIL PROTECTED]> wrote:
> FWIW, I've never seen anything but a performance increase or no change
> when going from 10 to 100. In most cases there's a noticeable
> improvement since it's common to have over 100k rows in a table, and
> there's just no way to capture any kind of a real picture of that with
> only 10 buckets.
I'd be more inclined to try something that is at least somewhat data-aware.
The "interesting theory" I'd like to verify, if I had the chance, is to run a
by-column tuning pass using a set of heuristics. My first-order approximation
would be:

- If a column defines a unique key, we know there will be no clustering of
  values, so there's no need to increase the count.

- If a column contains a datestamp, the distribution of values is likely to
  be temporal, so there's no need to increase the count.

- If a column has a very small set of distinct values (e.g. boolean), we
  might *decrease* the count.

- We might run a query across the table looking at the frequencies of
  values, and if it finds a lot of repeated values, we'd increase the count.

That's a bit "hand-wavy," but it could lead to both increases and decreases
in the histogram sizes. Given that, the overall stats wouldn't necessarily
have to grow *enormously*, because we can hope for some cases of shrinkage.
(A rough SQL sketch of the sort of per-column tweaking I have in mind is at
the end of this mail.)

-- 
http://linuxfinances.info/info/linuxdistributions.html
"The definition of insanity is doing the same thing over and over and
expecting different results."
-- assortedly attributed to Albert Einstein, Benjamin Franklin, Rita Mae
Brown, and Rudyard Kipling
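P.S. Purely as illustration (the table and column names are made up, and the
numbers are guesses rather than recommendations), the per-column tweaking I
have in mind would use the existing ALTER TABLE ... SET STATISTICS knob; the
frequency check at the end is just an ad-hoc query a tuning script could run,
not anything the planner does today:

  -- Unique key: values can't repeat, so the default-sized histogram is plenty.
  ALTER TABLE orders ALTER COLUMN order_id SET STATISTICS 10;

  -- Near-boolean column: only a handful of distinct values, so shrink it.
  ALTER TABLE orders ALTER COLUMN is_paid SET STATISTICS 4;

  -- Column with lots of repeated values: give it more buckets.
  ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 200;

  -- Crude "how much repetition is there?" check:
  SELECT count(*) AS total_rows,
         count(DISTINCT customer_id) AS distinct_vals
    FROM orders;

  -- Re-gather stats after changing the targets.
  ANALYZE orders;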