Good news.

I looked through the code and, after a little debugging, found that the statistics
target multiplied by 300 gives the sample size of rows used to gather statistics.
With the symbol_data table at 20 million tuples and about 8,000 unique values in
this column, I needed a very large sample size.  Even with a statistics target of
500 (150,000 random rows), I still got a few symbols with a most_common_freqs
entry around 0.01.  Bumping the target to 1000 put the highest most_common_freqs
at 0.00788667, so no seq scans now.  Not much of a time difference in the ANALYZE
either, at least nothing with real impact.
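
In case it helps anyone else, the sequence was roughly this (a sketch; the 300x
multiplier is what I found in the code, and the pg_stats query is just the
standard way to inspect the stored frequencies):

   ALTER TABLE symbol_data ALTER symbol_name SET STATISTICS 1000;
   ANALYZE symbol_data (symbol_name);  -- samples 1000 * 300 = 300,000 rows

   -- inspect the resulting frequency estimates for the column
   SELECT most_common_vals, most_common_freqs
     FROM pg_stats
    WHERE tablename = 'symbol_data'
      AND attname   = 'symbol_name';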


The only strange thing I still see is the estimated row count.  The index is
picked, so I don't know that it matters.  Even though this query returns 688
tuples, EXPLAIN estimates 17k+:

Index Scan using symbol_data_pkey on symbol_data  (cost=0.00..70648.22 rows=17700 width=129)
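
For what it's worth, dividing that estimate out suggests the planner is working
from the sampled per-symbol frequency rather than the true count:

   17700 / 20,000,000 ~= 0.000885   (planner's frequency estimate for this symbol)
     688 / 20,000,000 ~= 0.0000344  (actual frequency)

Both are small enough that the index scan still wins, which would explain why
the overestimate is harmless here.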

-Michael
Tom Lane wrote:
"Michael G. Martin" <[EMAIL PROTECTED]> writes:
I just ran a vacuum analyze with the specific column.  Still get the 
same explain plan:

Did the pg_stats data change noticeably?

ANALYZE is a statistical sampling process in 7.2, so I'd expect the
results to move around somewhat each time you repeat it. But if it
changes a lot then we have a problem.

You could also try

ALTER TABLE symbol_data ALTER symbol_name SET STATISTICS n

for larger values of n (10 is the default) and then re-ANALYZE
to see if the stats get any more accurate. The default of 10
was more or less picked out of the air ... perhaps it's too small.

regards, tom lane
