Greg Stark wrote: > So I never realized the consequences of this little heuristic in > analyze.c in the handling of very low cardinality columns where we > want to just capture the complete list of values in the mcv and throw > away the histogram: > > else if (toowide_cnt == 0 && nmultiple == ndistinct) > { > /* > * Every value in the sample appeared more than once. > Assume the > * column has just these values. > */ > stats->stadistinct = ndistinct; > } > > The problem with this heuristic is that if the table is small enough > you might expect you can set the statistics target high and "sample" > the entire table and get a very accurate mcv covering all the values. > However if any of the values in the table appears only once this > heuristic will defeat you. The following code will then throw out of > the mcv any value which isn't 25% more common than "average". Leaving > you with a histogram for those values which often does very poorly if > the values don't fit any pattern and are just discrete arbitrary > values.
Do you want a C comment to document this problem? -- Bruce Momjian <br...@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers