On 14 January 2013 07:35, Stefan Andreatta wrote:
> The source of these troubles is the sampling method employed in
> src/backend/commands/analyze.c. Judging from Tom Lane's comment for the
> original implementation in 2004 this has never been thought to be perfect.
> Does anybody see a chance to
A status update on this problem:
1.) Workarounds (setting n_distinct manually) are tested and - as far as
workarounds go - OK.
2.) Source of the problem and possible solution:
The source of these troubles is the sampling method employed in
src/backend/commands/analyze.c. Judging from Tom Lan
On 12/29/2012 10:57 PM, Peter Geoghegan wrote:
On 29 December 2012 20:57, Stefan Andreatta wrote:
...
The general advice here is:
1) Increase default_statistics_target for the column.
I tried that, but to get good estimates under these circumstances, I
need to set the statistics_target so
On Sat, Dec 29, 2012 at 5:57 PM, Stefan Andreatta
wrote:
> n*d / (n - f1 + f1*n/N)
>
> where f1 is the number of values that occurred only once in the sample. n is
> the number of rows sampled, d the number of distincts found and N the total
> number of rows in the table.
>
...
>
> When the numbe
On 12/29/2012 10:57 PM, Peter Geoghegan wrote:
On 29 December 2012 20:57, Stefan Andreatta wrote:
Now, the 2005 discussion goes into great detail on the advantages and
disadvantages of this algorithm, particularly when using small sample sizes,
and several alternatives are discussed. I do not k
On 29 December 2012 20:57, Stefan Andreatta wrote:
> Now, the 2005 discussion goes into great detail on the advantages and
> disadvantages of this algorithm, particularly when using small sample sizes,
> and several alternatives are discussed. I do not know whether anything has
> been changed afte
I have encountered serious under-estimations of distinct values when
values are not evenly distributed but clustered within a column. I think
this problem might be relevant to many real-world use cases and I wonder
if there is a good workaround or possibly a programmatic solution that
could be