I have encountered serious under-estimations of distinct values when
values are not evenly distributed but clustered within a column. I think
this problem might be relevant to many real-world use cases and I wonder
if there is a good workaround or possibly a programmatic solution that
could be
On 29 December 2012 20:57, Stefan Andreatta wrote:
> Now, the 2005 discussion goes into great detail on the advantages and
> disadvantages of this algorithm, particularly when using small sample sizes,
> and several alternatives are discussed. I do not know whether anything has
> been changed afte