> On Thu, 2010-12-30 at 21:02 -0500, Tom Lane wrote:
>> How is an incremental ANALYZE going to work at all?
>
> How about a kind of continuous analyze?
>
> Instead of analyzing just once and then drop the intermediate results,
> keep them on disk for all tables and then piggyback the background
> writer (or have a dedicated process if that's not algorithmically
> feasible) and before writing out stuff update the statistics based on
> the values found in modified buffers. Probably it could take a random
> sample of buffers to minimize overhead, but if it is done by a
> background thread the overhead could be minimal anyway on multi-core
> systems.
Hi,

the problem is that you will eventually need to drop the results and
rebuild them, as the algorithms do not handle deletes (OK, Florian
mentioned an L_0 algorithm described in one of the papers, but I'm not
sure we can use it).

I'm not sure a constantly running background process is a good idea. I'd
prefer storing information about the modified tuples somewhere, and
starting ANALYZE only when a given threshold is reached. I'm not sure how
to do that, though.

Another thing I'm not sure about is where to store the intermediate stats
(used to get the current estimate and updated incrementally). I was
thinking about pg_stats, but I'm not sure it's the right place - depending
on the algorithm this may be a few kilobytes up to several megabytes, and
it's not needed except when updating the estimate.

Any ideas?

regards
Tomas
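
PS: To make the threshold idea a bit more concrete, here is a very rough
sketch in C. All type and function names are made up for illustration -
this is not existing backend code, just the bookkeeping I have in mind:
a per-table counter of modified tuples, and a merge of the sampled
changes into the stored estimator state once the counter crosses a
threshold.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical per-table bookkeeping - the names are invented for
     * this sketch and are not part of the existing pgstat machinery. */
    typedef struct IncrementalStatsEntry
    {
        uint32_t relid;                /* table this entry belongs to */
        int64_t  changes_since_merge;  /* modified tuples seen so far */
        int64_t  merge_threshold;      /* e.g. 10% of reltuples */
    } IncrementalStatsEntry;

    /* Placeholder for the real work: load the intermediate estimator
     * state (from pg_stats or wherever it ends up), fold in the sampled
     * modified tuples, and write it back.  Deletes would still force a
     * full rebuild with most of the algorithms discussed so far. */
    static void
    merge_sampled_tuples_into_estimator(uint32_t relid)
    {
        printf("merging sampled changes for relation %u\n",
               (unsigned) relid);
    }

    /* Called whenever a batch of modified tuples is accounted for. */
    static void
    maybe_merge_incremental_stats(IncrementalStatsEntry *entry,
                                  int64_t nchanges)
    {
        entry->changes_since_merge += nchanges;

        if (entry->changes_since_merge >= entry->merge_threshold)
        {
            merge_sampled_tuples_into_estimator(entry->relid);
            entry->changes_since_merge = 0;
        }
    }

    int
    main(void)
    {
        IncrementalStatsEntry entry = { 16384, 0, 1000 };

        maybe_merge_incremental_stats(&entry, 600); /* below threshold */
        maybe_merge_incremental_stats(&entry, 600); /* crosses it */
        return 0;
    }

The open question is still the same, of course: where the estimator state
lives and how cheap it is to load and store at merge time.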