Re: Merging statistics from children instead of re-sampling everything

Andrey Lepikhov Thu, 10 Feb 2022 03:50:48 -0800

On 21/1/2022 01:25, Tomas Vondra wrote:

But I don't have a very good idea what to do about statistics that we
can't really merge. For some types of statistics it's rather tricky to
reasonably merge the results - ndistinct is a simple example, although
we could work around that by building and merging hyperloglog counters.

I think that, as a first step, we can reduce the number of tuples
pulled. We don't really need to pull all tuples from a remote server.
To construct a reservoir, we can pull only a sample of tuples. The
reservoir method needs only a few arguments to return a sample, as if
you were reading the tuples locally. Also, to fetch these parts of the
sample asynchronously, we can get the size of each partition in a
preliminary step of the analysis. In my opinion, even this solution can
drastically reduce the severity of the problem.
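
To make the idea concrete, here is a rough sketch in Python (not
PostgreSQL code). It assumes partition sizes are already known from the
preliminary step, splits the target sample size across partitions in
proportion to those sizes, and runs a plain reservoir (Algorithm R) on
each partition. All names (reservoir_sample, allocate_quotas,
sample_partitions) are hypothetical, and in-memory lists stand in for
remote tables. Note that fixed proportional quotas give a stratified
sample, which is only an approximation of one reservoir run over all
rows.

```python
import random


def reservoir_sample(rows, k, rng):
    """Algorithm R: uniform sample of up to k rows from one pass over rows."""
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


def allocate_quotas(partition_sizes, target):
    """Split target across partitions in proportion to size.

    Uses integer arithmetic and the largest-remainder rule, so the
    quotas sum to min(target, total rows).
    """
    total = sum(partition_sizes.values())
    if total == 0:
        return {name: 0 for name in partition_sizes}
    target = min(target, total)
    quotas = {n: target * s // total for n, s in partition_sizes.items()}
    remainders = {n: target * s % total for n, s in partition_sizes.items()}
    leftover = target - sum(quotas.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for name in by_remainder[:leftover]:
        quotas[name] += 1
    return quotas


def sample_partitions(partitions, target, seed=0):
    """Hypothetical driver: each partition returns only its own quota.

    In a real setup each per-partition call could be issued
    asynchronously to a remote server; here it is a local loop.
    """
    rng = random.Random(seed)
    sizes = {name: len(rows) for name, rows in partitions.items()}
    quotas = allocate_quotas(sizes, target)
    merged = []
    for name, rows in partitions.items():
        merged.extend(reservoir_sample(rows, quotas[name], rng))
    return merged


if __name__ == "__main__":
    parts = {
        "p1": list(range(0, 1000)),
        "p2": list(range(1000, 1300)),
        "p3": list(range(1300, 1400)),
    }
    print(len(sample_partitions(parts, 140)))
```

With this scheme only the quota for each partition crosses the network,
rather than every tuple.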


--
regards,
Andrey Lepikhov
Postgres Professional
