Sorry, I forgot to CC pgsql-hackers.
On 29/6/21 13:23, Tomas Vondra wrote:
Because sampling is fairly expensive, especially if you have to do it for a large number of child relations. And you'd have to do that pretty much every time *any* child triggers autovacuum. Merging the stats is way cheaper.

See the other thread linked from the first message.
Maybe I didn't describe my idea clearly.
Partitioning is most commonly used for large tables.
I propose to store a sampling reservoir for each partition, replace it whenever that partition's statistics are updated, and merge the reservoirs to build statistics for the parent table.
The reservoir could be spilled to a tuplestore on disk, or stored in the parent table.
In the case of complex inheritance we could store sampling reservoirs only for the leaf relations. You can treat this idea as speculative, but the statistics-merging approach has an extensibility problem with other types of statistics.
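
To make the idea a bit more concrete, here is a minimal sketch in plain C of what the merge step could look like: draw from each child's reservoir in proportion to the child's estimated row count, so the merged sample stays approximately uniform over the whole parent. All names here (ChildReservoir, merge_reservoirs) are made up for illustration; this is not based on any existing analyze.c code.

/*
 * Hypothetical sketch: merge per-partition reservoirs into one parent
 * sample by drawing from each child in proportion to its row count.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct ChildReservoir
{
	int		   *rows;		/* stand-in for the sampled tuples */
	int			nsampled;	/* rows actually kept in the reservoir */
	double		reltuples;	/* estimated total rows in the partition */
} ChildReservoir;

/*
 * Fill 'out' (capacity 'targetrows') from the child reservoirs.
 * Returns the number of rows produced; may be less than targetrows
 * if the reservoirs are too small.
 */
static int
merge_reservoirs(ChildReservoir *children, int nchildren,
				 int *out, int targetrows)
{
	double		total = 0.0;
	int			nout = 0;

	for (int i = 0; i < nchildren; i++)
		total += children[i].reltuples;

	for (int i = 0; i < nchildren; i++)
	{
		/* share of the parent sample this child should contribute */
		int			want = (int) (targetrows * children[i].reltuples / total + 0.5);

		if (want > children[i].nsampled)
			want = children[i].nsampled;

		/*
		 * Take 'want' rows from the child's reservoir.  A simple prefix is
		 * used here; a real merge would pick the rows at random.
		 */
		for (int j = 0; j < want && nout < targetrows; j++)
			out[nout++] = children[i].rows[j];
	}
	return nout;
}

int
main(void)
{
	int			big[5] = {1, 2, 3, 4, 5};
	int			little[2] = {6, 7};
	ChildReservoir children[2] = {{big, 5, 900000.0}, {little, 2, 100000.0}};
	int			parent[4];
	int			n = merge_reservoirs(children, 2, parent, 4);

	for (int i = 0; i < n; i++)
		printf("%d ", parent[i]);
	printf("\n");
	return 0;
}

A real implementation would of course pick the contributed rows at random from each reservoir and round the per-child quotas more carefully, but the shape of the merge would be the same.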


On 6/29/21 9:01 AM, Andrey Lepikhov wrote:
On 30/3/21 03:51, Tomas Vondra wrote:
Of course, that assumes the merge is cheaper than processing the list of
statistics, but I find that plausible, especially if the list needs to be
processed multiple times (e.g. when considering different join orders,
filters and so on).
I think your approach has a chance. But I don't understand: why merge the statistics? I think we could merge only the samples of each child and build the statistics as usual.
The error of a sample-merging procedure would be quite limited.
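For example, if one child holds 900,000 rows and another 100,000, a 30,000-row parent sample built from their reservoirs would draw roughly 30,000 * 0.9 = 27,000 and 30,000 * 0.1 = 3,000 rows respectively; since each reservoir is itself a uniform sample of its partition, the merged sample stays close to a uniform sample of the parent.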




--
regards,
Andrey Lepikhov
Postgres Professional

