Re: using memoize in parallel query decreases performance

Pavel Stehule Tue, 07 Mar 2023 01:51:16 -0800

On Tue, 7 Mar 2023 at 10:46, David Rowley <[email protected]> wrote:


> On Tue, 7 Mar 2023 at 22:09, Pavel Stehule <[email protected]>
> wrote:
> > I can live with it. This is an analytical query and the performance is
> > not too important for us. I was surprised that the performance was about
> > 25% worse, and that the hit ratio was almost zero. I am wondering, though
> > I am not sure, whether the estimate of Memoize's effectiveness can (or
> > should) depend on the number of workers. In this case the number of
> > workers is high.
>
> The costing for Memoize takes the number of workers into account by
> way of the change in expected input rows.  The number of estimated
> input rows is effectively just divided by the number of parallel
> workers, so if we expect 1 million rows from the outer side of the
> join and 4 workers, then we'll assume the Memoize node will deal with
> 250,000 rows per worker.  If the n_distinct estimate for the cache key
> is 500,000, then Memoize is not going to look very attractive.  In
> reality, estimate_num_groups() won't say the number of groups is
> higher than the number of input rows, but with all the other overheads
> factored into the costs, Memoize would never look favourable if the
> planner thought there were never going to be any repeated values.
> The expected cache hit ratio there would be zero.
>
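
The arithmetic in the quoted explanation can be sketched roughly as
follows. This is only a simplified model, not the planner's actual code:
the function names rows_per_worker and expected_hit_ratio are made up for
illustration, the plain division by the worker count follows the
"effectively just divided" wording above, and the hit ratio formula
assumes the cache is large enough to hold every distinct key.

```python
def rows_per_worker(outer_rows: float, workers: int) -> float:
    """Assumed model: outer-side rows are split evenly across workers."""
    return outer_rows / max(workers, 1)


def expected_hit_ratio(calls: float, ndistinct: float) -> float:
    """Simplified cache hit ratio estimate (hypothetical helper).

    Assumes every distinct key fits in the cache, so each key misses
    once and every later lookup of the same key is a hit.
    """
    if calls <= 0:
        return 0.0
    # Mirror the clamp described above: the number of groups is never
    # estimated to be higher than the number of input rows.
    ndistinct = max(1.0, min(ndistinct, calls))
    return max((calls - ndistinct) / calls, 0.0)


# Example from the thread: 1 million outer rows, 4 workers,
# n_distinct estimate of 500,000 for the cache key.
calls = rows_per_worker(1_000_000, 4)
print(calls)                                   # 250000.0
print(expected_hit_ratio(calls, 500_000))      # 0.0
# Same estimates without parallelism, for comparison.
print(expected_hit_ratio(1_000_000, 500_000))  # 0.5
```

Under these assumptions, the same n_distinct estimate yields an expected
hit ratio of 0.5 for a serial plan and 0.0 per worker with 4 workers.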

Thanks for the explanation.

Pavel


> David
>
