On Tue, 7 Mar 2023 at 10:46, David Rowley <dgrowle...@gmail.com> wrote:

> On Tue, 7 Mar 2023 at 22:09, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> >
> > I can live with it. This is an analytical query and the performance is
> > not too important for us. I was surprised that the performance was about
> > 25% worse, and that the hit ratio was almost zero. I am thinking, but I am
> > not sure, whether the estimation of the effectiveness of memoization can
> > depend (or should depend) on the number of workers? In this case the
> > number of workers is high.
>
> The costing for Memoize takes the number of workers into account by
> way of the change in expected input rows. The number of estimated
> input rows is effectively just divided by the number of parallel
> workers, so if we expect 1 million rows from the outer side of the
> join and 4 workers, then we'll assume Memoize will deal with
> 250,000 rows per worker. If the n_distinct estimate for the cache key
> is 500,000, then it's not going to look very attractive to Memoize
> that. In reality, estimate_num_groups() won't say the number of
> groups is higher than the input rows, but with all the other
> overheads factored into the costs, Memoize would never look favourable
> if the planner thought there were never going to be any repeated values.
> The expected cache hit ratio there would be zero.
>

Thanks for the explanation.

Pavel

> David
>
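
For reference, the arithmetic David describes can be sketched roughly as
follows. This is a simplified illustration, not the planner's actual code:
the function names are hypothetical, and the even split of rows across
workers, the reuse of the same n_distinct estimate for every worker count,
and the hit-ratio formula (repeated lookups over total lookups, ignoring
cache size limits and evictions) are all assumptions made for the example.

```python
def rows_per_worker(outer_rows: float, workers: int) -> float:
    """Assumption: outer rows are split evenly across parallel workers."""
    return outer_rows / max(workers, 1)


def expected_hit_ratio(calls: float, n_distinct: float) -> float:
    """Fraction of lookups expected to find an existing cache entry.

    The distinct-key estimate is clamped to the number of lookups, since
    there cannot be more groups than input rows.
    """
    if calls <= 0:
        return 0.0
    distinct = min(n_distinct, calls)
    # Every distinct key misses once; the remaining lookups are hits.
    return (calls - distinct) / calls


if __name__ == "__main__":
    outer_rows = 1_000_000  # rows expected from the outer side of the join
    n_distinct = 500_000    # n_distinct estimate for the cache key
    for workers in (1, 2, 4):
        calls = rows_per_worker(outer_rows, workers)
        ratio = expected_hit_ratio(calls, n_distinct)
        print(f"workers={workers} calls={calls:,.0f} hit_ratio={ratio:.2f}")
```

Under these assumptions a serial plan would expect about half of the lookups
to hit the cache, while with 4 workers each worker sees 250,000 rows, the
distinct estimate is clamped to that, and the expected hit ratio is zero.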