On Tue, 7 Mar 2023 at 22:09, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> I can live with it. This is an analytical query and the performance is not
> too important for us. I was surprised that the performance was about 25%
> worse and that the hit ratio was almost zero. I am thinking, but I am not
> sure, whether the estimation of the effectiveness of memoization can (or
> should) depend on the number of workers? In this case the number of
> workers is high.
The costing for Memoize takes the number of workers into account by way of the change in expected input rows. The estimated number of input rows is effectively divided by the number of parallel workers, so if we expect 1 million rows from the outer side of the join and 4 workers, then we'll assume Memoize will deal with 250,000 rows per worker. If the n_distinct estimate for the cache key is 500,000, then it's not going to look very attractive to Memoize that. In reality, estimate_num_groups() won't say the number of groups is higher than the number of input rows, but with all the other overheads factored into the costs, Memoize would never look favourable if the planner thought there were never going to be any repeated values. The expected cache hit ratio there would be zero.

David
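The arithmetic above can be sketched as a toy model (a simplification for illustration only, not the actual cost model in PostgreSQL's costsize.c; the function name and the clamping are assumptions made for this sketch):

```python
def expected_hit_ratio(outer_rows, n_workers, n_distinct):
    """Toy sketch of why parallel workers hurt Memoize's estimated hit ratio.

    Not the real planner code: a simplified illustration of dividing the
    outer rows among workers and clamping the group estimate.
    """
    # Rows each worker is expected to feed into Memoize.
    rows_per_worker = outer_rows / n_workers
    # estimate_num_groups() never reports more groups than input rows.
    est_groups = min(n_distinct, rows_per_worker)
    # The first occurrence of each cache key is a miss; repeats are hits.
    return 1.0 - est_groups / rows_per_worker

# 1M outer rows, 500K distinct keys:
print(expected_hit_ratio(1_000_000, 1, 500_000))  # 0.5 with no workers
print(expected_hit_ratio(1_000_000, 4, 500_000))  # 0.0 with 4 workers
```

With one worker every key is expected to repeat once, giving a 50% hit ratio; with four workers each worker sees only 250,000 rows against 500,000 distinct keys, so the planner expects no repeats at all and Memoize looks unattractive.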