On Wed, 31 Mar 2021 at 05:34, Zhihong Yu <z...@yugabyte.com> wrote: > > Hi, > In paraminfo_get_equal_hashops(), > > + /* Reject if there are any volatile functions */ > + if (contain_volatile_functions(expr)) > + { > > You can move the above code to just ahead of: > > + if (IsA(expr, Var)) > + var_relids = bms_make_singleton(((Var *) expr)->varno); > > This way, when we return early, var_relids doesn't need to be populated.
Thanks for having another look. I did a bit more work in that area and removed that code. I dug a little deeper and I can't see any way that a lateral_var on a rel can refer to anything inside the rel. It looks like that code was just a bit over paranoid about that. I also added some additional caching in RestrictInfo to cache the hash equality operator to use for the result cache. This saves checking this each time we consider a join during the join search. In many cases we would have used the value cached in RestrictInfo.hashjoinoperator, however, for non-equaliy joins, that would have be set to InvalidOid. We can still use Result Cache for non-equality joins. I've now pushed the main patch. There's a couple of things I'm not perfectly happy with: 1. The name. There's a discussion on [1] if anyone wants to talk about that. 2. For lateral joins, there's no place to cache the hash equality operator. Maybe there's some rework to do to add the ability to check things for those like we use RestrictInfo for regular joins. 3. No ability to cache n_distinct estimates. This must be repeated each time we consider a join. RestrictInfo allows caching for this to speed up clauselist_selectivity() for other join types. There was no consensus reached on the name of the node. "Tuple Cache" seems like the favourite so far, but there's not been a great deal of input. At least not enough that I was motivated to rename everything. People will perhaps have more time to consider names during beta. Thank you to everyone who gave input and reviewed this patch. It would be great to get feedback on the performance with real workloads. As mentioned in the commit message, there is a danger that it causes performance regressions when n_distinct estimates are significantly underestimated. I'm off to look at the buildfarm now. David [1] https://www.postgresql.org/message-id/CAApHDvq=yqxr5kqhrvit2rhnkwtoawr9jan5t+5_pzhurj3...@mail.gmail.com