On Tue, 10 Nov 2020 at 15:38, Andy Fan <zhihui.fan1...@gmail.com> wrote:
> While I have interest about what caused the tiny difference, I admit that
> what direction this patch should go is more important. Not sure if anyone
> is convinced that v8 and v9 have a similar performance. The current data
> show it is similar. I want to profile/read code more, but I don't know
> what part I should pay attention to. So I think any hints on why v9 should
> be better at a noticeable level in theory should be very helpful. After
> that, I'd like to read the code or profile more carefully.
It was thought that by putting the cache code directly inside nodeNestloop.c, the overhead of fetching a tuple from a subnode could be eliminated when we get a cache hit.

A cache hit on v8 looks like:

Nest Loop -> fetch new outer row
Nest Loop -> fetch inner row
Result Cache -> cache hit, return first cached tuple
Nest Loop -> eval qual and return tuple if it matches

With v9 it's more like:

Nest Loop -> fetch new outer row
Nest Loop -> cache hit, return first cached tuple
Nest Loop -> eval qual and return tuple if it matches

So that's one less hop between nodes. In reality, the hop is not that expensive, so it might not be a big enough factor to slow the execution down.

There's some extra complexity in v9 around the slot type of the inner tuple. A cache hit means the slot type is Minimal, but a miss means the slot type is whatever type the inner node's slot is. So some code exists to switch the qual and projection info around depending on whether we get a cache hit or a miss (there's a rough sketch of that at the end of this mail).

I did some calculations on how costly pulling a tuple through a node is in [1].

David

[1] https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com
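To make the hit/miss switching a bit more concrete, here's a minimal sketch of one way it could look inside the nest loop code. This is not the actual patch code: cache_hit, cache_slot, cached_joinqual, cached_proj, plain_joinqual and plain_proj are invented names, and it assumes the join qual and projection were compiled twice at ExecInit time, once per inner slot type.

/*
 * Sketch only, not the v9 patch code.  Assumes two pre-built qual/projection
 * pairs: one expecting a minimal tuple slot (what the cache hands back on a
 * hit) and one expecting the inner node's own slot type (what a miss
 * returns).  End-of-scan and rescan handling omitted.
 */
ExprState  *joinqual;
ProjectionInfo *proj;

if (cache_hit)
{
    /* inner tuple served from the cache, always a TTSOpsMinimalTuple slot */
    econtext->ecxt_innertuple = cache_slot;
    joinqual = node->cached_joinqual;
    proj = node->cached_proj;
}
else
{
    /* cache miss: pull from the inner subnode, whatever slot type that uses */
    econtext->ecxt_innertuple = ExecProcNode(innerPlanState(node));
    joinqual = node->plain_joinqual;
    proj = node->plain_proj;
}

if (ExecQual(joinqual, econtext))
    return ExecProject(proj);

The reason for keeping two pairs in this sketch is that the compiled expression steps which deform the inner tuple can be specialised to a particular slot type, so whichever qual/projection gets evaluated has to match the slot the inner tuple actually arrives in.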