On Fri, May 1, 2020 at 2:30 AM Melanie Plageman <melanieplage...@gmail.com> wrote: > I made a few edits to the message and threw it into a draft patch (on > top of master, of course). I didn't want to junk up peoples' inboxes, so > I didn't start a separate thread, but, it will be pretty hard to > collaboratively edit the comment/ever register it for a commitfest if it > is wedged into this thread. What do you think?
+1, this is a good description and I'm sure you're right about the name of the algorithm. It's a "hybrid" between a simple no partition hash join, and partitioning like the Grace machine, since batch 0 is processed directly without touching the disk. You mention that PHJ finalises the number of batches during build phase while SHJ can extend it later. There's also a difference in the probe phase: although inner batch 0 is loaded into the hash table directly and not written to disk during the build phase (= classic hybrid, just like the serial algorithm), outer batch 0 *is* written out to disk at the start of the probe phase (unlike classic hybrid at least as we have it for serial hash join). That's because I couldn't figure out how to begin emitting tuples before partitioning was finished, without breaking the deadlock-avoidance programming rule that you can't let the program counter escape from the node when someone might wait for you. So maybe it's erm, a hybrid between hybrid and Grace...