On Sat, Jul 11, 2020 at 1:37 PM Soumyadeep Chakraborty
<soumyadeep2...@gmail.com> wrote:

> +1 to the idea! I ran some experiments on both of your patches.
Hi Soumyadeep,

Thanks for testing!

> I could reproduce the speed gain that you saw for a plan with a simple
> parallel sequential scan. However, I got no gain at all for a parallel
> hash join and parallel agg query.

Right, it's not going to make a difference when you only send one tuple
through the queue, like COUNT(*) does.

> As for gather merge, is it possible to have a situation where the slot
> input to tqueueReceiveSlot() is a heap slot (as would be the case for a
> simple select *)? If yes, in those scenarios, we would be incurring an
> extra call to minimal_tuple_from_heap_tuple() because of the extra call
> to ExecFetchSlotMinimalTuple() inside tqueueReceiveSlot() in your patch.
> And since, in a gather merge, we can't avoid the copy on the leader side
> (heap_copy_minimal_tuple() inside gm_readnext_tuple()), we would be
> doing extra work in that scenario. I couldn't come up with a plan that
> creates a scenario like this however.

Hmm. I wish we had a way to do an "in-place" copy-to-minimal-tuple
where the caller supplies the memory, with some fast protocol to get
the size right. We could use that for copying tuples into shm queues,
hash join tables etc without an extra palloc()/pfree() and double copy.