Re: Does TupleQueueReaderNext() really need to copy its result?

Soumyadeep Chakraborty Sat, 11 Jul 2020 12:26:07 -0700

Hey Thomas,

On Fri, Jul 10, 2020 at 7:30 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
>
> > I could reproduce the speed gain that you saw for a plan with a simple
> > parallel sequential scan. However, I got no gain at all for a parallel
> > hash join and parallel agg query.
>
> Right, it's not going to make a difference when you only send one
> tuple through the queue, like COUNT(*) does.


How silly of me! I should have paid more attention to the rows output
from each worker and that there was a select count(*) on the join query.
Anyway, these are a new set of results:

-----------------------------------------------------------------------

select pg_prewarm('lineitem');
select pg_prewarm('orders');
-- lineitem is 17G, orders is 4G. (TPCH scale = 20). shared_buffers = 30G

explain analyze select *
  from lineitem
  join orders on l_orderkey = o_orderkey
 where o_totalprice > 5.00;

[w/o any patch]               637s
[w/ first patch]              635s
[w/ last minimal tuple patch] 568s

-----------------------------------------------------------------------

We do indeed get the speedup.

> > As for gather merge, is it possible to have a situation where the slot
> > input to tqueueReceiveSlot() is a heap slot (as would be the case for a
> > simple select *)? If yes, in those scenarios, we would be incurring an
> > extra call to minimal_tuple_from_heap_tuple() because of the extra call
> > to ExecFetchSlotMinimalTuple() inside tqueueReceiveSlot() in your patch.
> > And since, in a gather merge, we can't avoid the copy on the leader side
> > (heap_copy_minimal_tuple() inside gm_readnext_tuple()), we would be
> > doing extra work in that scenario. I couldn't come up with a plan that
> > creates a scenario like this however.
>
> Hmm.  I wish we had a way to do an "in-place" copy-to-minimal-tuple
> where the caller supplies the memory, with some fast protocol to get
> the size right.  We could use that for copying tuples into shm queues,
> hash join tables etc without an extra palloc()/pfree() and double
> copy.

Do you mean that we should have an implementation for
get_minimal_tuple() for the heap AM and have it return a pointer to the
minimal tuple from the MINIMAL_TUPLE_OFFSET? And then a caller such as
tqueueReceiveSlot() will ensure that the heap tuple from which it wants
to extract the minimal tuple was allocated in the tuple queue in the
first place? If we consider that the node directly below a gather is a
SeqScan, then we could possibly, in ExecInitSeqScan() set-up the
ss_ScanTupleSlot to point to memory in the shared tuple queue?
Similarly, other ExecInit*() methods can do the same for other executor
nodes that involve parallelism? Of course, things would be slightly
different for
the other use cases you mentioned (such as hash table population)

All things considered, I think the patch in its current form should go
in. Having the in-place copy, could be done as a separate patch? Do you
agree?

Regards,
Soumyadeep (VMware)

Re: Does TupleQueueReaderNext() really need to copy its result?

Reply via email to