Re: [HACKERS] parallelize queries containing initplans

Robert Haas Thu, 16 Nov 2017 09:15:41 -0800

On Thu, Nov 16, 2017 at 5:23 AM, Kuntal Ghosh
<[email protected]> wrote:
> I've tested the above-mentioned scenario with this patch and it is
> working fine. Also, I've created a text column named 'vartext',
> inserted some random length texts(max length 100) and tweaked the
> above query as follows:
> select ten,count(*) from tenk1 a where a.ten in (select
> b.ten from tenk1 b where (select a.vartext from tenk1 c where c.ten =
> a.ten limit 1) = b.vartext limit 1) group by a.ten;
> This query is equivalent to select ten,count(*) from tenk1 group by
> a.ten. It also produced the expected result without throwing any
> error.


Great!  I have committed the patch; thanks for testing.

As I said in the commit message, there's a lot more work that could be
done here.  I think we should consider trying to revise this whole
system so that instead of serializing the values and passing them to
the workers, we allocate an array of slots where each slot has a Datum
flag, an isnull flag, and a dsa_pointer (which maybe could be union'd
to the Datum?).  If we're passing a parameter by value, we could just
store it in the Datum field; if it's null, we just set isnull.  If
it's being passed by reference, we dsa_allocate() space for it, copy
it into that space, and then store the dsa_pointer.

The advantage of this is that it would be possible to imagine the
contents of a slot changing while parallelism is running, which
doesn't really work with the current serialized-blob representation.
That would in turn allow us to imagine letting parallel-safe InitPlans
being evaluated by the first participant that needs the value rather
than before launching workers, which would be good, not only because
of the possibility of deferring work for InitPlans attached at or
above the Gather but also because it could be used for InitPlans below
the Gather (as long as they don't depend on any parameters computed
below the Gather).  I thought about trying to refactor this patch to
use a slot concept like what I just mentioned even before committing
it, but after poking at it a bit I decided that was going to require
too much new development to justify cramming it into the same patch.
Still, it seems like a good idea to do it at some point (when we have
lots of free time?).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] parallelize queries containing initplans

Reply via email to