On 05-01-2021 13:57, Bharath Rupireddy wrote:
On Tue, Jan 5, 2021 at 1:00 PM Luc Vlaming <l...@swarm64.com> wrote:
Reviewing further v20-0001:
I would still opt for moving the code for the parallel worker into a
separate function, and then setting rStartup of the dest receiver to
that function in ExecParallelGetInsReceiver, as its completely
independent code. Just a matter of style I guess.
If we were to have a intorel_startup_worker and assign it to
self->pub.rStartup, 1) we can do it in the CreateIntoRelDestReceiver,
we have to pass a parameter to CreateIntoRelDestReceiver as an
indication of parallel worker, which requires code changes in places
wherever CreateIntoRelDestReceiver is used. 2) we can also assign
intorel_startup_worker after CreateIntoRelDestReceiver in
ExecParallelGetInsReceiver, but that doesn't look good to me. 3) we
can duplicate CreateIntoRelDestReceiver and have a
CreateIntoRelParallelDestReceiver with the only change being that
self->pub.rStartup = intorel_startup_worker;
IMHO, the way it is currently, looks good. Anyways, I'm open to
changing that if we agree on any of the above 3 ways.
The current way is good enough, it was a suggestion as personally I find
it hard to read to have two completely separate code paths in the same
function. If any I would opt for something like 3) where there's a
CreateIntoRelParallelDestReceiver which calls CreateIntoRelDestReceiver
and then overrides rStartup to intorel_startup_worker. Then no callsites
have to change except the ones that are for parallel workers.
If we were to do any of the above, then we might have to do the same
thing for other commands Refresh Materialized View or Copy To where we
can parallelize.
Thoughts?
Maybe I'm not completely following why but afaics we want parallel
inserts in various scenarios, not just CTAS? I'm asking because code like
+ if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ pg_atomic_add_fetch_u64(&fpes->processed,
queryDesc->estate->es_processed);
seems very specific to CTAS. For now that seems fine but I suppose that
would be generalized soon after? Basically I would have expected the if
to compare against PARALLEL_INSERT_CMD_UNDEF.
After this patch is reviewed and goes for commit, then the next thing
I plan to do is to allow parallel inserts in Refresh Materialized View
and it can be used for that. I think the processed variable can also
be used for parallel inserts in INSERT INTO SELECT [1] as well.
Currently, I'm keeping it for CTAS, maybe later (after this is
committed) it can be generalized.
Thoughts?
Sounds good
[1] -
https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com
Apart from these small things v20-0001 looks (very) good to me.
v20-0003 and v20-0004:
looks good to me.
Thanks.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Kind regards,
Luc