Re: [HACKERS] Parallel Seq Scan

José Luis Tallón Fri, 05 Dec 2014 07:09:41 -0800

On 12/04/2014 07:35 AM, Amit Kapila wrote:

[snip]


The number of worker backends that can be used for
parallel seq scan can be configured by using a new GUC,
parallel_seqscan_degree. Its default value is zero,
meaning parallel seq scan will not be considered unless
the user configures this value.

The number of parallel workers should be capped (of course!) at the maximum number of "processors" (cores/vCores, threads/hyperthreads) available.

Moreover, when load goes up, the relative cost of parallel work should go up as well.

Something like:
    c = number of cores
    l = 1-minute load average

    additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1)

(for c>1, of course)
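
To make the proposal concrete, here is a minimal sketch of the formula above together with the worker cap. The function names are hypothetical, the default of 0.01 mirrors PostgreSQL's stock cpu_tuple_cost setting, and none of this is taken from the patch.

```python
import os


def capped_degree(requested_degree, cores):
    """Cap the requested worker count at the number of available processors."""
    return max(0, min(requested_degree, cores))


def additional_parallel_cost(tuple_estimate, load_1min, cores, cpu_tuple_cost=0.01):
    """additional_cost = tuple_estimate * cpu_tuple_cost * (l + 1) / (c - 1).

    Only defined for c > 1, as noted in the proposal.
    """
    if cores <= 1:
        raise ValueError("the formula is only defined for more than one core")
    return tuple_estimate * cpu_tuple_cost * (load_1min + 1) / (cores - 1)


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    try:
        load = os.getloadavg()[0]  # 1-minute load average; not available on all platforms
    except (AttributeError, OSError):
        load = 0.0
    print("degree:", capped_degree(8, cores))
    if cores > 1:
        print("extra cost:", additional_parallel_cost(1_000_000, load, cores))
```

With this shape the penalty grows linearly with the load average and shrinks as more cores are available, which is the behaviour I had in mind.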

In the ExecutorStart phase, initiate the required number of workers
as per the parallel seq scan plan, set up dynamic shared memory, and
share the information required for the workers to execute the scan.
Currently I have just shared the relId, targetlist and number
of blocks to be scanned by each worker; however, I think we might want
to generate a plan for each of the workers in the master backend and
then share it with the individual workers.
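
As an illustration of the per-worker information described above, here is one way the block count could be divided among workers. The contiguous-range split and the function name are my assumptions; the mail does not say how the patch actually assigns blocks.

```python
def split_blocks(nblocks, nworkers):
    """Return one (start_block, block_count) pair per worker.

    Blocks are handed out as contiguous ranges; when nblocks is not a
    multiple of nworkers, the first few workers get one extra block.
    """
    if nworkers <= 0:
        raise ValueError("need at least one worker")
    base, extra = divmod(nblocks, nworkers)
    ranges = []
    start = 0
    for i in range(nworkers):
        count = base + (1 if i < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


if __name__ == "__main__":
    for worker, (start, count) in enumerate(split_blocks(10, 3)):
        print(f"worker {worker}: start block {start}, {count} blocks")
```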

[snip]

The attached patch is just to facilitate the discussion about
parallel seq scan and maybe some other dependent tasks, like
sharing various states (combocid, snapshot) with parallel
workers.  It is by no means ready for any complex testing; of course
I will work towards making it more robust, both in terms of adding
more stuff and doing performance optimizations.

Thoughts/Suggestions?

Not directly (I haven't had the time to read the code yet), but I'm thinking about the ability to simply *replace* executor methods from an extension. This could be an alternative to providing additional nodes that the planner can include in the final plan tree, ready to be executed.

The parallel seq scan nodes are definitely the best approach for "parallel query", since the planner can optimize them based on cost. I'm wondering about the ability to modify the implementation of some methods themselves at execution time: given a previously planned query, chances are that, at execution time (I'm specifically thinking about prepared statements here), a different implementation of the same "node" might be more suitable and could be used instead while the condition holds.
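
A rough sketch of what I mean by swapping the implementation of a node at execution time follows. Everything in it (the Executor class, register_override, the "load_1min" runtime key) is hypothetical and only illustrates the dispatch idea; it does not correspond to any existing PostgreSQL API.

```python
def default_seq_scan(rows):
    """Stock implementation: return the rows one by one."""
    return [r for r in rows]


def chunked_seq_scan(rows, chunk=2):
    """Alternative implementation with the same result, processed in chunks."""
    out = []
    for i in range(0, len(rows), chunk):
        out.extend(rows[i:i + chunk])
    return out


class Executor:
    def __init__(self):
        self._methods = {"SeqScan": default_seq_scan}
        self._overrides = {}  # node type -> list of (condition, method)

    def register_override(self, node_type, condition, method):
        """An extension supplies a replacement method plus the condition under which it applies."""
        self._overrides.setdefault(node_type, []).append((condition, method))

    def pick(self, node_type, runtime):
        """Choose the method at execution time; fall back to the stock one."""
        for condition, method in self._overrides.get(node_type, []):
            if condition(runtime):
                return method
        return self._methods[node_type]

    def run(self, node_type, rows, runtime):
        return self.pick(node_type, runtime)(rows)


if __name__ == "__main__":
    ex = Executor()
    # Use the alternative only while the machine is lightly loaded.
    ex.register_override("SeqScan", lambda rt: rt["load_1min"] < 1.0, chunked_seq_scan)
    print(ex.pick("SeqScan", {"load_1min": 0.2}).__name__)
    print(ex.pick("SeqScan", {"load_1min": 4.0}).__name__)
```

The plan itself stays untouched; only the method bound to the node changes, and only while the condition holds.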

If this latter line of thinking is too off-topic within this thread and there is any interest, we can move the comments to another thread and I'd begin work on a PoC patch. It might as well make sense to implement the executor overloading mechanism alongside the custom plan API, though.

Any comments appreciated.


Thank you for your work, Amit


Regards,

    / J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
