On Mon, Dec 8, 2014 at 11:21 PM, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Sat, Dec 6, 2014 at 1:50 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > I think we have access to this information in the planner
> > (RelOptInfo->pages), and if we want, we can use that to eliminate small
> > relations from parallelism.  The question is how big a relation we want
> > to consider for parallelism; one way is to decide via tests, which I am
> > planning to do.  Do you think we have any heuristic we can use to decide
> > how big a relation should be considered for parallelism?
>
> Surely the Path machinery needs to decide this in particular cases
> based on cost.  We should assign some cost to starting a parallel
> worker via some new GUC, like parallel_startup_cost = 100,000.  And
> then we should also assign a cost to the act of relaying a tuple from
> the parallel worker to the master, maybe cpu_tuple_cost (or some new
> GUC).  For a small relation, or a query with a LIMIT clause, the
> parallel startup cost will make starting a lot of workers look
> unattractive, but for bigger relations it will make sense from a cost
> perspective, which is exactly what we want.
>

Sounds sensible.  cpu_tuple_cost is already used for another purpose, so I am
not sure it is right to overload that parameter; how about a new GUC such as
cpu_tuple_communication_cost or cpu_tuple_comm_cost?
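To make that concrete, below is a rough standalone sketch (plain C, not
planner code; the worker count, the *_sketch functions and the values for the
proposed GUCs are just placeholders I made up for illustration) of how a plain
seq scan could be compared against a parallel one once we charge a per-worker
startup cost plus a per-tuple communication cost:

/*
 * Standalone sketch of the costing idea discussed above -- not PostgreSQL
 * code.  seq_page_cost and cpu_tuple_cost use the stock defaults; the two
 * parallel knobs are made-up values for the proposed GUCs.
 */
#include <stdio.h>

static const double seq_page_cost = 1.0;
static const double cpu_tuple_cost = 0.01;
static const double parallel_startup_cost = 100000.0;  /* per worker, proposed */
static const double cpu_tuple_comm_cost = 0.01;        /* worker -> master, proposed */

/* Cost of a plain sequential scan (ignoring quals etc.) */
static double
cost_seqscan_sketch(double pages, double tuples)
{
    return pages * seq_page_cost + tuples * cpu_tuple_cost;
}

/*
 * Cost of a parallel sequential scan with nworkers: pay the startup cost for
 * each worker, divide the scan work among the workers, and pay a per-tuple
 * cost for relaying every tuple back to the master.
 */
static double
cost_parallel_seqscan_sketch(double pages, double tuples, int nworkers)
{
    return nworkers * parallel_startup_cost
        + cost_seqscan_sketch(pages, tuples) / nworkers
        + tuples * cpu_tuple_comm_cost;
}

int
main(void)
{
    /* small relation: 100 pages, 10,000 tuples */
    printf("small rel: seq %.0f vs parallel(4) %.0f\n",
           cost_seqscan_sketch(100, 10000),
           cost_parallel_seqscan_sketch(100, 10000, 4));

    /* big relation: 1,000,000 pages, 100,000,000 tuples */
    printf("big rel:   seq %.0f vs parallel(4) %.0f\n",
           cost_seqscan_sketch(1000000, 100000000),
           cost_parallel_seqscan_sketch(1000000, 100000000, 4));
    return 0;
}

With those made-up numbers the parallel plan loses badly on the small relation
(400,150 vs 200) and wins only modestly on the big one (1,900,000 vs
2,000,000), which is also why I think the per-tuple communication cost should
be a separately tunable knob rather than reusing cpu_tuple_cost.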
> There are probably other important considerations based on goals for
> overall resource utilization, and also because at a certain point
> adding more workers won't help because the disk will be saturated.  I
> don't know exactly what we should do about those issues yet, but the
> steps described in the previous paragraph seem like a good place to
> start anyway.
>

Agreed.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com