On Fri, Dec 18, 2015 at 3:54 AM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Fri, Dec 18, 2015 at 7:59 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> Uh oh. That's not supposed to happen. A GatherPath is supposed to
>> have parallel_safe = false, which should prevent the planner from
>> using it to form new partial paths. Is this with the latest version
>> of the patch? The plan output suggests that we're somehow reaching
>> try_partial_hashjoin_path() with inner_path being a GatherPath, but I
>> don't immediately see how that's possible, because
>> create_gather_path() sets parallel_safe to false unconditionally, and
>> hash_inner_and_outer() never sets cheapest_safe_inner to a path unless
>> that path is parallel_safe.
>
> Yes, you are right that create_gather_path() sets parallel_safe to false
> unconditionally, but whenever we build a non-partial path we should carry
> the parallel_safe state forward to its parent, and it seems like that part
> is missing here.
Ah, right. Woops. I can't exactly replicate your results, but I've
attempted to fix this in a systematic way in the new version attached
here (parallel-join-v3.patch).

>> Do you have a self-contained test case that reproduces this, or any
>> insight as to how it's happening here?
>
> This is a TPC-H benchmark case;
> we can set it up like this:
> 1. git clone https://tkej...@bitbucket.org/tkejser/tpch-dbgen.git
> 2. compile using make
> 3. ./dbgen -v -s 5
> 4. ./qgen

Thanks. After a bit of fiddling I was able to get this to work. I'm
attaching two other patches that seem to help this case quite
considerably.

The first (parallel-reader-order-v1) causes Gather to read from the same
worker repeatedly until it can't get another tuple from that worker
without blocking, and only then move on to the next worker. With 4
workers, this seems to be drastically more efficient than what's
currently in master - I saw the time for Q5 drop from over 17 seconds to
about 6 (this was an assert-enabled build running with EXPLAIN ANALYZE,
though, so take those numbers with a grain of salt).

The second (gather-disuse-physical-tlist.patch) causes Gather to force
underlying scan nodes to project, which is a good idea here for reasons
very similar to why it's a good idea for the existing node types that use
disuse_physical_tlist: forcing extra data through the Gather node is bad.
That shaved another half second off this query.

The exact query I was using for testing was:

explain (analyze, verbose)
select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue
from customer, orders, lineitem, supplier, nation, region
where c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and l_suppkey = s_suppkey
  and c_nationkey = s_nationkey
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'EUROPE'
  and o_orderdate >= date '1995-01-01'
  and o_orderdate < date '1995-01-01' + interval '1' year
group by n_name
order by revenue desc;

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
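For readers who want the gist of the parallel_safe fix without reading the
whole diff below, here is a minimal sketch of the propagation rule, written
as a hypothetical helper; the v3 patch actually sets the flag inline in
create_nestloop_path(), create_mergejoin_path(), and create_hashjoin_path(),
but the logic is the same:

#include "nodes/relation.h"		/* RelOptInfo, Path */

/*
 * Hypothetical helper, not part of the patch: a join path is parallel-safe
 * only if the join relation allows parallelism and both input paths are
 * themselves parallel-safe.  Because create_gather_path() marks every
 * GatherPath with parallel_safe = false, this rule keeps a Gather from
 * ever being planned underneath another Gather.
 */
static bool
join_path_parallel_safe(RelOptInfo *joinrel,
						Path *outer_path,
						Path *inner_path)
{
	return joinrel->consider_parallel &&
		   outer_path->parallel_safe &&
		   inner_path->parallel_safe;
}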
From 088a96363231b441fe6aab744b04a522d01fbc17 Mon Sep 17 00:00:00 2001 From: Robert Haas <rh...@postgresql.org> Date: Thu, 19 Nov 2015 20:28:34 -0500 Subject: [PATCH 1/3] Gather pushdown for child tables, hash joins, nested loops. Cost model fix for parallel seq scan. Fixed this so it propagates the parallel_safe flag up the plan tree. --- src/backend/executor/execParallel.c | 66 +++--- src/backend/nodes/outfuncs.c | 4 +- src/backend/optimizer/README | 55 ++++- src/backend/optimizer/path/allpaths.c | 164 +++++++++++---- src/backend/optimizer/path/costsize.c | 32 +-- src/backend/optimizer/path/joinpath.c | 253 +++++++++++++++++++++- src/backend/optimizer/path/joinrels.c | 3 +- src/backend/optimizer/plan/createplan.c | 2 +- src/backend/optimizer/plan/planmain.c | 3 +- src/backend/optimizer/util/pathnode.c | 361 +++++++++++++++++++++++++++++--- src/backend/optimizer/util/relnode.c | 2 + src/include/nodes/relation.h | 4 +- src/include/optimizer/cost.h | 2 +- src/include/optimizer/pathnode.h | 12 +- src/include/optimizer/paths.h | 2 + 15 files changed, 845 insertions(+), 120 deletions(-) diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c index 30e6b3d..5bc8eef 100644 --- a/src/backend/executor/execParallel.c +++ b/src/backend/executor/execParallel.c @@ -167,25 +167,25 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e) e->nnodes++; /* Call estimators for parallel-aware nodes. */ - switch (nodeTag(planstate)) + if (planstate->plan->parallel_aware) { - case T_SeqScanState: - ExecSeqScanEstimate((SeqScanState *) planstate, - e->pcxt); - break; - default: - break; + switch (nodeTag(planstate)) + { + case T_SeqScanState: + ExecSeqScanEstimate((SeqScanState *) planstate, + e->pcxt); + break; + default: + break; + } } return planstate_tree_walker(planstate, ExecParallelEstimate, e); } /* - * Ordinary plan nodes won't do anything here, but parallel-aware plan nodes - * may need to initialize shared state in the DSM before parallel workers - * are available. They can allocate the space they previous estimated using - * shm_toc_allocate, and add the keys they previously estimated using - * shm_toc_insert, in each case targeting pcxt->toc. + * Initialize the dynamic shared memory segment that will be used to control + * parallel execution. */ static bool ExecParallelInitializeDSM(PlanState *planstate, @@ -202,15 +202,26 @@ ExecParallelInitializeDSM(PlanState *planstate, /* Count this node. */ d->nnodes++; - /* Call initializers for parallel-aware plan nodes. */ - switch (nodeTag(planstate)) + /* + * Call initializers for parallel-aware plan nodes. + * + * Ordinary plan nodes won't do anything here, but parallel-aware plan + * nodes may need to initialize shared state in the DSM before parallel + * workers are available. They can allocate the space they previously + * estimated using shm_toc_allocate, and add the keys they previously + * estimated using shm_toc_insert, in each case targeting pcxt->toc. + */ + if (planstate->plan->parallel_aware) { - case T_SeqScanState: - ExecSeqScanInitializeDSM((SeqScanState *) planstate, - d->pcxt); - break; - default: - break; + switch (nodeTag(planstate)) + { + case T_SeqScanState: + ExecSeqScanInitializeDSM((SeqScanState *) planstate, + d->pcxt); + break; + default: + break; + } } return planstate_tree_walker(planstate, ExecParallelInitializeDSM, d); @@ -623,13 +634,16 @@ ExecParallelInitializeWorker(PlanState *planstate, shm_toc *toc) return false; /* Call initializers for parallel-aware plan nodes. 
*/ - switch (nodeTag(planstate)) + if (planstate->plan->parallel_aware) { - case T_SeqScanState: - ExecSeqScanInitializeWorker((SeqScanState *) planstate, toc); - break; - default: - break; + switch (nodeTag(planstate)) + { + case T_SeqScanState: + ExecSeqScanInitializeWorker((SeqScanState *) planstate, toc); + break; + default: + break; + } } return planstate_tree_walker(planstate, ExecParallelInitializeWorker, toc); diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 63fae82..6658a76 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -1588,6 +1588,8 @@ _outPathInfo(StringInfo str, const Path *node) else _outBitmapset(str, NULL); WRITE_BOOL_FIELD(parallel_aware); + WRITE_BOOL_FIELD(parallel_safe); + WRITE_INT_FIELD(parallel_degree); WRITE_FLOAT_FIELD(rows, "%.0f"); WRITE_FLOAT_FIELD(startup_cost, "%.2f"); WRITE_FLOAT_FIELD(total_cost, "%.2f"); @@ -1765,7 +1767,6 @@ _outGatherPath(StringInfo str, const GatherPath *node) _outPathInfo(str, (const Path *) node); WRITE_NODE_FIELD(subpath); - WRITE_INT_FIELD(num_workers); WRITE_BOOL_FIELD(single_copy); } @@ -1887,6 +1888,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node) WRITE_NODE_FIELD(reltargetlist); WRITE_NODE_FIELD(pathlist); WRITE_NODE_FIELD(ppilist); + WRITE_NODE_FIELD(partial_pathlist); WRITE_NODE_FIELD(cheapest_startup_path); WRITE_NODE_FIELD(cheapest_total_path); WRITE_NODE_FIELD(cheapest_unique_path); diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README index 916a518..5019804 100644 --- a/src/backend/optimizer/README +++ b/src/backend/optimizer/README @@ -851,4 +851,57 @@ lateral reference. (Perhaps now that that stuff works, we could relax the pullup restriction?) --- bjm & tgl +Parallel Query and Partial Paths +-------------------------------- + +Parallel query involves dividing up the work that needs to be performed +either by an entire query or some portion of the query in such a way that +some of that work can be done by one or more worker processes, which are +called parallel workers. Parallel workers are a subtype of dynamic +background workers; see src/backend/access/transam/README.parallel for a +fuller description. Academic literature on parallel query suggests that +that parallel execution strategies can be divided into essentially two +categories: pipelined parallelism, where the execution of the query is +divided into multiple stages and each stage is handled by a separate +process; and partitioning parallelism, where the data is split between +multiple processes and each process handles a subset of it. The +literature, however, suggests that gains from pipeline parallelism are +often very limited due to the difficulty of avoiding pipeline stalls. +Consequently, we do not currently attempt to generate query plans that +use this technique. + +Instead, we focus on partitioning paralellism, which does not require +that the underlying table be partitioned. It only requires that (1) +there is some method of dividing the data from at least one of the base +tables involved in the relation across multiple processes, (2) allowing +each process to handle its own portion of the data, and then (3) +collecting the results. Requirements (2) and (3) is satisfied by the +executor node Gather, which launches any number of worker processes and +executes its single child plan in all of them (and perhaps in the leader +also, if the children aren't generating enough data to keep the leader +busy). 
Requirement (1) is handled by the SeqScan node: when invoked +with parallel_aware = true, this node will, in effect, partition the +table on a block by block basis, returning a subset of the tuples from +the relation in each worker where that SeqScan is executed. A similar +scheme could be (and probably should be) implemented for bitmap heap +scans. + +Just as we do for non-parallel access methods, we build Paths to +represent access strategies that can be used in a parallel plan. These +are, in essence, the same strategies that are available in the +non-parallel plan, but there is an important difference: a path that +will run beneath a Gather node returns only a subset of the query +results in each worker, not all of them. To form a path that can +actually be executed, the (rather large) cost of the Gather node must be +accounted for. For this reason among others, paths intended to run +beneath a Gather node - which we call "partial" paths since they return +only a subset of the results in each worker - must be kept separate from +ordinary paths (see RelOptInfo's partial_pathlist and the function +add_partial_path). + +One of the keys to making parallel query effective is to run as much of +the query in parallel as possible. Therefore, we expect it to generally +be desirable to postpone the Gather stage until as near to the top of the +plan as possible. Expanding the range of cases in which more work can be +pushed below the Gather (and costly them accurately) is likely to keep us +busy for a long time to come. diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 4516cd3..326de0c 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -72,6 +72,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); +static void create_parallel_paths(PlannerInfo *root, RelOptInfo *rel); static void set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); static bool function_rte_parallel_ok(RangeTblEntry *rte); @@ -612,7 +613,6 @@ static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) { Relids required_outer; - int parallel_threshold = 1000; /* * We don't support pushing join clauses into the quals of a seqscan, but @@ -624,39 +624,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) /* Consider sequential scan */ add_path(rel, create_seqscan_path(root, rel, required_outer, 0)); - /* Consider parallel sequential scan */ - if (rel->consider_parallel && rel->pages > parallel_threshold && - required_outer == NULL) - { - Path *path; - int parallel_degree = 1; - - /* - * Limit the degree of parallelism logarithmically based on the size - * of the relation. This probably needs to be a good deal more - * sophisticated, but we need something here for now. - */ - while (rel->pages > parallel_threshold * 3 && - parallel_degree < max_parallel_degree) - { - parallel_degree++; - parallel_threshold *= 3; - if (parallel_threshold >= PG_INT32_MAX / 3) - break; - } - - /* - * Ideally we should consider postponing the gather operation until - * much later, after we've pushed joins and so on atop the parallel - * sequential scan path. But we don't have the infrastructure for - * that yet, so just do this for now. 
- */ - path = create_seqscan_path(root, rel, required_outer, parallel_degree); - path = (Path *) - create_gather_path(root, rel, path, required_outer, - parallel_degree); - add_path(rel, path); - } + /* If appropriate, consider parallel sequential scan */ + if (rel->consider_parallel && required_outer == NULL) + create_parallel_paths(root, rel); /* Consider index scans */ create_index_paths(root, rel); @@ -666,6 +636,54 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) } /* + * create_parallel_paths + * Build parallel access paths for a plain relation + */ +static void +create_parallel_paths(PlannerInfo *root, RelOptInfo *rel) +{ + int parallel_threshold = 1000; + int parallel_degree = 1; + + /* + * If this relation is too small to be worth a parallel scan, just return + * without doing anything ... unless it's an inheritance child. In that case, + * we want to generate a parallel path here anyway. It might not be worthwhile + * just for this relation, but when combined with all of its inheritance siblings + * it may well pay off. + */ + if (rel->pages < parallel_threshold && rel->reloptkind == RELOPT_BASEREL) + return; + + /* + * Limit the degree of parallelism logarithmically based on the size of the + * relation. This probably needs to be a good deal more sophisticated, but we + * need something here for now. + */ + while (rel->pages > parallel_threshold * 3 && + parallel_degree < max_parallel_degree) + { + parallel_degree++; + parallel_threshold *= 3; + if (parallel_threshold >= PG_INT32_MAX / 3) + break; + } + + /* Add an unordered partial path based on a parallel sequential scan. */ + add_partial_path(rel, create_seqscan_path(root, rel, NULL, parallel_degree)); + + /* + * If this is a baserel, consider gathering any partial paths we may have + * just created. If we gathered an inheritance child, we could end up + * with a very large number of gather nodes, each trying to grab its own + * pool of workers, so don't do this in that case. Instead, we'll + * consider gathering partial paths for the appendrel. + */ + if (rel->reloptkind == RELOPT_BASEREL) + generate_gather_paths(root, rel); +} + +/* * set_tablesample_rel_size * Set size estimates for a sampled relation */ @@ -1039,6 +1057,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, List *live_childrels = NIL; List *subpaths = NIL; bool subpaths_valid = true; + List *partial_subpaths = NIL; + bool partial_subpaths_valid = true; List *all_child_pathkeys = NIL; List *all_child_outers = NIL; ListCell *l; @@ -1093,6 +1113,13 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, else subpaths_valid = false; + /* Same idea, but for a partial plan. */ + if (childrel->partial_pathlist != NIL) + partial_subpaths = accumulate_append_subpath(partial_subpaths, + linitial(childrel->partial_pathlist)); + else + partial_subpaths_valid = false; + /* * Collect lists of all the available path orderings and * parameterizations for all the children. We use these as a @@ -1164,7 +1191,39 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, * if we have zero or one live subpath due to constraint exclusion.) */ if (subpaths_valid) - add_path(rel, (Path *) create_append_path(rel, subpaths, NULL)); + add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0)); + + /* + * Consider an append of partial unordered, unparameterized partial paths. 
+ */ + if (partial_subpaths_valid) + { + AppendPath *appendpath; + ListCell *lc; + int parallel_degree = 0; + + /* + * Decide what parallel degree to request for this append path. For + * now, we just use the maximum parallel degree of any member. It + * might be useful to use a higher number if the Append node were + * smart enough to spread out the workers, but it currently isn't. + */ + foreach(lc, partial_subpaths) + { + Path *path = lfirst(lc); + + parallel_degree = Max(parallel_degree, path->parallel_degree); + } + Assert(parallel_degree > 0); + + /* Generate a partial append path. */ + appendpath = create_append_path(rel, partial_subpaths, NULL, + parallel_degree); + add_partial_path(rel, (Path *) appendpath); + + /* Consider gathering it. */ + generate_gather_paths(root, rel); + } /* * Also build unparameterized MergeAppend paths based on the collected @@ -1214,7 +1273,7 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, if (subpaths_valid) add_path(rel, (Path *) - create_append_path(rel, subpaths, required_outer)); + create_append_path(rel, subpaths, required_outer, 0)); } } @@ -1440,8 +1499,9 @@ set_dummy_rel_pathlist(RelOptInfo *rel) /* Discard any pre-existing paths; no further need for them */ rel->pathlist = NIL; + rel->partial_pathlist = NIL; - add_path(rel, (Path *) create_append_path(rel, NIL, NULL)); + add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0)); /* * We set the cheapest path immediately, to ensure that IS_DUMMY_REL() @@ -1844,6 +1904,36 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) } /* + * generate_gather_paths + * Generate parallel access paths for a relation by pushing a Gather on + * top of a partial path. + */ +void +generate_gather_paths(PlannerInfo *root, RelOptInfo *rel) +{ + Path *cheapest_partial_path; + Path *simple_gather_path; + + /* If there are no partial paths, there's nothing to do here. */ + if (rel->partial_pathlist == NIL) + return; + + /* + * The output of Gather is currently always unsorted, so there's only one + * partial path of interest: the cheapest one. + * + * Eventually, we should have a Gather Merge operation that can merge + * multiple tuple streams together while preserving their ordering. We + * could usefully generate such a path from each partial path that has + * non-NIL pathkeys. + */ + cheapest_partial_path = linitial(rel->partial_pathlist); + simple_gather_path = (Path *) + create_gather_path(root, rel, cheapest_partial_path, NULL); + add_path(rel, simple_gather_path); +} + +/* * make_rel_from_joinlist * Build access paths using a "joinlist" to guide the join path search. * diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 990486c..1e76c03 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -186,24 +186,34 @@ clamp_row_est(double nrows) */ void cost_seqscan(Path *path, PlannerInfo *root, - RelOptInfo *baserel, ParamPathInfo *param_info, - int nworkers) + RelOptInfo *baserel, ParamPathInfo *param_info) { Cost startup_cost = 0; Cost run_cost = 0; double spc_seq_page_cost; QualCost qpqual_cost; Cost cpu_per_tuple; + double parallel_divisor = 1; /* Should only be applied to base relations */ Assert(baserel->relid > 0); Assert(baserel->rtekind == RTE_RELATION); + /* + * Primitive parallel cost model. Assume the leader will do half as much + * work as a regular worker, because it will also need to read the tuples + * returned by the workers when they percolate up to the gather ndoe. 
+ * This is almost certainly not exactly the right way to model this, so + * this will probably need to be changed at some point... + */ + if (path->parallel_degree > 0) + parallel_divisor = path->parallel_degree + 0.5; + /* Mark the path with the correct row estimate */ if (param_info) - path->rows = param_info->ppi_rows; + path->rows = param_info->ppi_rows / parallel_divisor; else - path->rows = baserel->rows; + path->rows = baserel->rows / parallel_divisor; if (!enable_seqscan) startup_cost += disable_cost; @@ -216,24 +226,14 @@ cost_seqscan(Path *path, PlannerInfo *root, /* * disk costs */ - run_cost += spc_seq_page_cost * baserel->pages; + run_cost += spc_seq_page_cost * baserel->pages / parallel_divisor; /* CPU costs */ get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost); startup_cost += qpqual_cost.startup; cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple; - run_cost += cpu_per_tuple * baserel->tuples; - - /* - * Primitive parallel cost model. Assume the leader will do half as much - * work as a regular worker, because it will also need to read the tuples - * returned by the workers when they percolate up to the gather ndoe. - * This is almost certainly not exactly the right way to model this, so - * this will probably need to be changed at some point... - */ - if (nworkers > 0) - run_cost = run_cost / (nworkers + 0.5); + run_cost += cpu_per_tuple * baserel->tuples / parallel_divisor; path->startup_cost = startup_cost; path->total_cost = startup_cost + run_cost; diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index 53d8fdd..33268d1 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -34,6 +34,12 @@ static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel, static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *outerrel, RelOptInfo *innerrel, JoinType jointype, JoinPathExtraData *extra); +static void consider_parallel_nestloop(PlannerInfo *root, + RelOptInfo *joinrel, + RelOptInfo *outerrel, + RelOptInfo *innerrel, + JoinType jointype, + JoinPathExtraData *extra); static void hash_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *outerrel, RelOptInfo *innerrel, JoinType jointype, JoinPathExtraData *extra); @@ -216,7 +222,12 @@ add_paths_to_joinrel(PlannerInfo *root, jointype, &extra); /* - * 6. Finally, give extensions a chance to manipulate the path list. + * 6. Consider gathering partial paths. + */ + generate_gather_paths(root, joinrel); + + /* + * 7. Finally, give extensions a chance to manipulate the path list. */ if (set_join_pathlist_hook) set_join_pathlist_hook(root, joinrel, outerrel, innerrel, @@ -330,6 +341,62 @@ try_nestloop_path(PlannerInfo *root, } /* + * try_partial_nestloop_path + * Consider a partial nestloop join path; if it appears useful, push it into + * the joinrel's partial_pathlist via add_partial_path(). + */ +static void +try_partial_nestloop_path(PlannerInfo *root, + RelOptInfo *joinrel, + Path *outer_path, + Path *inner_path, + List *pathkeys, + JoinType jointype, + JoinPathExtraData *extra) +{ + JoinCostWorkspace workspace; + + /* + * If the inner path is parameterized, the parameterization must be fully + * satisfied by the proposed outer path. Parameterized partial paths are + * not supported. The caller should already have verified that no + * extra_lateral_rels are required here. 
+ */ + Assert(bms_is_empty(joinrel->lateral_relids)); + if (inner_path->param_info != NULL) + { + Relids inner_paramrels = inner_path->param_info->ppi_req_outer; + + if (!bms_is_subset(inner_paramrels, outer_path->parent->relids)) + return; + } + + /* + * Before creating a path, get a quick lower bound on what it is likely + * to cost. Bail out right away if it looks terrible. + */ + initial_cost_nestloop(root, &workspace, jointype, + outer_path, inner_path, + extra->sjinfo, &extra->semifactors); + if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys)) + return; + + /* Might be good enough to be worth trying, so let's try it. */ + add_partial_path(joinrel, (Path *) + create_nestloop_path(root, + joinrel, + jointype, + &workspace, + extra->sjinfo, + &extra->semifactors, + outer_path, + inner_path, + extra->restrictlist, + pathkeys, + NULL)); +} + +/* * try_mergejoin_path * Consider a merge join path; if it appears useful, push it into * the joinrel's pathlist via add_path(). @@ -472,6 +539,62 @@ try_hashjoin_path(PlannerInfo *root, } /* + * try_partial_hashjoin_path + * Consider a partial hashjoin join path; if it appears useful, push it into + * the joinrel's partial_pathlist via add_partial_path(). + */ +static void +try_partial_hashjoin_path(PlannerInfo *root, + RelOptInfo *joinrel, + Path *outer_path, + Path *inner_path, + List *hashclauses, + JoinType jointype, + JoinPathExtraData *extra) +{ + JoinCostWorkspace workspace; + + /* + * If the inner path is parameterized, the parameterization must be fully + * satisfied by the proposed outer path. Parameterized partial paths are + * not supported. The caller should already have verified that no + * extra_lateral_rels are required here. + */ + Assert(bms_is_empty(joinrel->lateral_relids)); + if (inner_path->param_info != NULL) + { + Relids inner_paramrels = inner_path->param_info->ppi_req_outer; + + if (!bms_is_empty(inner_paramrels)) + return; + } + + /* + * Before creating a path, get a quick lower bound on what it is likely + * to cost. Bail out right away if it looks terrible. + */ + initial_cost_hashjoin(root, &workspace, jointype, hashclauses, + outer_path, inner_path, + extra->sjinfo, &extra->semifactors); + if (!add_partial_path_precheck(joinrel, workspace.total_cost, NIL)) + return; + + /* Might be good enough to be worth trying, so let's try it. */ + add_partial_path(joinrel, (Path *) + create_hashjoin_path(root, + joinrel, + jointype, + &workspace, + extra->sjinfo, + &extra->semifactors, + outer_path, + inner_path, + extra->restrictlist, + NULL, + hashclauses)); +} + +/* * clause_sides_match_join * Determine whether a join clause is of the right form to use in this join. * @@ -1063,6 +1186,85 @@ match_unsorted_outer(PlannerInfo *root, break; } } + + /* + * If the joinrel is parallel-safe and the join type supports nested loops, + * we may be able to consider a partial nestloop plan. However, we can't + * handle JOIN_UNIQUE_OUTER, because the outer path will be partial, and + * therefore we won't be able to properly guarantee uniqueness. Nor can + * we handle extra_lateral_rels, since partial paths must not be + * parameterized. 
+ */ + if (joinrel->consider_parallel && nestjoinOK && + save_jointype != JOIN_UNIQUE_OUTER && + bms_is_empty(joinrel->lateral_relids)) + consider_parallel_nestloop(root, joinrel, outerrel, innerrel, + save_jointype, extra); +} + +/* + * consider_parallel_nestloop + * Try to build partial paths for a joinrel by joining a partial path for the + * outer relation to a complete path for the inner relation. + * + * 'joinrel' is the join relation + * 'outerrel' is the outer join relation + * 'innerrel' is the inner join relation + * 'jointype' is the type of join to do + * 'extra' contains additional input values + */ +static void +consider_parallel_nestloop(PlannerInfo *root, + RelOptInfo *joinrel, + RelOptInfo *outerrel, + RelOptInfo *innerrel, + JoinType jointype, + JoinPathExtraData *extra) +{ + ListCell *lc1; + + foreach(lc1, outerrel->partial_pathlist) + { + Path *outerpath = (Path *) lfirst(lc1); + List *pathkeys; + ListCell *lc2; + + /* Figure out what useful ordering any paths we create will have. */ + pathkeys = build_join_pathkeys(root, joinrel, jointype, + outerpath->pathkeys); + + /* + * Try the cheapest parameterized paths; only those which will + * produce an unparameterized path when joined to this outerrel + * will survive try_partial_nestloop_path. The cheapest + * unparameterized path is also in this list. + */ + foreach(lc2, innerrel->cheapest_parameterized_paths) + { + Path *innerpath = (Path *) lfirst(lc2); + + /* Can't join to an inner path that is not parallel-safe */ + if (!innerpath->parallel_safe) + continue; + + /* + * Like match_unsorted_outer, we only consider a single nestloop + * path when the jointype is JOIN_UNIQUE_INNER. But we have to scan + * cheapest_parameterized_paths to find the one we want to consider, + * because cheapest_total_path might not be parallel-safe. + */ + if (jointype == JOIN_UNIQUE_INNER) + { + if (!bms_is_empty(PATH_REQ_OUTER(innerpath))) + continue; + innerpath = (Path *) create_unique_path(root, innerrel, + innerpath, extra->sjinfo); + } + + try_partial_nestloop_path(root, joinrel, outerpath, innerpath, + pathkeys, jointype, extra); + } + } } /* @@ -1240,6 +1442,55 @@ hash_inner_and_outer(PlannerInfo *root, } } } + + /* + * If the joinrel is parallel-safe, we may be able to consider a + * partial hash join. However, we can't handle JOIN_UNIQUE_OUTER, + * because the outer path will be partial, and therefore we won't be + * able to properly guarantee uniqueness. Also, the resulting path + * must not be parameterized. + */ + if (joinrel->consider_parallel && jointype != JOIN_UNIQUE_OUTER && + outerrel->partial_pathlist != NIL && + bms_is_empty(joinrel->lateral_relids)) + { + Path *cheapest_partial_outer; + Path *cheapest_safe_inner = NULL; + + cheapest_partial_outer = + (Path *) linitial(outerrel->partial_pathlist); + + /* + * Normally, given that the joinrel is parallel-safe, the cheapest + * total inner path will also be parallel-safe, but if not, we'll + * have to search cheapest_parameterized_paths for the cheapest + * unparameterized inner path. 
+ */ + if (cheapest_total_inner->parallel_safe) + cheapest_safe_inner = cheapest_total_inner; + else + { + ListCell *lc; + + foreach(lc, innerrel->cheapest_parameterized_paths) + { + Path *innerpath = (Path *) lfirst(lc); + + if (innerpath->parallel_safe && + bms_is_empty(PATH_REQ_OUTER(innerpath))) + { + cheapest_safe_inner = innerpath; + break; + } + } + } + + if (cheapest_safe_inner != NULL) + try_partial_hashjoin_path(root, joinrel, + cheapest_partial_outer, + cheapest_safe_inner, + hashclauses, jointype, extra); + } } } diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index ad58058..e32f445 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -1194,9 +1194,10 @@ mark_dummy_rel(RelOptInfo *rel) /* Evict any previously chosen paths */ rel->pathlist = NIL; + rel->partial_pathlist = NIL; /* Set up the dummy path */ - add_path(rel, (Path *) create_append_path(rel, NIL, NULL)); + add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0)); /* Set or update cheapest_total_path and related fields */ set_cheapest(rel); diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 32f903d..94e5f26 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -1125,7 +1125,7 @@ create_gather_plan(PlannerInfo *root, GatherPath *best_path) gather_plan = make_gather(subplan->targetlist, NIL, - best_path->num_workers, + best_path->path.parallel_degree, best_path->single_copy, subplan); diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index 894f968..a8ff2ff 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -84,7 +84,8 @@ query_planner(PlannerInfo *root, List *tlist, /* The only path for it is a trivial Result path */ add_path(final_rel, (Path *) - create_result_path((List *) parse->jointree->quals)); + create_result_path(final_rel, + (List *) parse->jointree->quals)); /* Select cheapest path (pretty easy in this case...) */ set_cheapest(final_rel); diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index ec0910d..d1c8e00 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -217,7 +217,12 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor) * The cheapest_parameterized_paths list collects all parameterized paths * that have survived the add_path() tournament for this relation. (Since * add_path ignores pathkeys for a parameterized path, these will be paths - * that have best cost or best row count for their parameterization.) + * that have best cost or best row count for their parameterization. We + * may also have both a parallel-safe and a non-parallel-safe path in some + * cases for the same parameterization in some cases, but this should be + * relatively rare since, most typically, all paths for the same relation + * will be parallel-safe or none of them will.) + * * cheapest_parameterized_paths always includes the cheapest-total * unparameterized path, too, if there is one; the users of that list find * it more convenient if that's included. @@ -352,11 +357,12 @@ set_cheapest(RelOptInfo *parent_rel) * A path is worthy if it has a better sort order (better pathkeys) or * cheaper cost (on either dimension), or generates fewer rows, than any * existing path that has the same or superset parameterization rels. 
+ * We also consider parallel-safe paths more worthy than others. * * We also remove from the rel's pathlist any old paths that are dominated * by new_path --- that is, new_path is cheaper, at least as well ordered, - * generates no more rows, and requires no outer rels not required by the - * old path. + * generates no more rows, requires no outer rels not required by the old + * path, and is no less parallel-safe. * * In most cases, a path with a superset parameterization will generate * fewer rows (since it has more join clauses to apply), so that those two @@ -470,14 +476,16 @@ add_path(RelOptInfo *parent_rel, Path *new_path) { if ((outercmp == BMS_EQUAL || outercmp == BMS_SUBSET1) && - new_path->rows <= old_path->rows) + new_path->rows <= old_path->rows && + new_path->parallel_safe >= old_path->parallel_safe) remove_old = true; /* new dominates old */ } else if (keyscmp == PATHKEYS_BETTER2) { if ((outercmp == BMS_EQUAL || outercmp == BMS_SUBSET2) && - new_path->rows >= old_path->rows) + new_path->rows >= old_path->rows && + new_path->parallel_safe <= old_path->parallel_safe) accept_new = false; /* old dominates new */ } else /* keyscmp == PATHKEYS_EQUAL */ @@ -487,19 +495,25 @@ add_path(RelOptInfo *parent_rel, Path *new_path) /* * Same pathkeys and outer rels, and fuzzily * the same cost, so keep just one; to decide - * which, first check rows and then do a fuzzy - * cost comparison with very small fuzz limit. - * (We used to do an exact cost comparison, - * but that results in annoying - * platform-specific plan variations due to - * roundoff in the cost estimates.) If things - * are still tied, arbitrarily keep only the - * old path. Notice that we will keep only - * the old path even if the less-fuzzy - * comparison decides the startup and total - * costs compare differently. + * which, first check parallel-safety, then + * rows, then do a fuzzy cost comparison with + * very small fuzz limit. (We used to do an + * exact cost comparison, but that results in + * annoying platform-specific plan variations + * due to roundoff in the cost estimates.) If + * things are still tied, arbitrarily keep + * only the old path. Notice that we will + * keep only the old path even if the + * less-fuzzy comparison decides the startup + * and total costs compare differently. 
*/ - if (new_path->rows < old_path->rows) + if (new_path->parallel_safe > + old_path->parallel_safe) + remove_old = true; /* new dominates old */ + else if (new_path->parallel_safe < + old_path->parallel_safe) + accept_new = false; /* old dominates new */ + else if (new_path->rows < old_path->rows) remove_old = true; /* new dominates old */ else if (new_path->rows > old_path->rows) accept_new = false; /* old dominates new */ @@ -512,10 +526,12 @@ add_path(RelOptInfo *parent_rel, Path *new_path) * dominates new */ } else if (outercmp == BMS_SUBSET1 && - new_path->rows <= old_path->rows) + new_path->rows <= old_path->rows && + new_path->parallel_safe >= old_path->parallel_safe) remove_old = true; /* new dominates old */ else if (outercmp == BMS_SUBSET2 && - new_path->rows >= old_path->rows) + new_path->rows >= old_path->rows && + new_path->parallel_safe <= old_path->parallel_safe) accept_new = false; /* old dominates new */ /* else different parameterizations, keep both */ } @@ -527,7 +543,8 @@ add_path(RelOptInfo *parent_rel, Path *new_path) PATH_REQ_OUTER(old_path)); if ((outercmp == BMS_EQUAL || outercmp == BMS_SUBSET1) && - new_path->rows <= old_path->rows) + new_path->rows <= old_path->rows && + new_path->parallel_safe >= old_path->parallel_safe) remove_old = true; /* new dominates old */ } break; @@ -538,7 +555,8 @@ add_path(RelOptInfo *parent_rel, Path *new_path) PATH_REQ_OUTER(old_path)); if ((outercmp == BMS_EQUAL || outercmp == BMS_SUBSET2) && - new_path->rows >= old_path->rows) + new_path->rows >= old_path->rows && + new_path->parallel_safe <= old_path->parallel_safe) accept_new = false; /* old dominates new */ } break; @@ -685,6 +703,214 @@ add_path_precheck(RelOptInfo *parent_rel, return true; } +/* + * add_partial_path + * Like add_path, our goal here is to consider whether a path is worthy + * of being kept around, but the considerations here are a bit different. + * A partial path is one which can be executed in any number of workers in + * parallel such that each worker will generate a subset of the path's + * overall result. + * + * We don't generate parameterized partial paths for several reasons. Most + * importantly, they're not safe to execute, because there's nothing to + * make sure that a parallel scan within the parameterized portion of the + * plan is running with the same value in every worker at the same time. + * Fortunately, it seems unlikely to be worthwhile anyway, because having + * each worker scan the entire outer relation and a subset of the inner + * relation will generally be a terrible plan. The inner (parameterized) + * side of the plan will be small anyway. There could be rare cases where + * this wins big - e.g. if join order constraints put a 1-row relation on + * the outer side of the topmost join with a parameterized plan on the inner + * side - but we'll have to be content not to handle such cases until somebody + * builds an executor infrastructure that can cope with them. + * + * Because we don't consider parameterized paths here, we also don't + * need to consider the row counts as a measure of quality: every path will + * produce the same number of rows. Neither do we need to consider startup + * costs: parallelism is only used for plans that will be run to completion. + * Therefore, this routine is much simpler than add_path: it needs to + * consider only pathkeys and total cost. 
+ */ +void +add_partial_path(RelOptInfo *parent_rel, Path *new_path) +{ + bool accept_new = true; /* unless we find a superior old path */ + ListCell *insert_after = NULL; /* where to insert new item */ + ListCell *p1; + ListCell *p1_prev; + ListCell *p1_next; + + /* Check for query cancel. */ + CHECK_FOR_INTERRUPTS(); + + /* + * As in add_path, throw out any paths which are dominated by the new + * path, but throw out the new path if some existing path dominates it. + */ + p1_prev = NULL; + for (p1 = list_head(parent_rel->partial_pathlist); p1 != NULL; + p1 = p1_next) + { + Path *old_path = (Path *) lfirst(p1); + bool remove_old = false; /* unless new proves superior */ + PathKeysComparison keyscmp; + + p1_next = lnext(p1); + + /* Compare pathkeys. */ + keyscmp = compare_pathkeys(new_path->pathkeys, old_path->pathkeys); + + /* Unless pathkeys are incompable, keep just one of the two paths. */ + if (keyscmp != PATHKEYS_DIFFERENT) + { + if (new_path->total_cost > old_path->total_cost * STD_FUZZ_FACTOR) + { + /* New path costs more; keep it only if pathkeys are better. */ + if (keyscmp != PATHKEYS_BETTER1) + accept_new = false; + } + else if (old_path->total_cost > new_path->total_cost + * STD_FUZZ_FACTOR) + { + /* Old path costs more; keep it only if pathkeys are better. */ + if (keyscmp != PATHKEYS_BETTER2) + remove_old = true; + } + else if (keyscmp == PATHKEYS_BETTER1) + { + /* Costs are about the same, new path has better pathkeys. */ + remove_old = true; + } + else if (keyscmp == PATHKEYS_BETTER2) + { + /* Costs are about the same, old path has better pathkeys. */ + accept_new = false; + } + else if (old_path->total_cost > new_path->total_cost * 1.0000000001) + { + /* Pathkeys are the same, and the old path costs more. */ + remove_old = true; + } + else + { + /* + * Pathkeys are the same, and new path isn't materially + * cheaper. + */ + accept_new = false; + } + } + + /* + * Remove current element from partial_pathlist if dominated by new. + */ + if (remove_old) + { + parent_rel->partial_pathlist = + list_delete_cell(parent_rel->partial_pathlist, p1, p1_prev); + /* add_path has a special case for IndexPath; we don't need it */ + Assert(!IsA(old_path, IndexPath)); + pfree(old_path); + /* p1_prev does not advance */ + } + else + { + /* new belongs after this old path if it has cost >= old's */ + if (new_path->total_cost >= old_path->total_cost) + insert_after = p1; + /* p1_prev advances */ + p1_prev = p1; + } + + /* + * If we found an old path that dominates new_path, we can quit + * scanning the partial_pathlist; we will not add new_path, and we + * assume new_path cannot dominate any later path. + */ + if (!accept_new) + break; + } + + if (accept_new) + { + /* Accept the new path: insert it at proper place */ + if (insert_after) + lappend_cell(parent_rel->partial_pathlist, insert_after, new_path); + else + parent_rel->partial_pathlist = + lcons(new_path, parent_rel->partial_pathlist); + } + else + { + /* add_path has a special case for IndexPath; we don't need it */ + Assert(!IsA(new_path, IndexPath)); + /* Reject and recycle the new path */ + pfree(new_path); + } +} + +/* + * add_partial_path_precheck + * Check whether a proposed new partial path could possibly get accepted. + * + * Unlike add_path_precheck, we can ignore startup cost and parameterization, + * since they don't matter for partial paths (see add_partial_path). 
But + * we do want to make sure we don't add a partial path if there's already + * a complete path that dominates it, since in that case the proposed path + * is surely a loser. + */ +bool +add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost, + List *pathkeys) +{ + ListCell *p1; + + /* + * Our goal here is twofold. First, we want to find out whether this path + * is clearly inferior to some existing partial path. If so, we want to + * reject it immediately. Second, we want to find out whether this path + * is clearly superior to some existing partial path -- at least, modulo + * final cost computations. If so, we definitely want to consider it. + * + * Unlike add_path(), we always compare pathkeys here. This is because we + * expect partial_pathlist to be very short, and getting a definitive + * answer at this stage avoids the need to call add_path_precheck. + */ + foreach(p1, parent_rel->partial_pathlist) + { + Path *old_path = (Path *) lfirst(p1); + PathKeysComparison keyscmp; + + keyscmp = compare_pathkeys(pathkeys, old_path->pathkeys); + if (keyscmp != PATHKEYS_DIFFERENT) + { + if (total_cost > old_path->total_cost * STD_FUZZ_FACTOR && + keyscmp != PATHKEYS_BETTER1) + return false; + if (old_path->total_cost > total_cost * STD_FUZZ_FACTOR && + keyscmp != PATHKEYS_BETTER2) + return true; + } + } + + /* + * This path is neither clearly inferior to an existing partial path nor + * clearly good enough that it might replace one. Compare it to + * non-parallel plans. If it loses even before accounting for the cost of + * the Gather node, we should definitely reject it. + * + * Note that we pass the total_cost to add_path_precheck twice. This is + * because it's never advantageous to consider the startup cost of a + * partial path; the resulting plans, if run in parallel, will be run to + * completion. + */ + if (!add_path_precheck(parent_rel, total_cost, total_cost, pathkeys, + NULL)) + return false; + + return true; +} + /***************************************************************************** * PATH NODE CREATION ROUTINES @@ -697,7 +923,7 @@ add_path_precheck(RelOptInfo *parent_rel, */ Path * create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, - Relids required_outer, int nworkers) + Relids required_outer, int parallel_degree) { Path *pathnode = makeNode(Path); @@ -705,10 +931,12 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, pathnode->parent = rel; pathnode->param_info = get_baserel_parampathinfo(root, rel, required_outer); - pathnode->parallel_aware = nworkers > 0 ? true : false; + pathnode->parallel_aware = parallel_degree > 0 ? 
true : false; + pathnode->parallel_safe = rel->consider_parallel; + pathnode->parallel_degree = parallel_degree; pathnode->pathkeys = NIL; /* seqscan has unordered result */ - cost_seqscan(pathnode, root, rel, pathnode->param_info, nworkers); + cost_seqscan(pathnode, root, rel, pathnode->param_info); return pathnode; } @@ -727,6 +955,8 @@ create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer pathnode->param_info = get_baserel_parampathinfo(root, rel, required_outer); pathnode->parallel_aware = false; + pathnode->parallel_safe = rel->consider_parallel; + pathnode->parallel_degree = 0; pathnode->pathkeys = NIL; /* samplescan has unordered result */ cost_samplescan(pathnode, root, rel, pathnode->param_info); @@ -781,6 +1011,8 @@ create_index_path(PlannerInfo *root, pathnode->path.param_info = get_baserel_parampathinfo(root, rel, required_outer); pathnode->path.parallel_aware = false; + pathnode->path.parallel_safe = rel->consider_parallel; + pathnode->path.parallel_degree = 0; pathnode->path.pathkeys = pathkeys; /* Convert clauses to indexquals the executor can handle */ @@ -827,6 +1059,8 @@ create_bitmap_heap_path(PlannerInfo *root, pathnode->path.param_info = get_baserel_parampathinfo(root, rel, required_outer); pathnode->path.parallel_aware = false; + pathnode->path.parallel_safe = bitmapqual->parallel_safe; + pathnode->path.parallel_degree = 0; pathnode->path.pathkeys = NIL; /* always unordered */ pathnode->bitmapqual = bitmapqual; @@ -852,7 +1086,17 @@ create_bitmap_and_path(PlannerInfo *root, pathnode->path.pathtype = T_BitmapAnd; pathnode->path.parent = rel; pathnode->path.param_info = NULL; /* not used in bitmap trees */ + + /* + * Currently, a BitmapHeapPath, BitmapAndPath, or BitmapOrPath will be + * parallel-safe if and only if rel->consider_parallel is set. So, we can + * set the flag for this path based only on the relation-level flag, + * without actually iterating over the list of children. + */ pathnode->path.parallel_aware = false; + pathnode->path.parallel_safe = rel->consider_parallel; + pathnode->path.parallel_degree = 0; + pathnode->path.pathkeys = NIL; /* always unordered */ pathnode->bitmapquals = bitmapquals; @@ -877,7 +1121,17 @@ create_bitmap_or_path(PlannerInfo *root, pathnode->path.pathtype = T_BitmapOr; pathnode->path.parent = rel; pathnode->path.param_info = NULL; /* not used in bitmap trees */ + + /* + * Currently, a BitmapHeapPath, BitmapAndPath, or BitmapOrPath will be + * parallel-safe if and only if rel->consider_parallel is set. So, we can + * set the flag for this path based only on the relation-level flag, + * without actually iterating over the list of children. + */ pathnode->path.parallel_aware = false; + pathnode->path.parallel_safe = rel->consider_parallel; + pathnode->path.parallel_degree = 0; + pathnode->path.pathkeys = NIL; /* always unordered */ pathnode->bitmapquals = bitmapquals; @@ -903,6 +1157,8 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals, pathnode->path.param_info = get_baserel_parampathinfo(root, rel, required_outer); pathnode->path.parallel_aware = false; + pathnode->path.parallel_safe = rel->consider_parallel; + pathnode->path.parallel_degree = 0; pathnode->path.pathkeys = NIL; /* always unordered */ pathnode->tidquals = tidquals; @@ -921,7 +1177,8 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals, * Note that we must handle subpaths = NIL, representing a dummy access path. 
*/ AppendPath * -create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer) +create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer, + int parallel_degree) { AppendPath *pathnode = makeNode(AppendPath); ListCell *l; @@ -931,6 +1188,8 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer) pathnode->path.param_info = get_appendrel_parampathinfo(rel, required_outer); pathnode->path.parallel_aware = false; + pathnode->path.parallel_safe = rel->consider_parallel; + pathnode->path.parallel_degree = parallel_degree; pathnode->path.pathkeys = NIL; /* result is always considered * unsorted */ pathnode->subpaths = subpaths; @@ -955,6 +1214,8 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer) if (l == list_head(subpaths)) /* first node? */ pathnode->path.startup_cost = subpath->startup_cost; pathnode->path.total_cost += subpath->total_cost; + pathnode->path.parallel_safe = pathnode->path.parallel_safe && + subpath->parallel_safe; /* All child paths must have same parameterization */ Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer)); @@ -985,6 +1246,8 @@ create_merge_append_path(PlannerInfo *root, pathnode->path.param_info = get_appendrel_parampathinfo(rel, required_outer); pathnode->path.parallel_aware = false; + pathnode->path.parallel_safe = rel->consider_parallel; + pathnode->path.parallel_degree = 0; pathnode->path.pathkeys = pathkeys; pathnode->subpaths = subpaths; @@ -1008,6 +1271,8 @@ create_merge_append_path(PlannerInfo *root, Path *subpath = (Path *) lfirst(l); pathnode->path.rows += subpath->rows; + pathnode->path.parallel_safe = pathnode->path.parallel_safe && + subpath->parallel_safe; if (pathkeys_contained_in(pathkeys, subpath->pathkeys)) { @@ -1052,7 +1317,7 @@ create_merge_append_path(PlannerInfo *root, * This is only used for the case of a query with an empty jointree. */ ResultPath * -create_result_path(List *quals) +create_result_path(RelOptInfo *rel, List *quals) { ResultPath *pathnode = makeNode(ResultPath); @@ -1060,6 +1325,8 @@ create_result_path(List *quals) pathnode->path.parent = NULL; pathnode->path.param_info = NULL; /* there are no other rels... 
*/ pathnode->path.parallel_aware = false; + pathnode->path.parallel_safe = rel->consider_parallel; + pathnode->path.parallel_degree = 0; pathnode->path.pathkeys = NIL; pathnode->quals = quals; @@ -1094,6 +1361,8 @@ create_material_path(RelOptInfo *rel, Path *subpath) pathnode->path.parent = rel; pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; + pathnode->path.parallel_safe = subpath->parallel_safe; + pathnode->path.parallel_degree = 0; pathnode->path.pathkeys = subpath->pathkeys; pathnode->subpath = subpath; @@ -1155,6 +1424,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, pathnode->path.parent = rel; pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; + pathnode->path.parallel_safe = subpath->parallel_safe; + pathnode->path.parallel_degree = 0; /* * Assume the output is unsorted, since we don't necessarily have pathkeys @@ -1328,19 +1599,30 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, */ GatherPath * create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, - Relids required_outer, int nworkers) + Relids required_outer) { GatherPath *pathnode = makeNode(GatherPath); + Assert(subpath->parallel_safe); + pathnode->path.pathtype = T_Gather; pathnode->path.parent = rel; pathnode->path.param_info = get_baserel_parampathinfo(root, rel, required_outer); pathnode->path.parallel_aware = false; + pathnode->path.parallel_safe = false; + pathnode->path.parallel_degree = subpath->parallel_degree; pathnode->path.pathkeys = NIL; /* Gather has unordered result */ pathnode->subpath = subpath; - pathnode->num_workers = nworkers; + pathnode->single_copy = false; + + if (pathnode->path.parallel_degree == 0) + { + pathnode->path.parallel_degree = 1; + pathnode->path.pathkeys = subpath->pathkeys; + pathnode->single_copy = true; + } cost_gather(pathnode, root, rel, pathnode->path.param_info); @@ -1393,6 +1675,8 @@ create_subqueryscan_path(PlannerInfo *root, RelOptInfo *rel, pathnode->param_info = get_baserel_parampathinfo(root, rel, required_outer); pathnode->parallel_aware = false; + pathnode->parallel_safe = rel->consider_parallel; + pathnode->parallel_degree = 0; pathnode->pathkeys = pathkeys; cost_subqueryscan(pathnode, root, rel, pathnode->param_info); @@ -1416,6 +1700,8 @@ create_functionscan_path(PlannerInfo *root, RelOptInfo *rel, pathnode->param_info = get_baserel_parampathinfo(root, rel, required_outer); pathnode->parallel_aware = false; + pathnode->parallel_safe = rel->consider_parallel; + pathnode->parallel_degree = 0; pathnode->pathkeys = pathkeys; cost_functionscan(pathnode, root, rel, pathnode->param_info); @@ -1439,6 +1725,8 @@ create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel, pathnode->param_info = get_baserel_parampathinfo(root, rel, required_outer); pathnode->parallel_aware = false; + pathnode->parallel_safe = rel->consider_parallel; + pathnode->parallel_degree = 0; pathnode->pathkeys = NIL; /* result is always unordered */ cost_valuesscan(pathnode, root, rel, pathnode->param_info); @@ -1461,6 +1749,8 @@ create_ctescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer) pathnode->param_info = get_baserel_parampathinfo(root, rel, required_outer); pathnode->parallel_aware = false; + pathnode->parallel_safe = rel->consider_parallel; + pathnode->parallel_degree = 0; pathnode->pathkeys = NIL; /* XXX for now, result is always unordered */ cost_ctescan(pathnode, root, rel, pathnode->param_info); @@ -1484,6 +1774,8 @@ 
create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->param_info = get_baserel_parampathinfo(root, rel,
 													 required_outer);
 	pathnode->parallel_aware = false;
+	pathnode->parallel_safe = rel->consider_parallel;
+	pathnode->parallel_degree = 0;
 	pathnode->pathkeys = NIL;	/* result is always unordered */
 
 	/* Cost is the same as for a regular CTE scan */
@@ -1517,6 +1809,8 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
 														   required_outer);
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel;
+	pathnode->path.parallel_degree = 0;
 	pathnode->path.rows = rows;
 	pathnode->path.startup_cost = startup_cost;
 	pathnode->path.total_cost = total_cost;
@@ -1653,6 +1947,10 @@ create_nestloop_path(PlannerInfo *root,
 											  required_outer,
 											  &restrict_clauses);
 	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = joinrel->consider_parallel &&
+		outer_path->parallel_safe && inner_path->parallel_safe;
+	/* This is a foolish way to estimate parallel_degree, but for now... */
+	pathnode->path.parallel_degree = outer_path->parallel_degree;
 	pathnode->path.pathkeys = pathkeys;
 	pathnode->jointype = jointype;
 	pathnode->outerjoinpath = outer_path;
@@ -1711,6 +2009,9 @@ create_mergejoin_path(PlannerInfo *root,
 												  required_outer,
 												  &restrict_clauses);
 	pathnode->jpath.path.parallel_aware = false;
+	pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
+		outer_path->parallel_safe && inner_path->parallel_safe;
+	pathnode->jpath.path.parallel_degree = 0;
 	pathnode->jpath.path.pathkeys = pathkeys;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.outerjoinpath = outer_path;
@@ -1768,6 +2069,10 @@ create_hashjoin_path(PlannerInfo *root,
 												  required_outer,
 												  &restrict_clauses);
 	pathnode->jpath.path.parallel_aware = false;
+	pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
+		outer_path->parallel_safe && inner_path->parallel_safe;
+	/* This is a foolish way to estimate parallel_degree, but for now... */
+	pathnode->jpath.path.parallel_degree = outer_path->parallel_degree;
 
 	/*
 	 * A hashjoin never has pathkeys, since its output ordering is
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f2bdfcc..29ce75f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -107,6 +107,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->reltargetlist = NIL;
 	rel->pathlist = NIL;
 	rel->ppilist = NIL;
+	rel->partial_pathlist = NIL;
 	rel->cheapest_startup_path = NULL;
 	rel->cheapest_total_path = NULL;
 	rel->cheapest_unique_path = NULL;
@@ -370,6 +371,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->reltargetlist = NIL;
 	joinrel->pathlist = NIL;
 	joinrel->ppilist = NIL;
+	joinrel->partial_pathlist = NIL;
 	joinrel->cheapest_startup_path = NULL;
 	joinrel->cheapest_total_path = NULL;
 	joinrel->cheapest_unique_path = NULL;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5393005..f9f13b4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -458,6 +458,7 @@ typedef struct RelOptInfo
 	List	   *reltargetlist;	/* Vars to be output by scan of relation */
 	List	   *pathlist;		/* Path structures */
 	List	   *ppilist;		/* ParamPathInfos used in pathlist */
+	List	   *partial_pathlist;	/* partial Paths */
 	struct Path *cheapest_startup_path;
 	struct Path *cheapest_total_path;
 	struct Path *cheapest_unique_path;
@@ -759,6 +760,8 @@ typedef struct Path
 	RelOptInfo *parent;			/* the relation this path can build */
 	ParamPathInfo *param_info;	/* parameterization info, or NULL if none */
 	bool		parallel_aware; /* engage parallel-aware logic? */
+	bool		parallel_safe;	/* OK to use as part of parallel plan? */
+	int			parallel_degree;	/* desired parallel degree; 0 = not parallel */
 
 	/* estimated size/costs for path (see costsize.c for more info) */
 	double		rows;			/* estimated number of result tuples */
@@ -1062,7 +1065,6 @@ typedef struct GatherPath
 {
 	Path		path;
 	Path	   *subpath;		/* path for each worker */
-	int			num_workers;	/* number of workers sought to help */
 	bool		single_copy;	/* path must not be executed >1x */
 } GatherPath;
 
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ac21a3a..25a7303 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -72,7 +72,7 @@ extern double clamp_row_est(double nrows);
 extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
 				 double index_pages, PlannerInfo *root);
 extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
-			 ParamPathInfo *param_info, int nworkers);
+			 ParamPathInfo *param_info);
 extern void cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 				ParamPathInfo *param_info);
 extern void cost_index(IndexPath *path, PlannerInfo *root,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8fb9eda..4b24da0 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -29,9 +29,12 @@ extern void add_path(RelOptInfo *parent_rel, Path *new_path);
 extern bool add_path_precheck(RelOptInfo *parent_rel,
 				  Cost startup_cost, Cost total_cost,
 				  List *pathkeys, Relids required_outer);
+extern void add_partial_path(RelOptInfo *parent_rel, Path *new_path);
+extern bool add_partial_path_precheck(RelOptInfo *parent_rel,
+						  Cost total_cost, List *pathkeys);
 
 extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
-					Relids required_outer, int nworkers);
+					Relids required_outer, int parallel_degree);
 extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
 					Relids required_outer);
 extern IndexPath *create_index_path(PlannerInfo *root,
@@ -59,19 +62,18 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 					List *tidquals, Relids required_outer);
 extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
-					Relids required_outer);
+					Relids required_outer, int parallel_degree);
 extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
 						 RelOptInfo *rel,
 						 List *subpaths,
 						 List *pathkeys,
 						 Relids required_outer);
-extern ResultPath *create_result_path(List *quals);
+extern ResultPath *create_result_path(RelOptInfo *rel, List *quals);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 					  Path *subpath, SpecialJoinInfo *sjinfo);
 extern GatherPath *create_gather_path(PlannerInfo *root,
-					RelOptInfo *rel, Path *subpath, Relids required_outer,
-					int nworkers);
+					RelOptInfo *rel, Path *subpath, Relids required_outer);
 extern Path *create_subqueryscan_path(PlannerInfo *root, RelOptInfo *rel,
 					  List *pathkeys, Relids required_outer);
 extern Path *create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7757741..4b01850 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -50,6 +50,8 @@ extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					  List *initial_rels);
 
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+
 #ifdef OPTIMIZER_DEBUG
 extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
 #endif
-- 
2.5.4 (Apple Git-61)
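In case it helps to see the invariant the new parallel_safe assignments encode outside of the diff context, here is a minimal, self-contained toy sketch (the ToyPath type and function names below are invented for illustration and are not part of the patch): a path built on top of other paths is parallel-safe only when its rel permits parallelism and every input path is itself parallel-safe, and a Gather input therefore makes the result unsafe.

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in; illustrative only, not the planner's real structures. */
typedef struct ToyPath
{
    bool        parallel_safe;
} ToyPath;

/* The propagation rule: safe only if the rel allows it and both inputs are safe. */
static bool
toy_joinpath_parallel_safe(bool consider_parallel,
                           const ToyPath *outer_path,
                           const ToyPath *inner_path)
{
    return consider_parallel &&
        outer_path->parallel_safe &&
        inner_path->parallel_safe;
}

int
main(void)
{
    ToyPath     seqscan = {true};
    ToyPath     gather = {false};   /* a Gather is never parallel-safe */

    printf("seqscan join seqscan: %d\n",
           toy_joinpath_parallel_safe(true, &seqscan, &seqscan));   /* prints 1 */
    printf("seqscan join gather:  %d\n",
           toy_joinpath_parallel_safe(true, &seqscan, &gather));    /* prints 0 */
    return 0;
}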
From 0e006c14df600419b56efc9ce8d2eaffb1891a7a Mon Sep 17 00:00:00 2001
From: Robert Haas <rh...@postgresql.org>
Date: Fri, 18 Dec 2015 10:02:35 -0500
Subject: [PATCH 2/3] Don't advance to the next reader until we can't read
 anything from the current one.

---
 src/backend/executor/nodeGather.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index f32da1e..db5883d 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -359,14 +359,20 @@ gather_readnext(GatherState *gatherstate)
 			continue;
 		}
 
-		/* Advance nextreader pointer in round-robin fashion. */
-		gatherstate->nextreader =
-			(gatherstate->nextreader + 1) % gatherstate->nreaders;
-
 		/* If we got a tuple, return it. */
 		if (tup)
 			return tup;
 
+		/*
+		 * Advance nextreader pointer in round-robin fashion.  Note that we
+		 * only reach this code if we weren't able to get a tuple from the
+		 * current worker.  We used to advance the nextreader pointer after
+		 * every tuple, but it turns out to be much more efficient to keep
+		 * reading from the same queue until that would require blocking.
+		 */
+		gatherstate->nextreader =
+			(gatherstate->nextreader + 1) % gatherstate->nreaders;
+
 		/* Have we visited every TupleQueueReader? */
 		if (gatherstate->nextreader == waitpos)
 		{
-- 
2.5.4 (Apple Git-61)
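To make the scheduling change in parallel-reader-order-v1 concrete outside the executor, here is a small self-contained toy (everything below is invented for illustration; it is not the nodeGather.c code): each "reader" is just a counter of buffered tuples, and the loop only advances to the next reader once the current one has nothing more to hand back, so reader switches scale with the number of queues rather than with the number of tuples. Advancing after every tuple, by contrast, would pay one switch per tuple in this toy.

#include <stdio.h>

#define NREADERS  3
#define PER_QUEUE 4

/* Toy stand-in for per-worker tuple queues: a count of tuples still buffered. */
static int remaining[NREADERS] = {PER_QUEUE, PER_QUEUE, PER_QUEUE};

/* Return 1 if reader i can supply a tuple without blocking, else 0. */
static int
try_read(int i)
{
    if (remaining[i] > 0)
    {
        remaining[i]--;
        return 1;
    }
    return 0;
}

int
main(void)
{
    int         nextreader = 0;
    int         switches = 0;
    int         tuples = 0;

    /* Drain the current reader; advance round-robin only when it is empty. */
    while (tuples < NREADERS * PER_QUEUE)
    {
        if (try_read(nextreader))
        {
            tuples++;
            continue;
        }
        nextreader = (nextreader + 1) % NREADERS;
        switches++;
    }

    /* Prints: read 12 tuples with 2 reader switches */
    printf("read %d tuples with %d reader switches\n", tuples, switches);
    return 0;
}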
From 5deb0e7bc1d289616d6fc9b94b83377a47d221fe Mon Sep 17 00:00:00 2001
From: Robert Haas <rh...@postgresql.org>
Date: Fri, 18 Dec 2015 10:06:21 -0500
Subject: [PATCH 3/3] Make Gather disuse_physical_tlist.

---
 src/backend/optimizer/plan/createplan.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 94e5f26..0c82529 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -558,7 +558,8 @@ use_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
  * If the plan node immediately above a scan would prefer to get only
  * needed Vars and not a physical tlist, it must call this routine to
  * undo the decision made by use_physical_tlist().  Currently, Hash, Sort,
- * and Material nodes want this, so they don't have to store useless columns.
+ * Material, and Gather nodes want this, so they don't have to store or
+ * transfer useless columns.
  */
 static void
 disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
@@ -1123,6 +1124,8 @@ create_gather_plan(PlannerInfo *root, GatherPath *best_path)
 
 	subplan = create_plan_recurse(root, best_path->subpath);
 
+	disuse_physical_tlist(root, subplan, best_path->subpath);
+
 	gather_plan = make_gather(subplan->targetlist,
 							  NIL,
 							  best_path->path.parallel_degree,
-- 
2.5.4 (Apple Git-61)
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers