On Fri, Feb 2, 2018 at 12:15 AM, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, Jan 31, 2018 at 11:48 PM, Amit Kapila <amit.kapil...@gmail.com> wrote: > > The other cases aren't so clear. In the case of the first call within > create_ordered_paths, there's no loss in the !is_sorted case because > apply_projection_to_path will be getting called on a Sort path. But > when is_sorted is true, the current code can push a target list into a > Gather or Gather Merge that was created with some other target list, > and with the patch it can't. I'm not quite sure what sort of query > would trigger that problem, but it seems like there's something to > worry about there. >
I think the query plans which involve Gather Merge -> Parallel Index Scan can be impacted. For ex. cosider below case: With parallel-paths-tlist-cost-rmh-v2.patch postgres=# set cpu_operator_cost=0; SET postgres=# set parallel_setup_cost=0;set parallel_tuple_cost=0;set min_parallel_table_scan_size=0;set max_parallel_workers_per_gather=4; SET SET SET SET postgres=# explain (costs off, verbose) select simple_func(aid) from pgbench_accounts where aid > 1000 order by aid; QUERY PLAN --------------------------------------------------------------------------------------- Gather Merge Output: simple_func(aid), aid Workers Planned: 4 -> Parallel Index Only Scan using pgbench_accounts_pkey on public.pgbench_accounts Output: aid Index Cond: (pgbench_accounts.aid > 1000) (6 rows) HEAD and parallel_paths_include_tlist_cost_v7 postgres=# explain (costs off, verbose) select simple_func(aid) from pgbench_accounts where aid > 1000 order by aid; QUERY PLAN --------------------------------------------------------------------------------------- Gather Merge Output: (simple_func(aid)), aid Workers Planned: 4 -> Parallel Index Only Scan using pgbench_accounts_pkey on public.pgbench_accounts Output: simple_func(aid), aid Index Cond: (pgbench_accounts.aid > 1000) (6 rows) For the above test, I have initialized pgbench with 100 scale factor. It shows that with patch parallel-paths-tlist-cost-rmh-v2.patch, we will lose the capability to push target list in some cases. One lame idea could be that at the call location, we detect if it is a Gather (Merge) and then push the target list, but doing so at all the locations doesn't sound to be a good alternative as compared to another approach where we push target list in apply_projection_to_path. > Similarly I can't see any obvious reason why this > isn't a problem for adjust_path_for_srfs and build_minmax_path as > well, although I haven't tried to construct queries that hit those > cases, either. > Neither do I, but I can give it a try if we expect something different than the results of above example. > Now, we could just give up on this approach and leave that code in > apply_projection_to_path, but what's bothering me is that, presumably, > any place where that code is actually getting used has the same > problem that we're trying to fix in grouping_planner: the tlist evals > costs are not being factored into the decision as to which path we > should choose, which might make a good parallel path lose to an > inferior non-parallel path. > Your concern is valid, but isn't the same problem exists in another approach as well, because in that also we can call adjust_paths_for_srfs after generating gather path which means that we might lose some opportunity to reduce the relative cost of parallel paths due to tlists having SRFs. Also, a similar problem can happen in create_order_paths for the cases as described in the example above. > It would be best to fix that throughout > the code base rather than only fixing the more common paths -- if we > can do so with a reasonable amount of work. > Agreed, I think one way to achieve that is instead of discarding parallel paths based on cost, we retain them till the later phase of planning, something like what we do for ordered paths. In that case, the way changes have been done in the patch parallel_paths_include_tlist_cost_v7 will work. I think it will be helpful for other cases as well if we keep the parallel paths as alternative paths till later stage of planning (we have discussed it during parallelizing subplans as well), however, that is a bigger project on its own. I don't think it will be too bad to go with what we have in parallel_paths_include_tlist_cost_v7 with the intent that in future when we do the other project, the other cases will be automatically dealt. I have updated the patch parallel_paths_include_tlist_cost_v7 by changing some of the comments based on parallel-paths-tlist-cost-rmh-v2.patch. I have one additional comment on parallel-paths-tlist-cost-rmh-v2.patch @@ -374,6 +374,14 @@ cost_gather(GatherPath *path, PlannerInfo *root, startup_cost += parallel_setup_cost; run_cost += parallel_tuple_cost * path->path.rows; + /* add tlist eval costs only if projecting */ + if (path->path.pathtarget != path->subpath->pathtarget) + { + /* tlist eval costs are paid per output row, not per tuple scanned */ + startup_cost += path->path.pathtarget->cost.startup; + run_cost += path->path.pathtarget->cost.per_tuple * path->path.rows; + } I think when the tlist eval costs are added, we should subtract the previous cost added for subpath. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
parallel_paths_include_tlist_cost_v8.patch
Description: Binary data