I wrote:
> As far as that goes, it seems to me after thinking about it that
> non-sort-column tlist items containing SRFs should always be postponed,
> too.  Performing a SRF before sorting bloats the sort data vertically,
> rather than horizontally, but it's still bloat.  (Although as against
> that, when you have ORDER BY + LIMIT, postponing SRFs loses the ability
> to use a bounded sort.)  The killer point though is that unless the sort
> is stable, it might cause surprising changes in the order of the SRF
> output values.  Our sorts aren't stable; here's an example in HEAD:
> # select q1, generate_series(1,9) from int8_tbl order by q1 limit 7;
>  q1  | generate_series
> -----+-----------------
>  123 |               2
>  123 |               3
>  123 |               4
>  123 |               5
>  123 |               6
>  123 |               7
>  123 |               1
> (7 rows)
> I think that's pretty surprising, and if we have an opportunity to
> provide more intuitive semantics here, we should do it.

Here's an updated patch (requires current HEAD) that takes care of
window functions correctly and also does something reasonable with
set-returning functions:

# explain verbose select q1, generate_series(1,9) from int8_tbl order by q1 limit 7;
                                    QUERY PLAN
---------------------------------------------------------------------------------
 Limit  (cost=1.11..1.14 rows=7 width=12)
   Output: q1, (generate_series(1, 9))
   ->  Result  (cost=1.11..26.16 rows=5000 width=12)
         Output: q1, generate_series(1, 9)
         ->  Sort  (cost=1.11..1.12 rows=5 width=8)
               Output: q1
               Sort Key: int8_tbl.q1
               ->  Seq Scan on public.int8_tbl  (cost=0.00..1.05 rows=5 width=8)
                     Output: q1
(9 rows)

# select q1, generate_series(1,9) from int8_tbl order by q1 limit 7;
 q1  | generate_series
-----+-----------------
 123 |               1
 123 |               2
 123 |               3
 123 |               4
 123 |               5
 123 |               6
 123 |               7
(7 rows)

I added some user-facing documentation too.  I think this is committable,
though maybe we should add a regression test case or two.

			regards, tom lane
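PS: for the regression test, something along these lines might do (just a
sketch, not part of the attached patch; exact placement and expected output
would still need to be filled in):

    -- SRF output order should track the sort order, and LIMIT should cut
    -- off the SRF's output rather than whole input rows
    select q1, generate_series(1,3) as g from int8_tbl order by q1 limit 4;

    -- an output expression referenced by ORDER BY must still be computed
    -- before sorting
    select q1 % 2 as parity, q1 from int8_tbl order by 1, 2;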
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 44810f4..0520f2c 100644
*** a/doc/src/sgml/ref/select.sgml
--- b/doc/src/sgml/ref/select.sgml
*************** UNBOUNDED FOLLOWING
*** 993,998 ****
--- 993,1028 ----
      cases it is not possible to specify new names with <literal>AS</>;
      the output column names will be the same as the table columns' names.
     </para>
+ 
+    <para>
+     According to the SQL standard, the expressions in the output list should
+     be computed before applying <literal>DISTINCT</literal>, <literal>ORDER
+     BY</literal>, or <literal>LIMIT</literal>.  This is obviously necessary
+     when using <literal>DISTINCT</literal>, since otherwise it's not clear
+     what values are being made distinct.  However, in many cases it is
+     convenient if output expressions are computed after <literal>ORDER
+     BY</literal> and <literal>LIMIT</literal>; particularly if the output list
+     contains any volatile or expensive functions.  With that behavior, the
+     order of function evaluations is more intuitive and there will not be
+     evaluations corresponding to rows that never appear in the output.
+     <productname>PostgreSQL</> will effectively evaluate output expressions
+     after sorting and limiting, so long as those expressions are not
+     referenced in <literal>DISTINCT</literal>, <literal>ORDER BY</literal>
+     or <literal>GROUP BY</literal>.  (As a counterexample, <literal>SELECT
+     f(x) FROM tab ORDER BY 1</> clearly must evaluate <function>f(x)</>
+     before sorting.)  Output expressions that contain set-returning functions
+     are effectively evaluated after sorting and before limiting, so
+     that <literal>LIMIT</literal> will act to cut off the output from a
+     set-returning function.
+    </para>
+ 
+    <note>
+     <para>
+      <productname>PostgreSQL</> versions before 9.6 did not provide any
+      guarantees about the timing of evaluation of output expressions versus
+      sorting and limiting; it depended on the form of the chosen query plan.
+     </para>
+    </note>
   </refsect2>
 
   <refsect2 id="sql-distinct">
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a2cd6de..51f4ee4 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** static RelOptInfo *create_distinct_paths
*** 127,132 ****
--- 127,133 ----
                       RelOptInfo *input_rel);
  static RelOptInfo *create_ordered_paths(PlannerInfo *root,
                        RelOptInfo *input_rel,
+                       PathTarget *target,
                        double limit_tuples);
  static PathTarget *make_group_input_target(PlannerInfo *root, List *tlist);
  static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
*************** static PathTarget *make_window_input_tar
*** 135,140 ****
--- 136,144 ----
                          List *tlist, List *activeWindows);
  static List *make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc,
                            List *tlist);
+ static PathTarget *make_sort_input_target(PlannerInfo *root,
+                        PathTarget *final_target,
+                        bool *have_postponed_srfs);
 
 
  /*****************************************************************************
*************** grouping_planner(PlannerInfo *root, bool
*** 1379,1384 ****
--- 1383,1391 ----
      int64       offset_est = 0;
      int64       count_est = 0;
      double      limit_tuples = -1.0;
+     bool        have_postponed_srfs = false;
+     double      tlist_rows;
+     PathTarget *final_target;
      RelOptInfo *current_rel;
      RelOptInfo *final_rel;
      ListCell   *lc;
*************** grouping_planner(PlannerInfo *root, bool
*** 1438,1443 ****
--- 1445,1453 ----
          /* Save aside the final decorated tlist */
          root->processed_tlist = tlist;
 
+         /* Also extract the PathTarget form of the setop result tlist */
+         final_target = current_rel->cheapest_total_path->pathtarget;
+ 
          /*
           * Can't handle FOR [KEY] UPDATE/SHARE here (parser should have
           * checked already, but let's make sure).
*************** grouping_planner(PlannerInfo *root, bool
*** 1462,1472 ****
      else
      {
          /* No set operations, do regular planning */
!         PathTarget *final_target;
          PathTarget *grouping_target;
          PathTarget *scanjoin_target;
          bool        have_grouping;
-         double      tlist_rows;
          WindowFuncLists *wflists = NULL;
          List       *activeWindows = NIL;
          List       *rollup_lists = NIL;
--- 1472,1481 ----
      else
      {
          /* No set operations, do regular planning */
!         PathTarget *sort_input_target;
          PathTarget *grouping_target;
          PathTarget *scanjoin_target;
          bool        have_grouping;
          WindowFuncLists *wflists = NULL;
          List       *activeWindows = NIL;
          List       *rollup_lists = NIL;
*************** grouping_planner(PlannerInfo *root, bool
*** 1620,1626 ****
       * Figure out whether there's a hard limit on the number of rows that
       * query_planner's result subplan needs to return.  Even if we know a
       * hard limit overall, it doesn't apply if the query has any
!      * grouping/aggregation operations.
       */
      if (parse->groupClause ||
          parse->groupingSets ||
--- 1629,1638 ----
       * Figure out whether there's a hard limit on the number of rows that
       * query_planner's result subplan needs to return.  Even if we know a
       * hard limit overall, it doesn't apply if the query has any
!      * grouping/aggregation operations.  (XXX it also doesn't apply if the
!      * tlist contains any SRFs; but checking for that here seems more
!      * costly than it's worth, since root->limit_tuples is only used for
!      * cost estimates, and only in a small number of cases.)
       */
      if (parse->groupClause ||
          parse->groupingSets ||
*************** grouping_planner(PlannerInfo *root, bool
*** 1658,1679 ****
          final_target = create_pathtarget(root, tlist);
 
          /*
           * If we have window functions to deal with, the output from any
           * grouping step needs to be what the window functions want;
!          * otherwise, it should just be final_target.
           */
          if (activeWindows)
              grouping_target = make_window_input_target(root, tlist,
                                                         activeWindows);
          else
!             grouping_target = final_target;
 
          /*
           * If we have grouping or aggregation to do, the topmost scan/join
!          * plan node must emit what the grouping step wants; otherwise, if
!          * there's window functions, it must emit what the window functions
!          * want; otherwise, it should emit final_target.
           */
          have_grouping = (parse->groupClause || parse->groupingSets ||
                           parse->hasAggs || root->hasHavingQual);
--- 1670,1701 ----
          final_target = create_pathtarget(root, tlist);
 
          /*
+          * If ORDER BY was given, consider whether we should use a post-sort
+          * projection, and compute the adjusted target for preceding steps if
+          * so.
+          */
+         if (parse->sortClause)
+             sort_input_target = make_sort_input_target(root, final_target,
+                                                        &have_postponed_srfs);
+         else
+             sort_input_target = final_target;
+ 
+         /*
           * If we have window functions to deal with, the output from any
           * grouping step needs to be what the window functions want;
!          * otherwise, it should be sort_input_target.
           */
          if (activeWindows)
              grouping_target = make_window_input_target(root, tlist,
                                                         activeWindows);
          else
!             grouping_target = sort_input_target;
 
          /*
           * If we have grouping or aggregation to do, the topmost scan/join
!          * plan node must emit what the grouping step wants; otherwise, it
!          * should emit grouping_target.
           */
          have_grouping = (parse->groupClause || parse->groupingSets ||
                           parse->hasAggs || root->hasHavingQual);
*************** grouping_planner(PlannerInfo *root, bool
*** 1734,1779 ****
              current_rel = create_window_paths(root,
                                                current_rel,
                                                grouping_target,
!                                               final_target,
                                                tlist,
                                                wflists,
                                                activeWindows);
          }
 
          /*
-          * If there are set-returning functions in the tlist, scale up the
-          * assumed output rowcounts of all surviving Paths to account for
-          * that.  This is a bit of a kluge, but it's not clear how to account
-          * for it in a more principled way.  We definitely don't want to apply
-          * the multiplier more than once, which would happen if we tried to
-          * fold it into PathTarget accounting.  And the expansion does happen
-          * before any explicit DISTINCT or ORDER BY processing is done.
-          */
-         tlist_rows = tlist_returns_set_rows(tlist);
-         if (tlist_rows > 1)
-         {
-             foreach(lc, current_rel->pathlist)
-             {
-                 Path       *path = (Path *) lfirst(lc);
- 
-                 /*
-                  * We assume that execution costs of the tlist as such were
-                  * already accounted for.  However, it still seems appropriate
-                  * to charge something more for the executor's general costs
-                  * of processing the added tuples.  The cost is probably less
-                  * than cpu_tuple_cost, though, so we arbitrarily use half of
-                  * that.
-                  */
-                 path->total_cost += path->rows * (tlist_rows - 1) *
-                     cpu_tuple_cost / 2;
- 
-                 path->rows *= tlist_rows;
-             }
- 
-             /* There seems no need for a fresh set_cheapest comparison. */
-         }
- 
-         /*
           * If there is a DISTINCT clause, consider ways to implement that. We
           * build a new upperrel representing the output of this phase.
           */
--- 1756,1768 ----
              current_rel = create_window_paths(root,
                                                current_rel,
                                                grouping_target,
!                                               sort_input_target,
                                                tlist,
                                                wflists,
                                                activeWindows);
          }
 
          /*
           * If there is a DISTINCT clause, consider ways to implement that. We
           * build a new upperrel representing the output of this phase.
           */
*************** grouping_planner(PlannerInfo *root, bool
*** 1787,1803 ****
      /*
       * If ORDER BY was given, consider ways to implement that, and generate a
!      * new upperrel containing only paths that emit the correct ordering.  We
!      * can apply the original limit_tuples limit in sorting now.
       */
      if (parse->sortClause)
      {
          current_rel = create_ordered_paths(root,
                                             current_rel,
                                             limit_tuples);
      }
 
      /*
       * Now we are prepared to build the final-output upperrel.  Insert all
       * surviving paths, with LockRows, Limit, and/or ModifyTable steps added
       * if needed.
--- 1776,1826 ----
      /*
       * If ORDER BY was given, consider ways to implement that, and generate a
!      * new upperrel containing only paths that emit the correct ordering and
!      * project the correct final_target.  We can apply the original
!      * limit_tuples limit in sort costing here, but only if there are no
!      * postponed SRFs.
       */
      if (parse->sortClause)
      {
          current_rel = create_ordered_paths(root,
                                             current_rel,
+                                            final_target,
+                                            have_postponed_srfs ? -1.0 :
                                             limit_tuples);
      }
 
      /*
+      * If there are set-returning functions in the tlist, scale up the output
+      * rowcounts of all surviving Paths to account for that.  Note that if any
+      * SRFs appear in sorting or grouping columns, we'll have underestimated
+      * the numbers of rows passing through earlier steps; but that's such a
+      * weird usage that it doesn't seem worth greatly complicating matters to
+      * account for it.
+      */
+     tlist_rows = tlist_returns_set_rows(tlist);
+     if (tlist_rows > 1)
+     {
+         foreach(lc, current_rel->pathlist)
+         {
+             Path       *path = (Path *) lfirst(lc);
+ 
+             /*
+              * We assume that execution costs of the tlist as such were
+              * already accounted for.  However, it still seems appropriate to
+              * charge something more for the executor's general costs of
+              * processing the added tuples.  The cost is probably less than
+              * cpu_tuple_cost, though, so we arbitrarily use half of that.
+              */
+             path->total_cost += path->rows * (tlist_rows - 1) *
+                 cpu_tuple_cost / 2;
+ 
+             path->rows *= tlist_rows;
+         }
+         /* No need to run set_cheapest; we're keeping all paths anyway. */
+     }
+ 
+     /*
       * Now we are prepared to build the final-output upperrel.  Insert all
       * surviving paths, with LockRows, Limit, and/or ModifyTable steps added
       * if needed.
*************** create_distinct_paths(PlannerInfo *root,
*** 3705,3716 ****
--- 3728,3741 ----
   * cheapest-total existing path.
   *
   * input_rel: contains the source-data Paths
+  * target: the output tlist the result Paths must emit
   * limit_tuples: estimated bound on the number of output tuples,
   *      or -1 if no LIMIT or couldn't estimate
   */
  static RelOptInfo *
  create_ordered_paths(PlannerInfo *root,
                       RelOptInfo *input_rel,
+                      PathTarget *target,
                       double limit_tuples)
  {
      Path       *cheapest_input_path = input_rel->cheapest_total_path;
*************** create_ordered_paths(PlannerInfo *root,
*** 3738,3743 ****
--- 3763,3774 ----
                                                   root->sort_pathkeys,
                                                   limit_tuples);
          }
+ 
+         /* Add projection step if needed */
+         if (path->pathtarget != target)
+             path = apply_projection_to_path(root, ordered_rel,
+                                             path, target);
+ 
          add_path(ordered_rel, path);
      }
  }
*************** make_pathkeys_for_window(PlannerInfo *ro
*** 4144,4149 ****
--- 4175,4383 ----
  }
 
  /*
+  * make_sort_input_target
+  *    Generate appropriate PathTarget for initial input to Sort step.
+  *
+  * If the query has ORDER BY, this function chooses the target to be computed
+  * by the node just below the Sort (and DISTINCT, if any, since Unique can't
+  * project) steps.  This might or might not be identical to the query's final
+  * output target.
+  *
+  * The main argument for keeping the sort-input tlist the same as the final
+  * is that we avoid a separate projection node (which will be needed if
+  * they're different, because Sort can't project).  However, there are also
+  * advantages to postponing tlist evaluation till after the Sort: it ensures
+  * a consistent order of evaluation for any volatile functions in the tlist,
+  * and if there's also a LIMIT, we can stop the query without ever computing
+  * tlist functions for later rows, which is beneficial for both volatile and
+  * expensive functions.
+  *
+  * Our current policy is to postpone volatile expressions till after the sort
+  * unconditionally (assuming that that's possible, ie they are in plain tlist
+  * columns and not ORDER BY/GROUP BY/DISTINCT columns).  We also postpone
+  * set-returning expressions unconditionally (if possible), because running
+  * them beforehand would bloat the sort dataset, and because it might cause
+  * unexpected output order if the sort isn't stable.  Expensive expressions
+  * are postponed if there is a LIMIT, or if root->tuple_fraction shows that
+  * partial evaluation of the query is possible (if neither is true, we expect
+  * to have to evaluate the expressions for every row anyway), or if there are
+  * any volatile or set-returning expressions (since once we've put in a
+  * projection at all, it won't cost any more to postpone more stuff).
+  *
+  * Another issue that could potentially be considered here is that
+  * evaluating tlist expressions could result in data that's either wider
+  * or narrower than the input Vars, thus changing the volume of data that
+  * has to go through the Sort.  However, we usually have only a very bad
+  * idea of the output width of any expression more complex than a Var,
+  * so for now it seems too risky to try to optimize on that basis.
+  *
+  * Note that if we do produce a modified sort-input target, and then the
+  * query ends up not using an explicit Sort, no particular harm is done:
+  * we'll initially use the modified target for the preceding path nodes,
+  * but then change them to the final target with apply_projection_to_path.
+  * Moreover, in such a case the guarantees about evaluation order of
+  * volatile functions still hold, since the rows are sorted already.
+  *
+  * This function has some things in common with make_group_input_target and
+  * make_window_input_target, though the detailed rules for what to do are
+  * different.  We never flatten/postpone any grouping or ordering columns;
+  * those are needed before the sort.  If we do flatten a particular
+  * expression, we leave Aggref and WindowFunc nodes alone, since those were
+  * computed earlier.
+  *
+  * 'final_target' is the query's final target list (in PathTarget form)
+  * 'have_postponed_srfs' is an output argument, see below
+  *
+  * The result is the PathTarget to be computed by the plan node immediately
+  * below the Sort step (and the Distinct step, if any).  This will be
+  * exactly final_target if we decide a projection step wouldn't be helpful.
+  *
+  * In addition, *have_postponed_srfs is set to TRUE if we choose to postpone
+  * any set-returning functions to after the Sort.
+  */
+ static PathTarget *
+ make_sort_input_target(PlannerInfo *root,
+                        PathTarget *final_target,
+                        bool *have_postponed_srfs)
+ {
+     Query      *parse = root->parse;
+     PathTarget *input_target;
+     int         ncols;
+     bool       *postpone_col;
+     bool        have_srf;
+     bool        have_volatile;
+     bool        have_expensive;
+     List       *postponable_cols;
+     List       *postponable_vars;
+     int         i;
+     ListCell   *lc;
+ 
+     /* Shouldn't get here unless query has ORDER BY */
+     Assert(parse->sortClause);
+ 
+     *have_postponed_srfs = false;   /* default result */
+ 
+     /* Inspect tlist and collect per-column information */
+     ncols = list_length(final_target->exprs);
+     postpone_col = (bool *) palloc0(ncols * sizeof(bool));
+     have_srf = have_volatile = have_expensive = false;
+ 
+     i = 0;
+     foreach(lc, final_target->exprs)
+     {
+         Expr       *expr = (Expr *) lfirst(lc);
+ 
+         /*
+          * If the column has a sortgroupref, assume it has to be evaluated
+          * before sorting.  Generally such columns would be ORDER BY, GROUP
+          * BY, etc targets.  One exception is columns that were removed from
+          * GROUP BY by remove_useless_groupby_columns() ... but those would
+          * only be Vars anyway.  There don't seem to be any cases where it
+          * would be worth the trouble to double-check.
+          */
+         if (final_target->sortgrouprefs[i] == 0)
+         {
+             /*
+              * If it returns a set or is volatile, that's an unconditional
+              * reason to postpone.  Check the SRF case first because we must
+              * know whether we have any postponed SRFs.
+              */
+             if (expression_returns_set((Node *) expr))
+             {
+                 postpone_col[i] = true;
+                 have_srf = true;
+             }
+             else if (contain_volatile_functions((Node *) expr))
+             {
+                 postpone_col[i] = true;
+                 have_volatile = true;
+             }
+             else
+             {
+                 /*
+                  * Else check the cost.  XXX it's annoying to have to do this
+                  * when set_pathtarget_cost_width() just did it.  Refactor to
+                  * allow sharing the work?
+                  */
+                 QualCost    cost;
+ 
+                 cost_qual_eval_node(&cost, (Node *) expr, root);
+ 
+                 /*
+                  * We arbitrarily define "expensive" as "more than 10X
+                  * cpu_operator_cost".  Note this will take in any PL function
+                  * with default cost.
+                  */
+                 if (cost.per_tuple > 10 * cpu_operator_cost)
+                 {
+                     postpone_col[i] = true;
+                     have_expensive = true;
+                 }
+             }
+         }
+         i++;
+     }
+ 
+     /*
+      * If we don't need a post-sort projection, just return final_target.
+      */
+     if (!(have_srf || have_volatile ||
+           (have_expensive &&
+            (parse->limitCount || root->tuple_fraction > 0))))
+         return final_target;
+ 
+     /*
+      * Report whether the post-sort projection will contain set-returning
+      * functions.  This is important because it affects whether the Sort can
+      * rely on the query's LIMIT (if any) to bound the number of rows it needs
+      * to return.
+      */
+     *have_postponed_srfs = have_srf;
+ 
+     /*
+      * Construct the sort-input target, taking all non-postponable columns and
+      * then adding Vars, PlaceHolderVars, Aggrefs, and WindowFuncs found in
+      * the postponable ones.
+      */
+     input_target = create_empty_pathtarget();
+     postponable_cols = NIL;
+ 
+     i = 0;
+     foreach(lc, final_target->exprs)
+     {
+         Expr       *expr = (Expr *) lfirst(lc);
+ 
+         if (postpone_col[i])
+             postponable_cols = lappend(postponable_cols, expr);
+         else
+             add_column_to_pathtarget(input_target, expr,
+                                      final_target->sortgrouprefs[i]);
+ 
+         i++;
+     }
+ 
+     /*
+      * Pull out all the Vars, Aggrefs, and WindowFuncs mentioned in
+      * postponable columns, and add them to the sort-input target if not
+      * already present.  (Some might be there already.)  We mustn't
+      * deconstruct Aggrefs or WindowFuncs here, since the projection node
+      * would be unable to recompute them.
+      */
+     postponable_vars = pull_var_clause((Node *) postponable_cols,
+                                        PVC_INCLUDE_AGGREGATES |
+                                        PVC_INCLUDE_WINDOWFUNCS |
+                                        PVC_INCLUDE_PLACEHOLDERS);
+     add_new_columns_to_pathtarget(input_target, postponable_vars);
+ 
+     /* clean up cruft */
+     list_free(postponable_vars);
+     list_free(postponable_cols);
+ 
+     /* XXX this represents even more redundant cost calculation ... */
+     return set_pathtarget_cost_width(root, input_target);
+ }
+ 
+ /*
   * get_cheapest_fractional_path
   *    Find the cheapest path for retrieving a specified fraction of all
   *    the tuples expected to be returned by the given relation.
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index cbc8c2b..9f85dee 100644
*** a/src/backend/optimizer/util/tlist.c
--- b/src/backend/optimizer/util/tlist.c
*************** copy_pathtarget(PathTarget *src)
*** 624,629 ****
--- 624,640 ----
  }
 
  /*
+  * create_empty_pathtarget
+  *    Create an empty (zero columns, zero cost) PathTarget.
+  */
+ PathTarget *
+ create_empty_pathtarget(void)
+ {
+     /* This is easy, but we don't want callers to hard-wire this ... */
+     return (PathTarget *) palloc0(sizeof(PathTarget));
+ }
+ 
+ /*
   * add_column_to_pathtarget
   *    Append a target column to the PathTarget.
   *
*************** add_column_to_pathtarget(PathTarget *tar
*** 656,661 ****
--- 667,707 ----
  }
 
  /*
+  * add_new_column_to_pathtarget
+  *    Append a target column to the PathTarget, but only if it's not
+  *    equal() to any pre-existing target expression.
+  *
+  * The caller cannot specify a sortgroupref, since it would be unclear how
+  * to merge that with a pre-existing column.
+  *
+  * As with make_pathtarget_from_tlist, we leave it to the caller to update
+  * the cost and width fields.
+  */
+ void
+ add_new_column_to_pathtarget(PathTarget *target, Expr *expr)
+ {
+     if (!list_member(target->exprs, expr))
+         add_column_to_pathtarget(target, expr, 0);
+ }
+ 
+ /*
+  * add_new_columns_to_pathtarget
+  *    Apply add_new_column_to_pathtarget() for each element of the list.
+  */
+ void
+ add_new_columns_to_pathtarget(PathTarget *target, List *exprs)
+ {
+     ListCell   *lc;
+ 
+     foreach(lc, exprs)
+     {
+         Expr       *expr = (Expr *) lfirst(lc);
+ 
+         add_new_column_to_pathtarget(target, expr);
+     }
+ }
+ 
+ /*
   * apply_pathtarget_labeling_to_tlist
   *    Apply any sortgrouprefs in the PathTarget to matching tlist entries
   *
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 2c79a80..0d745a0 100644
*** a/src/include/optimizer/tlist.h
--- b/src/include/optimizer/tlist.h
*************** extern bool grouping_is_hashable(List *g
*** 55,62 ****
--- 55,65 ----
  extern PathTarget *make_pathtarget_from_tlist(List *tlist);
  extern List *make_tlist_from_pathtarget(PathTarget *target);
  extern PathTarget *copy_pathtarget(PathTarget *src);
+ extern PathTarget *create_empty_pathtarget(void);
  extern void add_column_to_pathtarget(PathTarget *target,
                           Expr *expr, Index sortgroupref);
+ extern void add_new_column_to_pathtarget(PathTarget *target, Expr *expr);
+ extern void add_new_columns_to_pathtarget(PathTarget *target, List *exprs);
  extern void apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target);
 
  /* Convenience macro to get a PathTarget with valid cost/width fields */
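(For illustration only, not part of the patch: the volatile-function case
described in the new documentation behaves analogously to the SRF example
above.  With something like

    select q1, random() from int8_tbl order by q1 limit 2;

the random() calls should now happen after the sort, and only for the two
rows that survive the LIMIT, since random() is volatile and isn't referenced
by the ORDER BY.)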