On 10/4/24 20:34, Robert Haas wrote:
On Mon, Sep 23, 2024 at 7:11 AM Alexander Korotkov <aekorot...@gmail.com> wrote:
Makes sense.  Please, check the attached patch freeing the consts list
while returning NULL from match_orclause_to_indexcol().
More generally, many of the comments in this patch seem to just
explain what the code does, and I'd like to reiterate my usual
complaint: as far as possible, comments should explain WHY the code
does what it does. Certainly, in some cases there's nothing to be said
about that e.g. /* Lookup for operator to fetch necessary information
for the SAOP node */ isn't really saying anything non-obvious but it's
reasonable to have the comment here anyway. However, when there is
something more interesting to be said, then we should do that rather
than just reiterate what the reader who knows C can anyway see. For
instance, the lengthy comment beginning with "Iterate over OR
entries." could either be shorter and recapitulate less of the code
that follows, or it could say something more interesting about why
we're doing it like that.
While I know Alexander is already working on this issue, the variants provided in the attachment could offer some valuable insights (see 0001 and 0002).

+ /* We allow constant to be Const or Param */
+ if (!IsA(constExpr, Const) && !IsA(constExpr, Param))
+ break;

This restriction is a lot tighter than the one mentioned in the header
comment of match_clause_to_indexcol ("Our definition of const is
exceedingly liberal"). If there's a reason for that, the comments
should talk about it. If there isn't, it's better to be consistent.If we know the type of result we don't really need this additional
restriction. The only reason I had here is to avoid some strange and ineffective cases like:

SELECT oid,typname FROM pg_type t1
WHERE typtypmod = ANY (ARRAY [1, 1+(
  SELECT max(typtypmod) FROM pg_type t2
  WHERE t1.typtypmod = t2.typtypmod)]);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on pg_type t1
   Filter: (typtypmod = ANY (ARRAY[1, (1 + (SubPlan 2))]))
   SubPlan 2
     ->  Result
           InitPlan 1
             ->  Limit
                   ->  Seq Scan on pg_type t2
                         Filter: (t1.typtypmod = typtypmod)

So, it is mostly about trade-off between benefit expected and planning complexity. See a sketch of comment in 0003.

I'm unclear what the current thinking is about the performance of this
patch, both as to planning and as to execution. Do we believe that
this transformation is a categorical win at execution-time? In theory,
OR format alllows for short-circuit execution, but because of the
Const-or-Param restriction above, I don't think that's mostly a
non-issue. But maybe not completely, because I can see from the
regression test changes that it's possible for us to apply this
transformation when the Param is set by an InitPlan or SubPlan. If we
have something like WHERE tenthous = 1 OR tenthous =
(very_expensive_computation() + 1), maybe the patch could lose,
because we'll have to do the very expensive calculation to evaluate
the SAOP, and the OR could stop as soon as we establish that tenthous
!= 1. If we only did the transformation when the Param is an external
parameter, then we wouldn't have this issue. Maybe this isn't worth
worrying about; I'm not sure. Are there any other cases where the
transformation can produce something that executes more slowly?
I have a couple of user reports in my pocket where changing the position of the OR clause drastically (2-3 times) altered query execution time. However, I think it is not a good way to optimise SQL queries the way we use when coding in C.

--
regards, Andrei Lepikhov
From ae279c2bef8b0b11739f79b49d68319894c63aa9 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepi...@gmail.com>
Date: Wed, 9 Oct 2024 14:05:25 +0700
Subject: [PATCH 1/3] Comments for the 0001 patch

---
 src/backend/optimizer/path/indxpath.c | 52 ++++++++++++++++-----------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 46d576a0c2..602141911d 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2557,6 +2557,10 @@ match_clause_to_index(PlannerInfo *root,
  *	  It is also possible to match ScalarArrayOpExpr clauses to indexes, when
  *	  the clause is of the form "indexkey op ANY (arrayconst)".
  *
+ *	  It is also possible to match a list of OR clauses if it may be transformed
+ *	  into single ScalarArrayOpExpr clause. On success, returning index clause
+ *	  will contain trasformed clause.
+ *
  *	  For boolean indexes, it is also possible to match the clause directly
  *	  to the indexkey; or perhaps the clause is (NOT indexkey).
  *
@@ -3157,6 +3161,10 @@ match_rowcompare_to_indexcol(PlannerInfo *root,
  * match_orclause_to_indexcol()
  *	  Handles the OR-expr case for match_clause_to_indexcol() in the case
  *	  when it could be transformed to ScalarArrayOpExpr.
+ *
+ *	  Given a list of an OR-clause args, attempts to transform this BoolExpr
+ *	  into single SAOP expression. On success, returns IndexClause, containing
+ *	  transformed expression or NULL, if failed.
  */
 static IndexClause *
 match_orclause_to_indexcol(PlannerInfo *root,
@@ -3184,11 +3192,12 @@ match_orclause_to_indexcol(PlannerInfo *root,
 	Assert(orclause->boolop == OR_EXPR);
 
 	/*
-	 * Iterate over OR entries.  Check that each OR entry is of the form:
-	 * (indexkey operator constant) or (constant operator indexkey). Operators
-	 * of all the entries must match.  Constant might be either Const or
-	 * Param.  Exit with NULL on first non-matching entry.  Exit is
-	 * implemented as a break from the loop, which is catched afterwards.
+	 * Try to convert list of OR-clauses to single SAOP expression. Each OR
+	 * entry must be in the form: (indexkey operator constant) or (constant
+	 * operator indexkey). Operators of all the entries must match. Constant
+	 * might be either Const or Param. To be effective, give up on first
+	 * non-matching entry. Exit is implemented as a break from the loop, which
+	 * is catched afterwards.
 	 */
 	foreach(lc, orclause->args)
 	{
@@ -3286,8 +3295,8 @@ match_orclause_to_indexcol(PlannerInfo *root,
 			break;
 
 		/*
-		 * For the first matching qual, save information about operator, type
-		 * and collation.  For the other quals just check the match with the
+		 * Save information about the operator, type, and collation for the
+		 * first matching qual. Then, check that subsequent quals match the
 		 * first.
 		 */
 		if (firstTime)
@@ -3298,9 +3307,9 @@ match_orclause_to_indexcol(PlannerInfo *root,
 			inputcollid = subClause->inputcollid;
 
 			/*
-			 * Check operator is present in the opfamily, expression collation
-			 * matches index collation.  Also, there must be an array type in
-			 * order to construct an array later.
+			 * Check operator is presented in the opfamily and expression
+			 * collation matches index collation. Also, there must be an array
+			 * type to construct an array later.
 			 */
 			if (!IndexCollMatchesExprColl(index->indexcollations[indexcol], inputcollid) ||
 				!op_in_opfamily(matchOpno, index->opfamily[indexcol]) ||
@@ -3332,12 +3341,14 @@ match_orclause_to_indexcol(PlannerInfo *root,
 		return NULL;
 	}
 
+	/*
+	 * Assemble an array from the list of constants. It seems more profitable to
+	 * build a const array. But in presence of parameters we don't have specific
+	 * value here and must employ an ArrayExpr instead.
+	 */
+
 	if (have_param)
 	{
-		/*
-		 * We need to construct an ArrayExpr given we have Param's not just
-		 * Const's.
-		 */
 		ArrayExpr  *arrayExpr = makeNode(ArrayExpr);
 
 		/* array_collid will be set by parse_collate.c */
@@ -3351,10 +3362,6 @@ match_orclause_to_indexcol(PlannerInfo *root,
 	}
 	else
 	{
-		/*
-		 * We have only Const's.  In this case we can construct an array
-		 * directly.
-		 */
 		int16		typlen;
 		bool		typbyval;
 		char		typalign;
@@ -3383,8 +3390,7 @@ match_orclause_to_indexcol(PlannerInfo *root,
 	}
 
 	/* Lookup for operator to fetch necessary information for the SAOP node */
-	opertup = SearchSysCache1(OPEROID,
-							  ObjectIdGetDatum(matchOpno));
+	opertup = SearchSysCache1(OPEROID, ObjectIdGetDatum(matchOpno));
 	if (!HeapTupleIsValid(opertup))
 		elog(ERROR, "cache lookup failed for operator %u", matchOpno);
 
@@ -3404,7 +3410,11 @@ match_orclause_to_indexcol(PlannerInfo *root,
 	ReleaseSysCache(opertup);
 
 	/*
-	 * Finally build an IndexClause based on the SAOP node.
+	 * Finally build an IndexClause based on the SAOP node. Use
+	 * make_simple_restrictinfo to get RestrictInfo with clean selectivity
+	 * estimations because it may differ from estimation made for an OR clause.
+	 * Although it is not a lossy expression, keep old version of rinfo in
+	 * iclause->rinfo to detect duplicates and recheck original clause.
 	 */
 	iclause = makeNode(IndexClause);
 	iclause->rinfo = rinfo;
-- 
2.39.5

From d0dfb0641ec100aaff2c3a7cf0cefa2e55be7beb Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepi...@gmail.com>
Date: Wed, 9 Oct 2024 15:42:54 +0700
Subject: [PATCH 2/3] Comments for 0002 patch

---
 src/backend/optimizer/path/indxpath.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 602141911d..a2ddfb2bb9 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1174,9 +1174,10 @@ build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
- * Data structure representing information about OR-clause argument and its
- * matching index key.  Used for grouping of similar OR-clause arguments in
+ * Utility structure used to group similar OR-clause arguments in
  * group_similar_or_args().
+ * It represents information about OR-clause argument and its
+ * matching index key.
  */
 typedef struct
 {
@@ -1228,12 +1229,17 @@ or_arg_index_match_cmp(const void *a, const void *b)
 
 /*
  * group_similar_or_args
- *		Group similar OR-arguments into dedicated RestrictInfos.  Process
- *		arguments of 'rinfo' clause, returns the processed list of arguments.
+ *		Transform incoming OR-restrictinfo into a list of sub-restrictinfos,
+ *		each of them contain a subset of OR-clauses from the source rinfo
+ *		matching the same index column with the same operator and collation,
+ *		It may be employed later, during the match_clause_to_indexcol to
+ *		transform whole OR-sub-rinfo to a SAOP clause.
  *
  * Similar arguments clauses of form "indexkey op constant" having same
  * indexkey, operator, and collation.  Constant may comprise either Const
  * or Param.
+ *
+ * Returns the processed list of arguments.
  */
 static List *
 group_similar_or_args(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo)
@@ -1252,8 +1258,9 @@ group_similar_or_args(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo)
 	n = list_length(orargs);
 
 	/*
-	 * Allocate and fill OrArgIndexMatch struct for each clause in the
-	 * argument list.
+	 * To avoid N^2 behaviour, take utility pass along the list of OR-clause
+	 * arguments. For each argument fill the OrArgIndexMatch structure which
+	 * will be used to sort these arguments at the next step.
 	 */
 	i = -1;
 	matches = (OrArgIndexMatch *) palloc(sizeof(OrArgIndexMatch) * n);
@@ -1368,7 +1375,11 @@ group_similar_or_args(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo)
 	/* Sort clauses to make similar clauses go together */
 	qsort(matches, n, sizeof(OrArgIndexMatch), or_arg_index_match_cmp);
 
-	/* Group similar clauses into */
+	/*
+	 * Group similar clauses into single sub-restrictinfo.
+	 * Side effect: resulting list of restrictions will be sorted by indexnum
+	 * and colnum. Can we employ it somehow later?
+	 */
 	group_start = 0;
 	for (i = 1; i <= n; i++)
 	{
-- 
2.39.5

From fabb970f0acdb1bb3b7280f05ffce9facf89af44 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepi...@gmail.com>
Date: Wed, 9 Oct 2024 16:17:27 +0700
Subject: [PATCH 3/3] Comment on restriction of OR->SAOP element type

---
 src/backend/optimizer/path/indxpath.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index a2ddfb2bb9..c8625a32d4 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2538,7 +2538,10 @@ match_clause_to_index(PlannerInfo *root,
  *	  (3)  must match the collation of the index, if collation is relevant.
  *
  *	  Our definition of "const" is exceedingly liberal: we allow anything that
- *	  doesn't involve a volatile function or a Var of the index's relation.
+ *	  doesn't involve a volatile function or a Var of the index's relation
+ *	  except for a boolean OR expression input: due to trade-off between
+ *	  expected execution speedup planning complexity, we limit or->saop
+ *	  transformation by obvious cases when an index scan can get a profit.
  *	  In particular, Vars belonging to other relations of the query are
  *	  accepted here, since a clause of that form can be used in a
  *	  parameterized indexscan.  It's the responsibility of higher code levels
-- 
2.39.5

Reply via email to