On Tue, Mar 25, 2025 at 12:58 PM Amit Langote <amitlangot...@gmail.com> wrote:
>
> -        /* Updates ec1's ec_derives_list and ec_derives_hash if present. */
> +        /*
> +         * Appends ec2's derived clauses to ec1->ec_derives_list and adds
> +         * them to ec1->ec_derives_hash if present.
> +         */

WFM.

>
> > @@ -396,7 +400,7 @@ process_equivalence(PlannerInfo *root,
> > /* just to avoid debugging confusion w/ dangling pointers: */
> > ec2->ec_members = NIL;
> > ec2->ec_sources = NIL;
> > - clear_ec_derived_clauses(ec2);
> > + ec_clear_derived_clauses(ec2);
> >
> > I pondered about this naming convention when naming the functions. But
> > it seems it's not used everywhere in this file OR I am not able to see
> > the underlying naming rule if any. So I used a mixed naming. Let's
> > keep your names though. I think they are better.
>
> Got it -- I went with the ec_ prefix mainly to make the new additions
> self-consistent, since the file doesn’t seem to follow a strict naming
> pattern. Glad the names work for you. While at it, I also applied the
> same naming convention to two new functions I hadn’t touched earlier
> for some reason.

WFM.


>
> + * Derived equality clauses are stored in ec_derives_list. For small queries,
> + * this list is scanned directly during lookup. For larger queries -- e.g.,
> + * with many partitions or joins -- a hash table (ec_derives_hash) is built
> + * when the list grows beyond a threshold, for faster lookup. When present,
> + * the hash table contains the same RestrictInfos and is maintained alongside
> + * the list. We retain the list even when the hash is used to simplify
> + * serialization (e.g., in _outEquivalenceClass()) and support
> + * EquivalenceClass merging.
>
> I've merged my delta + your suggested changes as discussed above into 0002.
>

LGTM.

> Btw, about ec_clear_derived_clauses():
>
> @@ -749,7 +749,7 @@ remove_rel_from_eclass(EquivalenceClass *ec,
> SpecialJoinInfo *sjinfo,
>       * drop them.  (At this point, any such clauses would be base restriction
>       * clauses, which we'd not need anymore anyway.)
>       */
> -    ec->ec_derives = NIL;
> +    ec_clear_derived_clauses(ec);
>  }
>
>  /*
> @@ -1544,8 +1544,7 @@ update_eclasses(EquivalenceClass *ec, int from, int to)
>      list_free(ec->ec_members);
>      ec->ec_members = new_members;
>
> -    list_free(ec->ec_derives);
> -    ec->ec_derives = NULL;
> +    ec_clear_derived_clauses(ec);
>
> We're losing that list_free() in the second hunk, aren't we?
>
> There's also this comment:
>
> + * XXX: When thousands of partitions are involved, the list can become
> + * sizable. It might be worth freeing it explicitly in such cases.
>
> So maybe ec_clear_derived_clauses() should take a free_list parameter,
> to preserve the original behavior? What do you think?

Well spotted. How about just calling list_free() in
ec_clear_derived_clauses() to simplify things. I mean list_free()
might spend some cycles under remove_rel_from_eclass() and
process_equivalence() freeing the array but that should be ok. Just
setting it to NIL by itself looks fine. If we bundle it in a function
with a flag, we will need to explain why/when to free list and when to
not. That's unnecessary complexity I feel. In other places where the
structures have potential to grow in size, we have resorted to freeing
them rather than just forgetting them. For example, we free appinfos
in try_partitionwise_join() or child_relids.

The list shouldn't be referenced anywhere else, so it should be safe
to free it. Note that I thought list_concat() used by
process_equivalence() would reuse the memory allocated to
ec2->ec_derives_list but it doesn't. I verified that by setting the
threshold to 0, thus forcing the hash table always and running a
regression suite. It runs without any segfaults. I don't see any
change in time required to run regression.

PFA patchset
0001, 0002 are same as your patchset except some of my edits to the
commit message. Please feel free to accept or reject the edits.
0003 adds list_free() to ec_clear_derived_clauses()
--
Best Wishes,
Ashutosh Bapat
From 3196b8f306f8ac647c86d4acc945798ef6118c2e Mon Sep 17 00:00:00 2001
From: Amit Langote <amit...@postgresql.org>
Date: Mon, 24 Mar 2025 21:18:02 +0900
Subject: [PATCH 2/3] Make derived clause lookup in EquivalenceClass more
 efficient

Previously, derived clauses were stored in ec_derives, a List of
RestrictInfos. These clauses are looked up later using the left and
right EquivalenceMembers and the clause's parent EC, typically during
join clause generation.

This lookup becomes expensive in queries with many partitions or
joins, where ec_derives may contain thousands of entries. In
particular, create_join_clause() can spend significant time scanning
this list.

To improve performance, a hash table (ec_derives_hash) is now built
when ec_derives_list exceeds 32 entries. This is the same threshold used
for join_rel_hash. To reflect this dual structure, the ec_derives
field is renamed to ec_derives_list. The list is retained alongside
the hash table to support _outEquivalenceClass() and
EquivalenceClass merging, which both operate on ec_derives_list.

Lookup logic is updated to consult the hash table when available. For
ECs with a constant EM, lookups are done by the non-constant EM alone;
the constant EM is assumed to be unique, and on the RHS of the clause.
It may not even be available at lookup time. In such cases, the hash
table key uses NULL instead of constant EM.  An assertion originally
placed in find_derived_clause_for_ec_member() is moved into
ec_search_derived_clause_for_ems() to enforce the same invariant in both
hash and list lookup paths.

Author: Ashutosh Bapat <ashutosh.bapat....@gmail.com>
Reviewed-by: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangot...@gmail.com>
Tested-by: Dmitry Dolgov <9erthali...@gmail.com>
Tested-by: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Tested-by: Amit Langote <amitlangot...@gmail.com>
Discussion: https://postgr.es/m/caexhw5vziqtwu6moszlp5iz8glx_zaubgex0dxglx9pgwct...@mail.gmail.com
---
 src/backend/nodes/outfuncs.c              |   3 +-
 src/backend/optimizer/path/costsize.c     |   3 +-
 src/backend/optimizer/path/equivclass.c   | 403 ++++++++++++++++++----
 src/backend/optimizer/plan/analyzejoins.c |   5 +-
 src/include/nodes/pathnodes.h             |  17 +-
 src/include/optimizer/paths.h             |   4 +-
 src/tools/pgindent/typedefs.list          |   2 +
 7 files changed, 357 insertions(+), 80 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index bb9bdd67192..557f06e344f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -467,7 +467,8 @@ _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
 	WRITE_OID_FIELD(ec_collation);
 	WRITE_NODE_FIELD(ec_members);
 	WRITE_NODE_FIELD(ec_sources);
-	WRITE_NODE_FIELD(ec_derives);
+	/* Only ec_derives_list is written; hash is not serialized. */
+	WRITE_NODE_FIELD(ec_derives_list);
 	WRITE_BITMAPSET_FIELD(ec_relids);
 	WRITE_BOOL_FIELD(ec_has_const);
 	WRITE_BOOL_FIELD(ec_has_volatile);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index f6f77b8fe19..c474c7a06af 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5848,7 +5848,8 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
 				if (ec && ec->ec_has_const)
 				{
 					EquivalenceMember *em = fkinfo->fk_eclass_member[i];
-					RestrictInfo *rinfo = find_derived_clause_for_ec_member(ec,
+					RestrictInfo *rinfo = find_derived_clause_for_ec_member(root,
+																			ec,
 																			em);
 
 					if (rinfo)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 493a95d26cc..b4987b466c4 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -20,6 +20,7 @@
 
 #include "access/stratnum.h"
 #include "catalog/pg_type.h"
+#include "common/hashfn.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
@@ -72,7 +73,50 @@ static Bitmapset *get_eclass_indexes_for_relids(PlannerInfo *root,
 												Relids relids);
 static Bitmapset *get_common_eclass_indexes(PlannerInfo *root, Relids relids1,
 											Relids relids2);
+static void ec_build_derives_hash(PlannerInfo *root, EquivalenceClass *ec);
+static void ec_add_derived_clauses(EquivalenceClass *ec, List *clauses);
+static void ec_add_derived_clause(EquivalenceClass *ec, RestrictInfo *clause);
+static void ec_add_clause_to_derives_hash(EquivalenceClass *ec, RestrictInfo *rinfo);
+static RestrictInfo *ec_search_clause_for_ems(PlannerInfo *root, EquivalenceClass *ec,
+											  EquivalenceMember *leftem,
+											  EquivalenceMember *rightem,
+											  EquivalenceClass *parent_ec);
+static RestrictInfo *ec_search_derived_clause_for_ems(PlannerInfo *root,
+													  EquivalenceClass *ec,
+													  EquivalenceMember *leftem,
+													  EquivalenceMember *rightem,
+													  EquivalenceClass *parent_ec);
 
+/*
+ * Hash key identifying a derived clause by EquivalenceMembers and parent EC.
+ */
+typedef struct
+{
+	EquivalenceMember *em1;
+	EquivalenceMember *em2;
+	EquivalenceClass *parent_ec;
+} ECDerivesKey;
+
+/* Hash table entry in ec_derives_hash. */
+typedef struct
+{
+	uint32		status;
+	ECDerivesKey key;
+	RestrictInfo *rinfo;
+} ECDerivesEntry;
+
+#define SH_PREFIX               derives
+#define SH_ELEMENT_TYPE			ECDerivesEntry
+#define SH_KEY_TYPE             ECDerivesKey
+#define SH_KEY                  key
+#define SH_HASH_KEY(tb, key)	\
+	hash_bytes((const unsigned char *) &(key), sizeof(ECDerivesKey))
+#define SH_EQUAL(tb, a, b)	\
+	((a).em1 == (b).em1 && (a).em2 == (b).em2 && (a).parent_ec == (b).parent_ec)
+#define SH_SCOPE                static inline
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
 
 /*
  * process_equivalence
@@ -342,7 +386,12 @@ process_equivalence(PlannerInfo *root,
 		 */
 		ec1->ec_members = list_concat(ec1->ec_members, ec2->ec_members);
 		ec1->ec_sources = list_concat(ec1->ec_sources, ec2->ec_sources);
-		ec1->ec_derives = list_concat(ec1->ec_derives, ec2->ec_derives);
+
+		/*
+		 * Appends ec2's derived clauses to ec1->ec_derives_list and adds them
+		 * to ec1->ec_derives_hash if present.
+		 */
+		ec_add_derived_clauses(ec1, ec2->ec_derives_list);
 		ec1->ec_relids = bms_join(ec1->ec_relids, ec2->ec_relids);
 		ec1->ec_has_const |= ec2->ec_has_const;
 		/* can't need to set has_volatile */
@@ -355,7 +404,7 @@ process_equivalence(PlannerInfo *root,
 		/* just to avoid debugging confusion w/ dangling pointers: */
 		ec2->ec_members = NIL;
 		ec2->ec_sources = NIL;
-		ec2->ec_derives = NIL;
+		ec_clear_derived_clauses(ec2);
 		ec2->ec_relids = NULL;
 		ec1->ec_sources = lappend(ec1->ec_sources, restrictinfo);
 		ec1->ec_min_security = Min(ec1->ec_min_security,
@@ -412,7 +461,8 @@ process_equivalence(PlannerInfo *root,
 		ec->ec_collation = collation;
 		ec->ec_members = NIL;
 		ec->ec_sources = list_make1(restrictinfo);
-		ec->ec_derives = NIL;
+		ec->ec_derives_list = NIL;
+		ec->ec_derives_hash = NULL;
 		ec->ec_relids = NULL;
 		ec->ec_has_const = false;
 		ec->ec_has_volatile = false;
@@ -671,7 +721,8 @@ get_eclass_for_sort_expr(PlannerInfo *root,
 	newec->ec_collation = collation;
 	newec->ec_members = NIL;
 	newec->ec_sources = NIL;
-	newec->ec_derives = NIL;
+	newec->ec_derives_list = NIL;
+	newec->ec_derives_hash = NULL;
 	newec->ec_relids = NULL;
 	newec->ec_has_const = false;
 	newec->ec_has_volatile = contain_volatile_functions((Node *) expr);
@@ -1026,8 +1077,8 @@ relation_can_be_sorted_early(PlannerInfo *root, RelOptInfo *rel,
  * scanning of the quals and before Path construction begins.
  *
  * We make no attempt to avoid generating duplicate RestrictInfos here: we
- * don't search ec_sources or ec_derives for matches.  It doesn't really
- * seem worth the trouble to do so.
+ * don't search existing source or derived clauses in the EC for matches.  It
+ * doesn't really seem worth the trouble to do so.
  */
 void
 generate_base_implied_equalities(PlannerInfo *root)
@@ -1188,11 +1239,11 @@ generate_base_implied_equalities_const(PlannerInfo *root,
 
 		/*
 		 * If the clause didn't degenerate to a constant, fill in the correct
-		 * markings for a mergejoinable clause, and save it in ec_derives. (We
-		 * will not re-use such clauses directly, but selectivity estimation
-		 * may consult the list later.  Note that this use of ec_derives does
-		 * not overlap with its use for join clauses, since we never generate
-		 * join clauses from an ec_has_const eclass.)
+		 * markings for a mergejoinable clause, and save it as a derived
+		 * clause. (We will not re-use such clauses directly, but selectivity
+		 * estimation may consult those later.  Note that this use of derived
+		 * clauses does not overlap with its use for join clauses, since we
+		 * never generate join clauses from an ec_has_const eclass.)
 		 */
 		if (rinfo && rinfo->mergeopfamilies)
 		{
@@ -1200,7 +1251,7 @@ generate_base_implied_equalities_const(PlannerInfo *root,
 			rinfo->left_ec = rinfo->right_ec = ec;
 			rinfo->left_em = cur_em;
 			rinfo->right_em = const_em;
-			ec->ec_derives = lappend(ec->ec_derives, rinfo);
+			ec_add_derived_clause(ec, rinfo);
 		}
 	}
 }
@@ -1265,10 +1316,10 @@ generate_base_implied_equalities_no_const(PlannerInfo *root,
 
 			/*
 			 * If the clause didn't degenerate to a constant, fill in the
-			 * correct markings for a mergejoinable clause.  We don't put it
-			 * in ec_derives however; we don't currently need to re-find such
-			 * clauses, and we don't want to clutter that list with non-join
-			 * clauses.
+			 * correct markings for a mergejoinable clause.  We don't record
+			 * it as a derived clause, since we don't currently need to
+			 * re-find such clauses, and don't want to clutter the
+			 * derived-clause set with non-join clauses.
 			 */
 			if (rinfo && rinfo->mergeopfamilies)
 			{
@@ -1369,7 +1420,7 @@ generate_base_implied_equalities_broken(PlannerInfo *root,
  * we consider different join paths, we avoid generating multiple copies:
  * whenever we select a particular pair of EquivalenceMembers to join,
  * we check to see if the pair matches any original clause (in ec_sources)
- * or previously-built clause (in ec_derives).  This saves memory and allows
+ * or previously-built derived clause.  This saves memory and allows
  * re-use of information cached in RestrictInfos.  We also avoid generating
  * commutative duplicates, i.e. if the algorithm selects "a.x = b.y" but
  * we already have "b.y = a.x", we return the existing clause.
@@ -1754,9 +1805,9 @@ generate_join_implied_equalities_broken(PlannerInfo *root,
 	/*
 	 * If we have to translate, just brute-force apply adjust_appendrel_attrs
 	 * to all the RestrictInfos at once.  This will result in returning
-	 * RestrictInfos that are not listed in ec_derives, but there shouldn't be
-	 * any duplication, and it's a sufficiently narrow corner case that we
-	 * shouldn't sweat too much over it anyway.
+	 * RestrictInfos that are not included in EC's derived clauses, but there
+	 * shouldn't be any duplication, and it's a sufficiently narrow corner
+	 * case that we shouldn't sweat too much over it anyway.
 	 *
 	 * Since inner_rel might be an indirect descendant of the baserel
 	 * mentioned in the ec_sources clauses, we have to be prepared to apply
@@ -1823,43 +1874,11 @@ create_join_clause(PlannerInfo *root,
 {
 	RestrictInfo *rinfo;
 	RestrictInfo *parent_rinfo = NULL;
-	ListCell   *lc;
 	MemoryContext oldcontext;
 
-	/*
-	 * Search to see if we already built a RestrictInfo for this pair of
-	 * EquivalenceMembers.  We can use either original source clauses or
-	 * previously-derived clauses, and a commutator clause is acceptable.
-	 *
-	 * We used to verify that opno matches, but that seems redundant: even if
-	 * it's not identical, it'd better have the same effects, or the operator
-	 * families we're using are broken.
-	 */
-	foreach(lc, ec->ec_sources)
-	{
-		rinfo = (RestrictInfo *) lfirst(lc);
-		if (rinfo->left_em == leftem &&
-			rinfo->right_em == rightem &&
-			rinfo->parent_ec == parent_ec)
-			return rinfo;
-		if (rinfo->left_em == rightem &&
-			rinfo->right_em == leftem &&
-			rinfo->parent_ec == parent_ec)
-			return rinfo;
-	}
-
-	foreach(lc, ec->ec_derives)
-	{
-		rinfo = (RestrictInfo *) lfirst(lc);
-		if (rinfo->left_em == leftem &&
-			rinfo->right_em == rightem &&
-			rinfo->parent_ec == parent_ec)
-			return rinfo;
-		if (rinfo->left_em == rightem &&
-			rinfo->right_em == leftem &&
-			rinfo->parent_ec == parent_ec)
-			return rinfo;
-	}
+	rinfo = ec_search_clause_for_ems(root, ec, leftem, rightem, parent_ec);
+	if (rinfo)
+		return rinfo;
 
 	/*
 	 * Not there, so build it, in planner context so we can re-use it. (Not
@@ -1923,7 +1942,7 @@ create_join_clause(PlannerInfo *root,
 	rinfo->left_em = leftem;
 	rinfo->right_em = rightem;
 	/* and save it for possible re-use */
-	ec->ec_derives = lappend(ec->ec_derives, rinfo);
+	ec_add_derived_clause(ec, rinfo);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -2648,28 +2667,14 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
  * Returns NULL if no such clause can be found.
  */
 RestrictInfo *
-find_derived_clause_for_ec_member(EquivalenceClass *ec,
+find_derived_clause_for_ec_member(PlannerInfo *root,
+								  EquivalenceClass *ec,
 								  EquivalenceMember *em)
 {
-	ListCell   *lc;
-
 	Assert(ec->ec_has_const);
 	Assert(!em->em_is_const);
-	foreach(lc, ec->ec_derives)
-	{
-		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
 
-		/*
-		 * generate_base_implied_equalities_const will have put non-const
-		 * members on the left side of derived clauses.
-		 */
-		if (rinfo->left_em == em)
-		{
-			Assert(rinfo->right_em->em_is_const);
-			return rinfo;
-		}
-	}
-	return NULL;
+	return ec_search_derived_clause_for_ems(root, ec, em, NULL, NULL);
 }
 
 
@@ -3442,3 +3447,255 @@ get_common_eclass_indexes(PlannerInfo *root, Relids relids1, Relids relids2)
 	/* Calculate and return the common EC indexes, recycling the left input. */
 	return bms_int_members(rel1ecs, rel2ecs);
 }
+
+/*
+ * ec_build_derives_hash
+ *	  Construct the auxiliary hash table for derived clauses.
+ */
+static void
+ec_build_derives_hash(PlannerInfo *root, EquivalenceClass *ec)
+{
+	Assert(!ec->ec_derives_hash);
+
+	/* Create the hash table */
+	ec->ec_derives_hash = derives_create(root->planner_cxt, 256L, NULL);
+
+	foreach_node(RestrictInfo, rinfo, ec->ec_derives_list)
+		ec_add_clause_to_derives_hash(ec, rinfo);
+}
+
+/*
+ * ec_add_derived_clause
+ *		Add a clause to the set of derived clauses for the given
+ *		EquivalenceClass. Always appends to ec_derives_list; also adds
+ *		to ec_derives_hash if it exists.
+ */
+static void
+ec_add_derived_clause(EquivalenceClass *ec, RestrictInfo *clause)
+{
+	ec->ec_derives_list = lappend(ec->ec_derives_list, clause);
+	if (ec->ec_derives_hash)
+		ec_add_clause_to_derives_hash(ec, clause);
+}
+
+/*
+ * ec_add_derived_clauses
+ *		Add a list of clauses to the set of clauses derived from the given
+ *		EquivalenceClass; adding to the list and hash table if needed.
+ *
+ * This function is similar to above function but optimized for adding multiple
+ * clauses at a time to the ec_derives_list.
+ */
+static void
+ec_add_derived_clauses(EquivalenceClass *ec, List *clauses)
+{
+	ec->ec_derives_list = list_concat(ec->ec_derives_list, clauses);
+	if (ec->ec_derives_hash)
+		foreach_node(RestrictInfo, rinfo, clauses)
+			ec_add_clause_to_derives_hash(ec, rinfo);
+}
+
+/*
+ * ec_add_clause_to_derives_hash
+ *      Add a derived clause to the ec_derives_hash of the given
+ *      EquivalenceClass.
+ */
+static void
+ec_add_clause_to_derives_hash(EquivalenceClass *ec, RestrictInfo *rinfo)
+{
+	ECDerivesKey key;
+	ECDerivesEntry *entry;
+	bool		found;
+
+	/*
+	 * Constants are always placed on the RHS; see
+	 * generate_base_implied_equalities_const().
+	 */
+	Assert(!rinfo->left_em->em_is_const);
+
+	if (rinfo->right_em->em_is_const)
+	{
+		/*
+		 * Clauses containing a constant are never considered redundant, so
+		 * parent_ec is not set.
+		 */
+		Assert(!rinfo->parent_ec);
+
+		/*
+		 * find_derived_clause_for_ec_member() performs lookup of a clause
+		 * involving a constant using only the non-constant EM and NULL for
+		 * the RHS. Since there's only one constant EM per EC, we don't need
+		 * to store or match it during lookup.  We set key.em2 = NULL to
+		 * reflect this.
+		 */
+		key.em1 = rinfo->left_em;
+		key.em2 = NULL;
+		key.parent_ec = rinfo->parent_ec;
+		entry = derives_insert(ec->ec_derives_hash, key, &found);
+		Assert(!found);
+		entry->rinfo = rinfo;
+	}
+	else
+	{
+		key.em1 = rinfo->left_em;
+		key.em2 = rinfo->right_em;
+		key.parent_ec = rinfo->parent_ec;
+		entry = derives_insert(ec->ec_derives_hash, key, &found);
+		Assert(!found);
+		entry->rinfo = rinfo;
+
+		/*
+		 * Insert the clause under the given EM pair key, and also under the
+		 * reverse order. This ensures we can find the clause regardless of
+		 * the order in which EMs are passed to the lookup function.
+		 */
+		key.em1 = rinfo->right_em;
+		key.em2 = rinfo->left_em;
+		key.parent_ec = rinfo->parent_ec;
+		entry = derives_insert(ec->ec_derives_hash, key, &found);
+		Assert(!found);
+		entry->rinfo = rinfo;
+	}
+}
+
+/*
+ * ec_clear_derived_clauses
+ *      Reset ec_derives_list and ec_derives_hash.
+ *
+ * We destroy the hash table explicitly, since it may consume significant
+ * space. The list is simply cleared by setting it to NIL; we do not
+ * explicitly free it.
+ *
+ * XXX: When thousands of partitions are involved, the list can become
+ * sizable. It might be worth freeing it explicitly in such cases.
+ */
+void
+ec_clear_derived_clauses(EquivalenceClass *ec)
+{
+	ec->ec_derives_list = NIL;
+	if (ec->ec_derives_hash)
+	{
+		derives_destroy(ec->ec_derives_hash);
+		ec->ec_derives_hash = NULL;
+	}
+}
+
+/*
+ * ec_search_clause_for_ems
+ *		Search for an existing RestrictInfo that equates the given pair
+ *		of EquivalenceMembers, either from ec_sources or ec_derives.
+ *
+ * We accept clauses in either operand order, so commutators are matched. We
+ * used to require matching operator OIDs, but dropped that since any
+ * semantically different operator here would indicate a broken operator
+ * family.
+ *
+ * Returns NULL if no matching clause is found.
+ */
+
+static RestrictInfo *
+ec_search_clause_for_ems(PlannerInfo *root, EquivalenceClass *ec,
+						 EquivalenceMember *leftem, EquivalenceMember *rightem,
+						 EquivalenceClass *parent_ec)
+{
+	/* Check original source clauses */
+	foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+	{
+		if (rinfo->left_em == leftem &&
+			rinfo->right_em == rightem &&
+			rinfo->parent_ec == parent_ec)
+			return rinfo;
+		if (rinfo->left_em == rightem &&
+			rinfo->right_em == leftem &&
+			rinfo->parent_ec == parent_ec)
+			return rinfo;
+	}
+
+	/* Not found in ec_sources; search derived clauses */
+	return ec_search_derived_clause_for_ems(root, ec, leftem, rightem,
+											parent_ec);
+}
+
+/*
+ * ec_search_derived_clause_for_ems
+ *		Search for an existing derived clause between two EquivalenceMembers.
+ *
+ * If the number of derived clauses exceeds a threshold, switch to hash table
+ * lookup; otherwise, scan ec_derives_list linearly.
+ *
+ * Clauses involving constants are looked up using only the non-const EM
+ * (leftem) and a NULL rightem. In that case, we expect to find a clause with
+ * a constant on the RHS.
+ *
+ * We do not attempt a second lookup with EMs swapped when using the hash
+ * table; such clauses are inserted under both orderings at the time of
+ * insertion.
+ */
+static RestrictInfo *
+ec_search_derived_clause_for_ems(PlannerInfo *root, EquivalenceClass *ec,
+								 EquivalenceMember *leftem,
+								 EquivalenceMember *rightem,
+								 EquivalenceClass *parent_ec)
+{
+	/*
+	 * Switch to using hash lookup when list grows "too long". The threshold
+	 * is arbitrary and is known only here.
+	 */
+	if (!ec->ec_derives_hash && list_length(ec->ec_derives_list) >= 32)
+		ec_build_derives_hash(root, ec);
+
+	/* Perform hash table lookup if available */
+	if (ec->ec_derives_hash)
+	{
+		ECDerivesKey key;
+		RestrictInfo *rinfo;
+		ECDerivesEntry *entry;
+
+		/*
+		 * See ec_add_clause_to_derives_hash() for rationale: derived clauses
+		 * are inserted into the hash table under both (em1, em2) and (em2,
+		 * em1), so a single lookup with the original order is sufficient.
+		 */
+		key.em1 = leftem;
+		key.em2 = rightem;
+		key.parent_ec = parent_ec;
+		entry = derives_lookup(ec->ec_derives_hash, key);
+		if (entry)
+		{
+			rinfo = entry->rinfo;
+			Assert(rinfo);
+
+			/*
+			 * If this is a lookup in a const-containing EC, the RHS must be a
+			 * constant. The caller signals this by passing NULL for rightem.
+			 */
+			Assert(rightem || rinfo->right_em->em_is_const);
+			return rinfo;
+		}
+	}
+	else
+	{
+		/* Fallback to linear search over ec_derives_list */
+		foreach_node(RestrictInfo, rinfo, ec->ec_derives_list)
+		{
+			/* Handle special case: lookup by non-const EM alone */
+			if (!rightem &&
+				rinfo->left_em == leftem)
+			{
+				/* See the comment above in hash path for rationale. */
+				Assert(rinfo->right_em->em_is_const);
+				return rinfo;
+			}
+			if (rinfo->left_em == leftem &&
+				rinfo->right_em == rightem &&
+				rinfo->parent_ec == parent_ec)
+				return rinfo;
+			if (rinfo->left_em == rightem &&
+				rinfo->right_em == leftem &&
+				rinfo->parent_ec == parent_ec)
+				return rinfo;
+		}
+	}
+
+	return NULL;
+}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 8a8d4a2af33..ae20691ca91 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -749,7 +749,7 @@ remove_rel_from_eclass(EquivalenceClass *ec, SpecialJoinInfo *sjinfo,
 	 * drop them.  (At this point, any such clauses would be base restriction
 	 * clauses, which we'd not need anymore anyway.)
 	 */
-	ec->ec_derives = NIL;
+	ec_clear_derived_clauses(ec);
 }
 
 /*
@@ -1544,8 +1544,7 @@ update_eclasses(EquivalenceClass *ec, int from, int to)
 	list_free(ec->ec_members);
 	ec->ec_members = new_members;
 
-	list_free(ec->ec_derives);
-	ec->ec_derives = NULL;
+	ec_clear_derived_clauses(ec);
 
 	/* Update EC source expressions */
 	foreach_node(RestrictInfo, rinfo, ec->ec_sources)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c24a1fc8514..e29d26a1814 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1403,6 +1403,18 @@ typedef struct JoinDomain
  * entry: consider SELECT random() AS a, random() AS b ... ORDER BY b,a.
  * So we record the SortGroupRef of the originating sort clause.
  *
+ * Derived equality clauses are stored in ec_derives_list. For small queries,
+ * this list is scanned directly during lookup. For larger queries -- e.g.,
+ * with many partitions or joins -- a hash table (ec_derives_hash) is built
+ * when the list grows beyond a threshold, for faster lookup. When present,
+ * the hash table contains the same RestrictInfos and is maintained alongside
+ * the list. We retain the list even when the hash is used to simplify
+ * serialization (e.g., in _outEquivalenceClass()) and support
+ * EquivalenceClass merging.
+ *
+ * In contrast, ec_sources holds equality clauses that appear directly in the
+ * query. These are typically few and do not require a hash table for lookup.
+ *
  * NB: if ec_merged isn't NULL, this class has been merged into another, and
  * should be ignored in favor of using the pointed-to class.
  *
@@ -1422,7 +1434,10 @@ typedef struct EquivalenceClass
 	Oid			ec_collation;	/* collation, if datatypes are collatable */
 	List	   *ec_members;		/* list of EquivalenceMembers */
 	List	   *ec_sources;		/* list of generating RestrictInfos */
-	List	   *ec_derives;		/* list of derived RestrictInfos */
+	List	   *ec_derives_list;	/* list of derived RestrictInfos */
+	struct derives_hash *ec_derives_hash;	/* optional hash table for fast
+											 * lookup; contains same
+											 * RestrictInfos as list */
 	Relids		ec_relids;		/* all relids appearing in ec_members, except
 								 * for child members (see below) */
 	bool		ec_has_const;	/* any pseudoconstants in ec_members? */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index bc5dfd7db41..16dc4d5ee82 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -167,7 +167,8 @@ extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
 extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
 														   ForeignKeyOptInfo *fkinfo,
 														   int colno);
-extern RestrictInfo *find_derived_clause_for_ec_member(EquivalenceClass *ec,
+extern RestrictInfo *find_derived_clause_for_ec_member(PlannerInfo *root,
+													   EquivalenceClass *ec,
 													   EquivalenceMember *em);
 extern void add_child_rel_equivalences(PlannerInfo *root,
 									   AppendRelInfo *appinfo,
@@ -197,6 +198,7 @@ extern bool eclass_useful_for_merging(PlannerInfo *root,
 extern bool is_redundant_derived_clause(RestrictInfo *rinfo, List *clauselist);
 extern bool is_redundant_with_indexclauses(RestrictInfo *rinfo,
 										   List *indexclauses);
+extern void ec_clear_derived_clauses(EquivalenceClass *ec);
 
 /*
  * pathkeys.c
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3fbf5a4c212..b99042653fd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -666,6 +666,8 @@ DumpableObjectWithAcl
 DynamicFileList
 DynamicZoneAbbrev
 EC_KEY
+ECDerivesKey
+ECDerivesEntry
 EDGE
 ENGINE
 EOM_flatten_into_method
-- 
2.34.1

From 1cfe1ae4ced4a1facac540da912960428f26c6a4 Mon Sep 17 00:00:00 2001
From: Amit Langote <amit...@postgresql.org>
Date: Mon, 24 Mar 2025 21:17:46 +0900
Subject: [PATCH 1/3] Add assertion to verify derived clause has constant RHS

find_derived_clause_for_ec_member() searches for a previously-derived
clause that equates a non-constant EquivalenceMember to a constant
EquivalenceMember. It is only called for EquivalenceClasses with
ec_has_const set, and with a non-constant member as the target.

The matched clause is expected to have the non-constant member on the
left-hand side and the constant EquivalenceMember on the right.

Assert that the RHS is indeed a constant to catch violations of this
structure and enforce assumptions made by
generate_base_implied_equalities_const().

Author: Ashutosh Bapat <ashutosh.bapat....@gmail.com>
Discussion: https://postgr.es/m/caexhw5scmxyfrqofe6odmbiw2rnvbemeeca-p4w_cyueiku...@mail.gmail.com
---
 src/backend/optimizer/path/equivclass.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 0f9ecf5ee8b..493a95d26cc 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2664,7 +2664,10 @@ find_derived_clause_for_ec_member(EquivalenceClass *ec,
 		 * members on the left side of derived clauses.
 		 */
 		if (rinfo->left_em == em)
+		{
+			Assert(rinfo->right_em->em_is_const);
 			return rinfo;
+		}
 	}
 	return NULL;
 }

base-commit: 3c86223c998280ea313480d319ec39f802453218
-- 
2.34.1

From 7b559921c64a00bd023e53f98866347fa753de33 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat....@gmail.com>
Date: Tue, 25 Mar 2025 14:38:59 +0530
Subject: [PATCH 3/3] Free ec_derives_list in ec_clear_derived_clauses()

remove_rel_from_eclass() and process_equivalence() used to set
EquivalenceClass::ec_derives to NIL when resetting the derived clauses before
that code was replaced by a call to ec_clear_derived_clauses().
update_eclasses(), on the other hand, freed the list before setting it to NIL.
That code was also replaced by ec_clear_derived_clauses(). Net effect being we
lost the list_free() under update_eclasses().

Given that ec_derives_list can grow large when there thousands of partitions or
joins, it's better to just free the list instead of just forgetting it as we do
at other places like try_partitionwise_join(). The memory used by the list being
freed should not be referenced anywhere so it's safe to do so. Any new code
which uses ec_clear_derived_clauses() will be required to make sure that the
memory is not referenced as well. Net effect is less memory consumption and
better compliance. Hence change ec_clear_derived_clauses() to free
ec_derives_list as well.

Author: Ashutosh Bapat
Reported-by: Amit Langote
---
 src/backend/optimizer/path/equivclass.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index b4987b466c4..ee314a8d5b8 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -3563,16 +3563,15 @@ ec_add_clause_to_derives_hash(EquivalenceClass *ec, RestrictInfo *rinfo)
  *      Reset ec_derives_list and ec_derives_hash.
  *
  * We destroy the hash table explicitly, since it may consume significant
- * space. The list is simply cleared by setting it to NIL; we do not
- * explicitly free it.
- *
- * XXX: When thousands of partitions are involved, the list can become
- * sizable. It might be worth freeing it explicitly in such cases.
+ * space. When thousands of partitions are involved, the list may become
+ * sizable hence free it as well, even though we do not usually free lists.
  */
 void
 ec_clear_derived_clauses(EquivalenceClass *ec)
 {
 	ec->ec_derives_list = NIL;
+	list_free(ec->ec_derives_list);
+
 	if (ec->ec_derives_hash)
 	{
 		derives_destroy(ec->ec_derives_hash);
-- 
2.34.1

Reply via email to