On Tue, Mar 26, 2024 at 02:16:03PM -0400, Tom Lane wrote:
> I did a little experimentation using the attached quick-hack C
> function, and came to the conclusion that setting up the bloom filter
> costs more or less as much as inserting 1000 or so OIDs the dumb way.
> So we definitely want a threshold that's not much less than that.

Thanks for doing this.

> So I'm now content with choosing a threshold of 1000 or 1024 or so.

Cool.

> As for the bloom filter size, I see that bloom_create does
> 
>       bitset_bytes = Min(bloom_work_mem * UINT64CONST(1024), total_elems * 2);
>       bitset_bytes = Max(1024 * 1024, bitset_bytes);
> 
> which means that any total_elems input less than 512K is disregarded
> altogether.  So I'm not sold on your "ROLES_LIST_BLOOM_THRESHOLD * 10"
> value.  Maybe it doesn't matter though.

Yeah, I wasn't sure how much to worry about this.  I figured that we might
as well set it to a reasonable estimate based on the description of the
parameter.  This description claims that the filter should work well if
this is off by a factor of 5 or more, and 50x the threshold sounded like it
ought to be good enough for anyone, so that's how I landed on 10x.  But as
you point out, this value will be disregarded altogether, and it will
continue to be ignored unless the filter implementation changes, which
seems unlikely.  If you have a different value in mind that you would
rather use, I'm fine with changing it.

> I do not like, even a little bit, your use of a static variable to
> hold the bloom filter pointer.  That code will misbehave horribly
> if we throw an error partway through the role-accumulation loop;
> the next call will try to carry on using the old filter, which would
> be wrong even if it still existed which it likely won't.  It's not
> that much worse notationally to keep it as a local variable, as I
> did in the attached.

Ah, yes, that's no good.  I fixed this in the new version.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From e5618bb6f1d8b21d527bfda5b49f75d12bf31e9e Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Mon, 25 Mar 2024 23:17:08 -0500
Subject: [PATCH v4 1/1] Optimize roles_is_member_of() with a Bloom filter.

When the list of roles gathered by roles_is_member_of() grows very
large, a Bloom filter is created to help avoid some linear searches
through the list.  The threshold for creating the Bloom filter is
set arbitrarily high and may require future adjustment.

Suggested-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz%3DvggErdSG7pv2s6vmmTOLJSg%40mail.gmail.com
---
 src/backend/utils/adt/acl.c | 65 +++++++++++++++++++++++++++++++++++--
 1 file changed, 62 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index cf5d08576a..39197613c2 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -36,6 +36,7 @@
 #include "common/hashfn.h"
 #include "foreign/foreign.h"
 #include "funcapi.h"
+#include "lib/bloomfilter.h"
 #include "lib/qunique.h"
 #include "miscadmin.h"
 #include "utils/acl.h"
@@ -78,6 +79,13 @@ static Oid	cached_role[] = {InvalidOid, InvalidOid, InvalidOid};
 static List *cached_roles[] = {NIL, NIL, NIL};
 static uint32 cached_db_hash;
 
+/*
+ * If the list of roles gathered by roles_is_member_of() grows larger than the
+ * below threshold, a Bloom filter is created to speed up list membership
+ * checks.  This threshold is set arbitrarily high to avoid the overhead of
+ * creating the Bloom filter until it seems likely to provide a net benefit.
+ */
+#define ROLES_LIST_BLOOM_THRESHOLD 1024
 
 static const char *getid(const char *s, char *n, Node *escontext);
 static void putid(char *p, const char *s);
@@ -4918,6 +4926,50 @@ RoleMembershipCacheCallback(Datum arg, int cacheid, uint32 hashvalue)
 	cached_role[ROLERECURSE_SETROLE] = InvalidOid;
 }
 
+/*
+ * A helper function for roles_is_member_of() that provides an optimized
+ * implementation of list_append_unique_oid() via a Bloom filter.  The caller
+ * (i.e., roles_is_member_of()) is responsible for freeing bf once it is done
+ * using this function.
+ */
+static inline List *
+roles_list_append(List *roles_list, bloom_filter **bf, Oid role)
+{
+	unsigned char *roleptr = (unsigned char *) &role;
+
+	/*
+	 * If there is a previously-created Bloom filter, use it to determine
+	 * whether the role is missing from the list.  Otherwise, do an ordinary
+	 * linear search through the existing role list.
+	 */
+	if ((*bf && bloom_lacks_element(*bf, roleptr, sizeof(Oid))) ||
+		!list_member_oid(roles_list, role))
+	{
+		/*
+		 * If the list is large, we take on the overhead of creating and
+		 * populating a Bloom filter to speed up future calls to this
+		 * function.
+		 */
+		if (*bf == NULL &&
+			list_length(roles_list) > ROLES_LIST_BLOOM_THRESHOLD)
+		{
+			*bf = bloom_create(ROLES_LIST_BLOOM_THRESHOLD * 10, work_mem, 0);
+			foreach_oid(roleid, roles_list)
+				bloom_add_element(*bf, (unsigned char *) &roleid, sizeof(Oid));
+		}
+
+		/*
+		 * Finally, add the role to the list and the Bloom filter, if it
+		 * exists.
+		 */
+		roles_list = lappend_oid(roles_list, role);
+		if (*bf)
+			bloom_add_element(*bf, roleptr, sizeof(Oid));
+	}
+
+	return roles_list;
+}
+
 /*
  * Get a list of roles that the specified roleid is a member of
  *
@@ -4946,6 +4998,7 @@ roles_is_member_of(Oid roleid, enum RoleRecurseType type,
 	ListCell   *l;
 	List	   *new_cached_roles;
 	MemoryContext oldctx;
+	bloom_filter *bf = NULL;
 
 	Assert(OidIsValid(admin_of) == PointerIsValid(admin_role));
 	if (admin_role != NULL)
@@ -5023,16 +5076,22 @@ roles_is_member_of(Oid roleid, enum RoleRecurseType type,
 			 * graph, we must test for having already seen this role. It is
 			 * legal for instance to have both A->B and A->C->B.
 			 */
-			roles_list = list_append_unique_oid(roles_list, otherid);
+			roles_list = roles_list_append(roles_list, &bf, otherid);
 		}
 		ReleaseSysCacheList(memlist);
 
 		/* implement pg_database_owner implicit membership */
 		if (memberid == dba && OidIsValid(dba))
-			roles_list = list_append_unique_oid(roles_list,
-												ROLE_PG_DATABASE_OWNER);
+			roles_list = roles_list_append(roles_list, &bf,
+										   ROLE_PG_DATABASE_OWNER);
 	}
 
+	/*
+	 * Free the Bloom filter created by roles_list_append(), if there is one.
+	 */
+	if (bf)
+		bloom_free(bf);
+
 	/*
 	 * Copy the completed list into TopMemoryContext so it will persist.
 	 */
-- 
2.25.1

Reply via email to