Hi,

As previously mentioned, dynahash isn't particularly fast. In many cases
that doesn't matter much, but e.g. hash aggregation and bitmap index
scans are quite sensitive to hash-table performance.

The biggest problems with dynahash (also discussed at [1]) are

a) its two-level structure, doubling the number of indirect lookups
b) indirect function calls for hashing and comparison
c) separate-chaining based conflict resolution
d) being too general.
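
To illustrate a) - c), compare a (hypothetical, much simplified) chained
lookup with an open-addressing one. The ChainTable/FlatTable types below
are illustration only, not dynahash or simplehash code:

typedef struct ChainElem { struct ChainElem *next; uint32 key; } ChainElem;

typedef struct ChainTable
{
	ChainElem **buckets;			/* a) bucket array -> element: two hops */
	uint32		nbuckets;			/* power of two */
	uint32		(*hash) (uint32 key);	/* b) indirect function calls */
	int			(*cmp) (uint32 a, uint32 b);
} ChainTable;

static ChainElem *
chain_lookup(ChainTable *tb, uint32 key)
{
	ChainElem  *e = tb->buckets[tb->hash(key) & (tb->nbuckets - 1)];

	while (e != NULL && tb->cmp(e->key, key) != 0)
		e = e->next;				/* c) pointer chase per collision */
	return e;
}

typedef struct FlatElem { uint32 key; char status; } FlatElem;
typedef struct FlatTable { FlatElem *data; uint32 size; } FlatTable;

static FlatElem *
flat_lookup(FlatTable *tb, uint32 key)
{
	uint32		i = (key * 2654435769u) & (tb->size - 1);	/* inlined hash */

	while (tb->data[i].status != 0)		/* status == 0 means empty */
	{
		if (tb->data[i].key == key)		/* comparison inlined as well */
			return &tb->data[i];
		i = (i + 1) & (tb->size - 1);	/* linear probing, cache friendly */
	}
	return NULL;
}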

In the post referenced above I'd implemented an open-coded hashtable
addressing these issues for the tidbitmap.c case, with quite some
success.

It's not desirable to have several slightly differing hash-table
implementations across the backend, though. The only solution I could
think of that preserves the efficiency is a hash-table implementation
which is customizable to the appropriate types etc., via macros.

The attached series adds simplehash.h, which is customized by a set of
macros before being included and inlined.  There are also patches using
it for tidbitmap.c and for nodeAgg/nodeSubplan/... via execGrouping.c.

To show the motivation:
Bitmapscan:
before:
tpch[22005][1]=# EXPLAIN ANALYZE SELECT SUM(l_extendedprice) FROM lineitem WHERE (l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=942330.27..942330.28 rows=1 width=8) (actual time=5283.026..5283.026 rows=1 loops=1)
   ->  Bitmap Heap Scan on lineitem  (cost=193670.02..919511.38 rows=9127557 width=8) (actual time=3041.072..4436.569 rows=9113219 loops=1)
         Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date))
         Heap Blocks: exact=585542
         ->  Bitmap Index Scan on i_l_shipdate  (cost=0.00..191388.13 rows=9127557 width=0) (actual time=2807.246..2807.246 rows=9113219 loops=1)
               Index Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date))
 Planning time: 0.077 ms
 Execution time: 5284.038 ms
(8 rows)

after:
tpch[21953][1]=# EXPLAIN ANALYZE SELECT SUM(l_extendedprice) FROM lineitem WHERE (l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=942330.27..942330.28 rows=1 width=8) (actual time=3431.602..3431.602 rows=1 loops=1)
   ->  Bitmap Heap Scan on lineitem  (cost=193670.02..919511.38 rows=9127557 width=8) (actual time=1158.783..2588.357 rows=9113219 loops=1)
         Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date))
         Heap Blocks: exact=585542
         ->  Bitmap Index Scan on i_l_shipdate  (cost=0.00..191388.13 rows=9127557 width=0) (actual time=958.341..958.341 rows=9113219 loops=1)
               Index Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date))
 Planning time: 0.070 ms
 Execution time: 3435.259 ms
(8 rows)


Hash-Agg:

before:

tpch[22005][1]=# EXPLAIN (ANALYZE, TIMING OFF) SELECT l_partkey, SUM(l_extendedprice) FROM lineitem GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1069653.05..1069653.08 rows=10 width=12) (actual rows=10 loops=1)
   ->  Sort  (cost=1069653.05..1072083.33 rows=972112 width=12) (actual rows=10 loops=1)
         Sort Key: (sum(l_extendedprice)) DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  HashAggregate  (cost=1038924.94..1048646.06 rows=972112 width=12) (actual rows=1000000 loops=1)
               Group Key: l_partkey
               ->  Seq Scan on lineitem  (cost=0.00..888925.96 rows=29999796 width=12) (actual rows=29999795 loops=1)
 Planning time: 0.210 ms
 Execution time: 20887.906 ms
(9 rows)

after:

tpch[22342][1]=# EXPLAIN (ANALYZE, TIMING OFF) SELECT l_partkey, SUM(l_extendedprice) FROM lineitem GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1069653.05..1069653.08 rows=10 width=12) (actual rows=10 loops=1)
   ->  Sort  (cost=1069653.05..1072083.33 rows=972112 width=12) (actual rows=10 loops=1)
         Sort Key: (sum(l_extendedprice)) DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  HashAggregate  (cost=1038924.94..1048646.06 rows=972112 width=12) (actual rows=1000000 loops=1)
               Group Key: l_partkey
               ->  Seq Scan on lineitem  (cost=0.00..888925.96 rows=29999796 width=12) (actual rows=29999795 loops=1)
 Planning time: 0.104 ms
 Execution time: 14617.481 ms
(9 rows)


The hash-table performance difference itself is bigger in both cases,
but other things, notably slot deforming and expression evaluation, then
become the bottleneck.


Does anybody have a better idea of how to approach the hash-table
problem? While it'd be nice not to resort to such macro magic, I can't
think of an alternative short of switching to C++ and using templates.
Currently this is used like:

#define SH_PREFIX pagetable
#define SH_KEYTYPE BlockNumber
#define SH_KEY blockno
#define SH_CONTAINS PagetableEntry
#define SH_HASH_KEY(tb, key) hash_blockno(key)
#define SH_EQUAL(tb, a, b) a == b
#define SH_SCOPE static
#define SH_DEFINE
#define SH_DECLARE
#include "lib/simplehash.h"

which then creates functions like pagetable_create/insert/lookup/delete/...
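
For illustration, the generated functions are then used roughly as in the
sketch below (modeled on the tidbitmap.c conversion in the third patch;
pagetable_example() is a made-up function just stringing the calls together):

static void
pagetable_example(MemoryContext cxt)
{
	struct pagetable_hash *pagetable;
	PagetableEntry *entry;
	pagetable_iterator it;
	bool		found;

	pagetable = pagetable_create(cxt, 128);

	entry = pagetable_insert(pagetable, (BlockNumber) 42, &found);
	if (!found)
		entry->ischunk = false;		/* initialize the newly created entry */

	entry = pagetable_lookup(pagetable, (BlockNumber) 42);	/* NULL if absent */

	pagetable_start_iterate(pagetable, &it);
	while ((entry = pagetable_iterate(pagetable, &it)) != NULL)
	{
		/* process entry */
	}

	if (!pagetable_delete(pagetable, (BlockNumber) 42))
		elog(ERROR, "entry unexpectedly missing");

	pagetable_destroy(pagetable);
}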


I strongly suspect there are other places that could benefit from such a
hash-table implementation. In particular, catcache.c seems like a
candidate (instead of its hand-rolled hash table, which frequently shows
up in profiles).


Note that these patches are *NOT* in shape for detailed review yet. I'd
first like to know what people think about the general approach, and
about better alternatives.

Also note that some queries in the regression tests change their results,
because the output ordering is unspecified...

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20160714030616.dvlkboz6cw555ixb%40alap3.anarazel.de
From 251972d79f0b64ec5d70900a8266d98b39322df9 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Sun, 3 Jul 2016 15:05:18 -0700
Subject: [PATCH 1/4] WIP: Add likely/unlikely() macros.

---
 src/include/c.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/include/c.h b/src/include/c.h
index 4ab3f80..182066d 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -939,6 +939,15 @@ typedef NameData *Name;
 #endif
 
 
+#ifdef __GNUC__
+#define likely(x)	__builtin_expect(!!(x), 1)
+#define unlikely(x)	__builtin_expect(!!(x), 0)
+#else
+#define likely(x)	!!(x)
+#define unlikely(x)	!!(x)
+#endif
+
+
 /* ----------------------------------------------------------------
  *				Section 8:	random stuff
  * ----------------------------------------------------------------
-- 
2.8.1
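
For context, the second patch uses these to annotate rarely-taken branches
in the hash insert path, along the lines of:

	/* grow at a 75% fill factor; rare compared to ordinary inserts */
	if (unlikely(!tb->resizing &&
				 ((tb->members + 1) * 4 > tb->size * 3)))
		SH_RESIZE(tb, tb->size * 2);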

From 41b4173de858ce7776f8d5f166ed2e75d0de19fb Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Tue, 19 Jul 2016 14:10:28 -0700
Subject: [PATCH 2/4] WIP: Add an inline / macro customizable hashtable.

---
 src/include/lib/simplehash.h | 407 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 407 insertions(+)
 create mode 100644 src/include/lib/simplehash.h

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
new file mode 100644
index 0000000..fefc2fc
--- /dev/null
+++ b/src/include/lib/simplehash.h
@@ -0,0 +1,407 @@
+/*
+ * simplehash.h
+ *	  Hash table implementation which will be specialized to user-defined types,
+ *	  by including this file to generate the required code. */
+
+/* helpers */
+#define SH_MAKE_PREFIX(a) CppConcat(a,_)
+#define SH_MAKE_NAME(name) SH_MAKE_NAME_(SH_MAKE_PREFIX(SH_PREFIX),name)
+#define SH_MAKE_NAME_(a,b) CppConcat(a,b)
+
+#ifdef SH_STORE_HASH
+#define SH_COMPARE_KEYS(tb, ahash, akey, b) (ahash == SH_GET_HASH(tb, b) && SH_EQUAL(tb, b->SH_KEY, akey))
+#else
+#define SH_COMPARE_KEYS(tb, ahash, akey, b) (SH_EQUAL(tb, akey, b->SH_KEY))
+#endif
+
+/* type declarations */
+#define SH_TYPE  SH_MAKE_NAME(hash)
+#define SH_STATUS  SH_MAKE_NAME(status)
+#define SH_STATUS_EMPTY  SH_MAKE_NAME(EMPTY)
+#define SH_STATUS_IN_USE  SH_MAKE_NAME(IN_USE)
+#define SH_STATUS_DELETED  SH_MAKE_NAME(DELETED)
+#define SH_ITERATOR SH_MAKE_NAME(iterator)
+
+/* function declarations */
+#define SH_CREATE SH_MAKE_NAME(create)
+#define SH_DESTROY SH_MAKE_NAME(destroy)
+#define SH_INSERT SH_MAKE_NAME(insert)
+#define SH_DELETE SH_MAKE_NAME(delete)
+#define SH_LOOKUP SH_MAKE_NAME(lookup)
+#define SH_RESIZE SH_MAKE_NAME(resize)
+#define SH_START_ITERATE SH_MAKE_NAME(start_iterate)
+#define SH_ITERATE SH_MAKE_NAME(iterate)
+
+/* generate forward declarations necessary to use the hash table */
+#ifdef SH_DECLARE
+
+/* type definitions */
+typedef struct SH_TYPE
+{
+	uint32 size;
+	uint32 members;
+	uint32 deleted;
+	bool resizing;
+	SH_CONTAINS *data;
+	MemoryContext ctx;
+	void *private;
+} SH_TYPE;
+
+typedef enum SH_STATUS
+{
+	SH_STATUS_EMPTY = 0x00,
+	SH_STATUS_IN_USE = 0x01,
+	SH_STATUS_DELETED = 0x02,
+} SH_STATUS;
+
+typedef uint32 SH_ITERATOR;
+
+/* function prototypes */
+SH_SCOPE SH_TYPE * SH_CREATE(MemoryContext ctx, uint32 size);
+SH_SCOPE SH_CONTAINS * SH_INSERT(SH_TYPE *tb, SH_KEYTYPE key, bool *found);
+SH_SCOPE void SH_DESTROY(SH_TYPE *tb);
+SH_SCOPE void SH_RESIZE(SH_TYPE *tb, uint32 newsize);
+SH_SCOPE SH_CONTAINS * SH_INSERT(SH_TYPE *tb, SH_KEYTYPE key, bool *found);
+SH_SCOPE SH_CONTAINS * SH_LOOKUP(SH_TYPE *tb, SH_KEYTYPE key);
+SH_SCOPE bool SH_DELETE(SH_TYPE *tb, SH_KEYTYPE key);
+SH_SCOPE void SH_START_ITERATE(SH_TYPE *tb, SH_ITERATOR *iter);
+SH_SCOPE SH_CONTAINS * SH_ITERATE(SH_TYPE *tb, SH_ITERATOR *iter);
+
+#endif /* SH_DECLARE */
+
+
+/* generate implementation of the hash table */
+#ifdef SH_DEFINE
+
+/* FIXME: can we move these to a central location? */
+/* calculate ceil(log base 2) of num */
+static inline int
+sh_log2(long num)
+{
+	int			i;
+	long		limit;
+
+	/* guard against too-large input, which would put us into infinite loop */
+	if (num > LONG_MAX / 2)
+		num = LONG_MAX / 2;
+
+	for (i = 0, limit = 1; limit < num; i++, limit <<= 1)
+		;
+	return i;
+}
+
+/* calculate first power of 2 >= num, bounded to what will fit in an int */
+static inline int
+sh_pow2_int(long num)
+{
+	if (num > INT_MAX / 2)
+		num = INT_MAX / 2;
+	return 1 << sh_log2(num);
+}
+
+/*
+ * create a hash table of size `size`, allocating required memory in the
+ * passed-in context.
+ */
+SH_SCOPE SH_TYPE *
+SH_CREATE(MemoryContext ctx, uint32 size)
+{
+	SH_TYPE *tb;
+
+	/* round up size to the next power of 2, eases lookups */
+	size = sh_pow2_int(size);
+
+	tb = MemoryContextAllocZero(ctx, sizeof(SH_TYPE));
+	tb->ctx = ctx;
+	tb->size = size;
+	tb->data = MemoryContextAllocExtended(tb->ctx, sizeof(SH_CONTAINS) * tb->size,
+										  MCXT_ALLOC_HUGE | MCXT_ALLOC_ZERO);
+
+	return tb;
+}
+
+/* destroy a previously created hash table */
+SH_SCOPE void
+SH_DESTROY(SH_TYPE *tb)
+{
+	pfree(tb->data);
+	pfree(tb);
+}
+
+/*
+ * Resize a hash table to at least `newsize`.
+ *
+ * Usually this will be called automatically by insertions/deletions, when
+ * necessary. But explicitly resizing to the expected final size can be
+ * advantageous performance-wise, when that size is known in advance.
+ */
+SH_SCOPE void
+SH_RESIZE(SH_TYPE *tb, uint32 newsize)
+{
+	uint32 oldsize = tb->size;
+	SH_CONTAINS *olddata = tb->data;
+	int i;
+
+	Assert(!tb->resizing);
+	Assert(oldsize == sh_pow2_int(oldsize));
+
+	/* round up size to the next power of 2, eases lookups */
+	newsize = sh_pow2_int(newsize);
+
+	tb->resizing = true;
+	tb->size = newsize;
+	tb->data = MemoryContextAllocExtended(tb->ctx, sizeof(SH_CONTAINS) * tb->size,
+										  MCXT_ALLOC_HUGE | MCXT_ALLOC_ZERO);
+
+	ereport(DEBUG2,
+			(errmsg("resizing: %u to %u", oldsize, tb->size),
+			 errhidestmt(true)));
+
+	for (i = 0; i < oldsize; i++)
+	{
+		SH_CONTAINS *oldentry = &olddata[i];
+		if (oldentry->status == SH_STATUS_IN_USE)
+		{
+			SH_CONTAINS *newentry;
+			bool found;
+			newentry = SH_INSERT(tb, oldentry->SH_KEY, &found);
+			if (found)
+				elog(ERROR, "duplicate during resizing");
+
+			memcpy(newentry, oldentry, sizeof(SH_CONTAINS));
+		}
+	}
+
+	pfree(olddata);
+	tb->resizing = false;
+	tb->deleted = 0;
+}
+
+/*
+ * Insert the key into the hash table; set *found to true if the key
+ * already exists, false otherwise. Returns the hash-table entry in either
+ * case.
+ */
+SH_SCOPE SH_CONTAINS *
+SH_INSERT(SH_TYPE *tb, SH_KEYTYPE key, bool *found)
+{
+	uint32 hash = SH_HASH_KEY(tb, key);
+	uint32 startelem;
+	uint32 curelem;
+	SH_CONTAINS *data;
+
+	/*
+	 * Double size at 75% fill-factor. We do this check even if the key is
+	 * actually present, to avoid doing the check inside the loop. This also
+	 * lets us avoid having to re-find our position in the hashtable after
+	 * resizing.
+	 */
+	/* FIXME: compute next increase boundary when resizing? */
+	if (unlikely(!tb->resizing &&
+				 ((tb->members + 1) * 4 > tb->size * 3)))
+	{
+		SH_RESIZE(tb, tb->size * 2);
+	}
+
+
+	/* Similarly, compact once at least 10% of the entries are tombstones. */
+	if (unlikely(!tb->resizing && tb->deleted * 10 >= tb->size))
+	{
+		elog(LOG, "compacting");
+		SH_RESIZE(tb, tb->size);
+	}
+
+	startelem = hash & (tb->size - 1);
+	curelem = startelem;
+	data = tb->data;
+	while(true)
+	{
+		SH_CONTAINS *entry = &data[curelem];
+
+		if (entry->status == SH_STATUS_EMPTY)
+		{
+			tb->members++;
+			entry->SH_KEY = key;
+#ifdef SH_STORE_HASH
+			SH_GET_HASH(tb, entry) = hash;
+#endif
+			entry->status = SH_STATUS_IN_USE;
+			*found = false;
+			return entry;
+		}
+		else if (entry->status == SH_STATUS_IN_USE &&
+				 SH_COMPARE_KEYS(tb, hash, key, entry))
+		{
+			Assert(entry->status == SH_STATUS_IN_USE);
+			*found = true;
+			return entry;
+		}
+
+		/* deleted entry, unusable without checking later keys */
+
+		curelem++;
+
+		if (curelem == startelem)
+		{
+			/* XXX: shouldn't ever happen? */
+			elog(ERROR, "found no free hash-table entry");
+		}
+		else if (curelem >= tb->size)
+		{
+			curelem = 0;
+		}
+	}
+}
+
+
+/*
+ * Look up an entry in the hash table.  Returns NULL if the key is not present.
+ */
+SH_SCOPE SH_CONTAINS *
+SH_LOOKUP(SH_TYPE *tb, SH_KEYTYPE key)
+{
+	uint32 hash = SH_HASH_KEY(tb, key);
+	const uint32 startelem = hash & (tb->size - 1);
+	uint32 curelem = startelem;
+
+	while(true)
+	{
+		SH_CONTAINS *entry = &tb->data[curelem];
+
+		if (entry->status == SH_STATUS_EMPTY)
+		{
+			return NULL;
+		}
+
+		if (entry->status == SH_STATUS_IN_USE &&
+			SH_COMPARE_KEYS(tb, hash, key, entry))
+		{
+			return entry;
+		}
+
+		curelem++;
+
+		if (curelem == startelem)
+		{
+			/* XXX: shouldn't ever happen? */
+			elog(ERROR, "lookup wrapped");
+			return NULL;
+		}
+		else if (curelem >= tb->size)
+		{
+			curelem = 0;
+		}
+	}
+}
+
+/*
+ * Delete entry from hash table.  Returns whether key was present before
+ * deletion.
+ */
+SH_SCOPE bool
+SH_DELETE(SH_TYPE *tb, SH_KEYTYPE key)
+{
+	uint32 hash = SH_HASH_KEY(tb, key);
+	uint32 startelem = hash & (tb->size - 1);
+	uint32 curelem = startelem;
+
+	while(true)
+	{
+		SH_CONTAINS *entry = &tb->data[curelem];
+
+		if (entry->status == SH_STATUS_EMPTY)
+			return false;
+
+		if (entry->status == SH_STATUS_IN_USE &&
+			SH_COMPARE_KEYS(tb, hash, key, entry))
+		{
+			/*
+			 * XXX: Moving neighbouring entries would likely be the better
+			 * implementation.
+			 */
+			entry->status = SH_STATUS_DELETED;
+			tb->members--;
+			tb->deleted++;
+
+			return true;
+		}
+
+		curelem++;
+
+		if (curelem == startelem)
+		{
+			/* XXX: shouldn't ever happen? */
+			elog(ERROR, "delete wrapped");
+			return false;
+		}
+		else if (curelem >= tb->size)
+		{
+			curelem = 0;
+		}
+	}
+}
+
+/* Initialize iterator. */
+SH_SCOPE void
+SH_START_ITERATE(SH_TYPE *tb, SH_ITERATOR *iter)
+{
+	*iter = 0;
+}
+
+/*
+ * Iterate over all entries in the hash-table. Return the next occupied entry,
+ * or NULL if done.
+ */
+SH_SCOPE SH_CONTAINS *
+SH_ITERATE(SH_TYPE *tb, SH_ITERATOR *iter)
+{
+	while (true)
+	{
+		SH_CONTAINS *elem;
+		if (*iter >= tb->size)
+			return NULL;
+		elem = &tb->data[(*iter)++];
+		if (elem->status == SH_STATUS_IN_USE)
+		{
+			return elem;
+		}
+	}
+}
+
+#endif /* SH_DEFINE */
+
+
+/* undefine external parameters, so the next hash table can be defined */
+#undef SH_PREFIX
+#undef SH_KEYTYPE
+#undef SH_KEY
+#undef SH_CONTAINS
+#undef SH_HASH_KEY
+#undef SH_SCOPE
+#undef SH_DECLARE
+#undef SH_DEFINE
+#undef SH_GET_HASH
+#undef SH_STORE_HASH
+
+/* undefine locally declared macros */
+#undef SH_MAKE_PREFIX
+#undef SH_MAKE_NAME
+#undef SH_MAKE_NAME_
+
+#undef SH_COMPARE_KEYS
+
+#undef SH_TYPE
+#undef SH_STATUS
+#undef SH_STATUS_EMPTY
+#undef SH_STATUS_IN_USE
+#undef SH_STATUS_DELETED
+#undef SH_ITERATOR
+
+#undef SH_CREATE
+#undef SH_DESTROY
+#undef SH_INSERT
+#undef SH_DELETE
+#undef SH_LOOKUP
+#undef SH_RESIZE
+#undef SH_START_ITERATE
+#undef SH_ITERATE
-- 
2.8.1
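
To make the sizing policy in SH_INSERT concrete (a worked example, not part
of the patch): with tb->size = 1024, the table is grown on the insert that
would make it hold 769 entries, because (768 + 1) * 4 > 1024 * 3, and it is
compacted once tb->deleted reaches 103, because 103 * 10 >= 1024.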

From 93d157dd5936301c38152a0d8c31939a6d7102db Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Thu, 7 Jul 2016 14:47:19 -0700
Subject: [PATCH 3/4] WIP: Use more efficient hashtable for tidbitmap.c

---
 src/backend/nodes/tidbitmap.c | 109 +++++++++++++++++++++++-------------------
 1 file changed, 59 insertions(+), 50 deletions(-)

diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index dfeb7d5..572674b 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -97,6 +97,7 @@
 typedef struct PagetableEntry
 {
 	BlockNumber blockno;		/* page number (hashtable key) */
+	char		status;
 	bool		ischunk;		/* T = lossy storage, F = exact */
 	bool		recheck;		/* should the tuples be rechecked? */
 	bitmapword	words[Max(WORDS_PER_PAGE, WORDS_PER_CHUNK)];
@@ -128,7 +129,7 @@ struct TIDBitmap
 	NodeTag		type;			/* to make it a valid Node */
 	MemoryContext mcxt;			/* memory context containing me */
 	TBMStatus	status;			/* see codes above */
-	HTAB	   *pagetable;		/* hash table of PagetableEntry's */
+	struct pagetable_hash *pagetable; /* hash table of PagetableEntry's */
 	int			nentries;		/* number of entries in pagetable */
 	int			maxentries;		/* limit on same to meet maxbytes */
 	int			npages;			/* number of exact entries in pagetable */
@@ -168,6 +169,29 @@ static void tbm_mark_page_lossy(TIDBitmap *tbm, BlockNumber pageno);
 static void tbm_lossify(TIDBitmap *tbm);
 static int	tbm_comparator(const void *left, const void *right);
 
+static inline uint32
+hash_blockno(BlockNumber b)
+{
+	uint32 h = b;
+	h ^= h >> 16;
+	h *= 0x85ebca6b;
+	h ^= h >> 13;
+	h *= 0xc2b2ae35;
+	h ^= h >> 16;
+	return h;
+}
+
+#define SH_PREFIX pagetable
+#define SH_KEYTYPE BlockNumber
+#define SH_KEY blockno
+#define SH_CONTAINS PagetableEntry
+#define SH_HASH_KEY(tb, key) hash_blockno(key)
+#define SH_EQUAL(tb, a, b) a == b
+#define SH_SCOPE static
+#define SH_DEFINE
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
 
 /*
  * tbm_create - create an initially-empty bitmap
@@ -212,32 +236,24 @@ tbm_create(long maxbytes)
 static void
 tbm_create_pagetable(TIDBitmap *tbm)
 {
-	HASHCTL		hash_ctl;
-
 	Assert(tbm->status != TBM_HASH);
 	Assert(tbm->pagetable == NULL);
 
-	/* Create the hashtable proper */
-	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
-	hash_ctl.keysize = sizeof(BlockNumber);
-	hash_ctl.entrysize = sizeof(PagetableEntry);
-	hash_ctl.hcxt = tbm->mcxt;
-	tbm->pagetable = hash_create("TIDBitmap",
-								 128,	/* start small and extend */
-								 &hash_ctl,
-								 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+	tbm->pagetable = pagetable_create(tbm->mcxt, 128);
 
-	/* If entry1 is valid, push it into the hashtable */
 	if (tbm->status == TBM_ONE_PAGE)
 	{
 		PagetableEntry *page;
 		bool		found;
+		char		oldstatus;
 
-		page = (PagetableEntry *) hash_search(tbm->pagetable,
-											  (void *) &tbm->entry1.blockno,
-											  HASH_ENTER, &found);
+		page = pagetable_insert(tbm->pagetable,
+								tbm->entry1.blockno,
+								&found);
 		Assert(!found);
+		oldstatus = page->status;
 		memcpy(page, &tbm->entry1, sizeof(PagetableEntry));
+		page->status = oldstatus;
 	}
 
 	tbm->status = TBM_HASH;
@@ -250,7 +266,7 @@ void
 tbm_free(TIDBitmap *tbm)
 {
 	if (tbm->pagetable)
-		hash_destroy(tbm->pagetable);
+		pagetable_destroy(tbm->pagetable);
 	if (tbm->spages)
 		pfree(tbm->spages);
 	if (tbm->schunks)
@@ -357,12 +373,12 @@ tbm_union(TIDBitmap *a, const TIDBitmap *b)
 		tbm_union_page(a, &b->entry1);
 	else
 	{
-		HASH_SEQ_STATUS status;
+		pagetable_iterator i;
 		PagetableEntry *bpage;
 
 		Assert(b->status == TBM_HASH);
-		hash_seq_init(&status, b->pagetable);
-		while ((bpage = (PagetableEntry *) hash_seq_search(&status)) != NULL)
+		pagetable_start_iterate(b->pagetable, &i);
+		while ((bpage = pagetable_iterate(b->pagetable, &i)) != NULL)
 			tbm_union_page(a, bpage);
 	}
 }
@@ -449,12 +465,12 @@ tbm_intersect(TIDBitmap *a, const TIDBitmap *b)
 	}
 	else
 	{
-		HASH_SEQ_STATUS status;
+		pagetable_iterator i;
 		PagetableEntry *apage;
 
 		Assert(a->status == TBM_HASH);
-		hash_seq_init(&status, a->pagetable);
-		while ((apage = (PagetableEntry *) hash_seq_search(&status)) != NULL)
+		pagetable_start_iterate(a->pagetable, &i);
+		while ((apage = pagetable_iterate(a->pagetable, &i)) != NULL)
 		{
 			if (tbm_intersect_page(a, apage, b))
 			{
@@ -464,9 +480,7 @@ tbm_intersect(TIDBitmap *a, const TIDBitmap *b)
 				else
 					a->npages--;
 				a->nentries--;
-				if (hash_search(a->pagetable,
-								(void *) &apage->blockno,
-								HASH_REMOVE, NULL) == NULL)
+				if (!pagetable_delete(a->pagetable, apage->blockno))
 					elog(ERROR, "hash table corrupted");
 			}
 		}
@@ -606,7 +620,7 @@ tbm_begin_iterate(TIDBitmap *tbm)
 	 */
 	if (tbm->status == TBM_HASH && !tbm->iterating)
 	{
-		HASH_SEQ_STATUS status;
+		pagetable_iterator	i;
 		PagetableEntry *page;
 		int			npages;
 		int			nchunks;
@@ -620,9 +634,9 @@ tbm_begin_iterate(TIDBitmap *tbm)
 				MemoryContextAlloc(tbm->mcxt,
 								   tbm->nchunks * sizeof(PagetableEntry *));
 
-		hash_seq_init(&status, tbm->pagetable);
 		npages = nchunks = 0;
-		while ((page = (PagetableEntry *) hash_seq_search(&status)) != NULL)
+		pagetable_start_iterate(tbm->pagetable, &i);
+		while ((page = pagetable_iterate(tbm->pagetable, &i)) != NULL)
 		{
 			if (page->ischunk)
 				tbm->schunks[nchunks++] = page;
@@ -791,9 +805,7 @@ tbm_find_pageentry(const TIDBitmap *tbm, BlockNumber pageno)
 		return page;
 	}
 
-	page = (PagetableEntry *) hash_search(tbm->pagetable,
-										  (void *) &pageno,
-										  HASH_FIND, NULL);
+	page = pagetable_lookup(tbm->pagetable, pageno);
 	if (page == NULL)
 		return NULL;
 	if (page->ischunk)
@@ -833,16 +845,15 @@ tbm_get_pageentry(TIDBitmap *tbm, BlockNumber pageno)
 			tbm_create_pagetable(tbm);
 		}
 
-		/* Look up or create an entry */
-		page = (PagetableEntry *) hash_search(tbm->pagetable,
-											  (void *) &pageno,
-											  HASH_ENTER, &found);
+		page = pagetable_insert(tbm->pagetable, pageno, &found);
 	}
 
 	/* Initialize it if not present before */
 	if (!found)
 	{
+		char oldstatus = page->status;
 		MemSet(page, 0, sizeof(PagetableEntry));
+		page->status = oldstatus;
 		page->blockno = pageno;
 		/* must count it too */
 		tbm->nentries++;
@@ -869,9 +880,9 @@ tbm_page_is_lossy(const TIDBitmap *tbm, BlockNumber pageno)
 
 	bitno = pageno % PAGES_PER_CHUNK;
 	chunk_pageno = pageno - bitno;
-	page = (PagetableEntry *) hash_search(tbm->pagetable,
-										  (void *) &chunk_pageno,
-										  HASH_FIND, NULL);
+
+	page = pagetable_lookup(tbm->pagetable, chunk_pageno);
+
 	if (page != NULL && page->ischunk)
 	{
 		int			wordnum = WORDNUM(bitno);
@@ -912,9 +923,7 @@ tbm_mark_page_lossy(TIDBitmap *tbm, BlockNumber pageno)
 	 */
 	if (bitno != 0)
 	{
-		if (hash_search(tbm->pagetable,
-						(void *) &pageno,
-						HASH_REMOVE, NULL) != NULL)
+		if (pagetable_delete(tbm->pagetable, pageno))
 		{
 			/* It was present, so adjust counts */
 			tbm->nentries--;
@@ -922,15 +931,14 @@ tbm_mark_page_lossy(TIDBitmap *tbm, BlockNumber pageno)
 		}
 	}
 
-	/* Look up or create entry for chunk-header page */
-	page = (PagetableEntry *) hash_search(tbm->pagetable,
-										  (void *) &chunk_pageno,
-										  HASH_ENTER, &found);
+	page = pagetable_insert(tbm->pagetable, chunk_pageno, &found);
 
 	/* Initialize it if not present before */
 	if (!found)
 	{
+		char oldstatus = page->status;
 		MemSet(page, 0, sizeof(PagetableEntry));
+		page->status = oldstatus;
 		page->blockno = chunk_pageno;
 		page->ischunk = true;
 		/* must count it too */
@@ -939,8 +947,10 @@ tbm_mark_page_lossy(TIDBitmap *tbm, BlockNumber pageno)
 	}
 	else if (!page->ischunk)
 	{
+		char oldstatus = page->status;
 		/* chunk header page was formerly non-lossy, make it lossy */
 		MemSet(page, 0, sizeof(PagetableEntry));
+		page->status = oldstatus;
 		page->blockno = chunk_pageno;
 		page->ischunk = true;
 		/* we assume it had some tuple bit(s) set, so mark it lossy */
@@ -962,7 +972,7 @@ tbm_mark_page_lossy(TIDBitmap *tbm, BlockNumber pageno)
 static void
 tbm_lossify(TIDBitmap *tbm)
 {
-	HASH_SEQ_STATUS status;
+	pagetable_iterator i;
 	PagetableEntry *page;
 
 	/*
@@ -977,8 +987,8 @@ tbm_lossify(TIDBitmap *tbm)
 	Assert(!tbm->iterating);
 	Assert(tbm->status == TBM_HASH);
 
-	hash_seq_init(&status, tbm->pagetable);
-	while ((page = (PagetableEntry *) hash_seq_search(&status)) != NULL)
+	pagetable_start_iterate(tbm->pagetable, &i);
+	while ((page = pagetable_iterate(tbm->pagetable, &i)) != NULL)
 	{
 		if (page->ischunk)
 			continue;			/* already a chunk header */
@@ -996,7 +1006,6 @@ tbm_lossify(TIDBitmap *tbm)
 		if (tbm->nentries <= tbm->maxentries / 2)
 		{
 			/* we have done enough */
-			hash_seq_term(&status);
 			break;
 		}
 
-- 
2.8.1
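
One detail of the tidbitmap conversion worth spelling out: simplehash keeps
its per-entry state in the entry itself (the new `status` member of
PagetableEntry), so code that wipes an entry returned by pagetable_insert()
has to preserve that byte; that's what the oldstatus save/restore in the
hunks above is about, roughly:

	char		oldstatus = page->status;

	MemSet(page, 0, sizeof(PagetableEntry));
	page->status = oldstatus;	/* don't clobber simplehash's bookkeeping */
	page->blockno = pageno;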

From 0716ad3e017196faa2746b169f7add9f92b02adf Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Thu, 21 Jul 2016 12:51:14 -0700
Subject: [PATCH 4/4] WIP: execGrouping.c simplehash

---
 src/backend/executor/execGrouping.c | 146 +++++++++++++++---------------------
 src/backend/executor/nodeAgg.c      |  33 ++++----
 src/backend/executor/nodeSetOp.c    |  24 +++---
 src/backend/executor/nodeSubplan.c  |   6 +-
 src/include/executor/executor.h     |  12 +++
 src/include/nodes/execnodes.h       |  38 ++++++----
 6 files changed, 124 insertions(+), 135 deletions(-)

diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
index 808275a..21089c8 100644
--- a/src/backend/executor/execGrouping.c
+++ b/src/backend/executor/execGrouping.c
@@ -22,13 +22,26 @@
 #include "miscadmin.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "access/htup_details.h"
 
+static uint32 TupleHashTableHash(struct tuplehash_hash *tb, const MinimalTuple tuple);
+static int TupleHashTableMatch(struct tuplehash_hash *tb, const MinimalTuple tuple1, const MinimalTuple tuple2);
 
-static TupleHashTable CurTupleHashTable = NULL;
-
-static uint32 TupleHashTableHash(const void *key, Size keysize);
-static int TupleHashTableMatch(const void *key1, const void *key2,
-					Size keysize);
+/*
+ * Define parameters for tuple hash table code generation. The interface is
+ * *also* defined in executor.h.
+ */
+#define SH_PREFIX tuplehash
+#define SH_KEYTYPE TupleHashEntryKey
+#define SH_KEY key
+#define SH_CONTAINS TupleHashEntryData
+#define SH_HASH_KEY(tb, key) TupleHashTableHash(tb, key.firstTuple)
+#define SH_EQUAL(tb, a, b) TupleHashTableMatch(tb, a.firstTuple, b.firstTuple) == 0
+#define SH_SCOPE extern
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) a->hash
+#define SH_DEFINE
+#include "lib/simplehash.h"
 
 
 /*****************************************************************************
@@ -279,10 +292,8 @@ BuildTupleHashTable(int numCols, AttrNumber *keyColIdx,
 					MemoryContext tablecxt, MemoryContext tempcxt)
 {
 	TupleHashTable hashtable;
-	HASHCTL		hash_ctl;
 
 	Assert(nbuckets > 0);
-	Assert(entrysize >= sizeof(TupleHashEntryData));
 
 	/* Limit initial table size request to not more than work_mem */
 	nbuckets = Min(nbuckets, (long) ((work_mem * 1024L) / entrysize));
@@ -297,20 +308,14 @@ BuildTupleHashTable(int numCols, AttrNumber *keyColIdx,
 	hashtable->tablecxt = tablecxt;
 	hashtable->tempcxt = tempcxt;
 	hashtable->entrysize = entrysize;
-	hashtable->tableslot = NULL;	/* will be made on first lookup */
+	hashtable->tableslot1 = NULL;	/* will be made on first lookup */
+	hashtable->tableslot2 = NULL;	/* will be made on first lookup */
 	hashtable->inputslot = NULL;
 	hashtable->in_hash_funcs = NULL;
 	hashtable->cur_eq_funcs = NULL;
 
-	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
-	hash_ctl.keysize = sizeof(TupleHashEntryData);
-	hash_ctl.entrysize = entrysize;
-	hash_ctl.hash = TupleHashTableHash;
-	hash_ctl.match = TupleHashTableMatch;
-	hash_ctl.hcxt = tablecxt;
-	hashtable->hashtab = hash_create("TupleHashTable", nbuckets,
-									 &hash_ctl,
-					HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
+	hashtable->hashtab = tuplehash_create(tablecxt, nbuckets);
+	hashtable->hashtab->private = hashtable;
 
 	return hashtable;
 }
@@ -331,14 +336,13 @@ TupleHashEntry
 LookupTupleHashEntry(TupleHashTable hashtable, TupleTableSlot *slot,
 					 bool *isnew)
 {
-	TupleHashEntry entry;
+	TupleHashEntryData *entry;
 	MemoryContext oldContext;
-	TupleHashTable saveCurHT;
-	TupleHashEntryData dummy;
 	bool		found;
+	TupleHashEntryKey key;
 
 	/* If first time through, clone the input slot to make table slot */
-	if (hashtable->tableslot == NULL)
+	if (hashtable->tableslot1 == NULL)
 	{
 		TupleDesc	tupdesc;
 
@@ -349,7 +353,8 @@ LookupTupleHashEntry(TupleHashTable hashtable, TupleTableSlot *slot,
 		 * all input tuples will have equivalent descriptors.
 		 */
 		tupdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
-		hashtable->tableslot = MakeSingleTupleTableSlot(tupdesc);
+		hashtable->tableslot1 = MakeSingleTupleTableSlot(tupdesc);
+		hashtable->tableslot2 = MakeSingleTupleTableSlot(tupdesc);
 		MemoryContextSwitchTo(oldContext);
 	}
 
@@ -358,26 +363,17 @@ LookupTupleHashEntry(TupleHashTable hashtable, TupleTableSlot *slot,
 
 	/*
 	 * Set up data needed by hash and match functions
-	 *
-	 * We save and restore CurTupleHashTable just in case someone manages to
-	 * invoke this code re-entrantly.
 	 */
 	hashtable->inputslot = slot;
 	hashtable->in_hash_funcs = hashtable->tab_hash_funcs;
 	hashtable->cur_eq_funcs = hashtable->tab_eq_funcs;
 
-	saveCurHT = CurTupleHashTable;
-	CurTupleHashTable = hashtable;
-
-	/* Search the hash table */
-	dummy.firstTuple = NULL;	/* flag to reference inputslot */
-	entry = (TupleHashEntry) hash_search(hashtable->hashtab,
-										 &dummy,
-										 isnew ? HASH_ENTER : HASH_FIND,
-										 &found);
+	key.firstTuple = NULL;
 
 	if (isnew)
 	{
+		entry = tuplehash_insert(hashtable->hashtab, key, &found);
+
 		if (found)
 		{
 			/* found pre-existing entry */
@@ -385,24 +381,19 @@ LookupTupleHashEntry(TupleHashTable hashtable, TupleTableSlot *slot,
 		}
 		else
 		{
-			/*
-			 * created new entry
-			 *
-			 * Zero any caller-requested space in the entry.  (This zaps the
-			 * "key data" dynahash.c copied into the new entry, but we don't
-			 * care since we're about to overwrite it anyway.)
-			 */
-			MemSet(entry, 0, hashtable->entrysize);
-
-			/* Copy the first tuple into the table context */
-			MemoryContextSwitchTo(hashtable->tablecxt);
-			entry->firstTuple = ExecCopySlotMinimalTuple(slot);
-
+			/* FIXME: fill */
 			*isnew = true;
+			entry->additional = NULL;
+			MemoryContextSwitchTo(hashtable->tablecxt);
+			/* Copy the first tuple into the table context */
+			entry->key.firstTuple = ExecCopySlotMinimalTuple(slot);
 		}
 	}
+	else
+	{
+		entry = tuplehash_lookup(hashtable->hashtab, key);
+	}
 
-	CurTupleHashTable = saveCurHT;
 
 	MemoryContextSwitchTo(oldContext);
 
@@ -425,8 +416,7 @@ FindTupleHashEntry(TupleHashTable hashtable, TupleTableSlot *slot,
 {
 	TupleHashEntry entry;
 	MemoryContext oldContext;
-	TupleHashTable saveCurHT;
-	TupleHashEntryData dummy;
+	TupleHashEntryKey key;
 
 	/* Need to run the hash functions in short-lived context */
 	oldContext = MemoryContextSwitchTo(hashtable->tempcxt);
@@ -441,20 +431,10 @@ FindTupleHashEntry(TupleHashTable hashtable, TupleTableSlot *slot,
 	hashtable->in_hash_funcs = hashfunctions;
 	hashtable->cur_eq_funcs = eqfunctions;
 
-	saveCurHT = CurTupleHashTable;
-	CurTupleHashTable = hashtable;
-
 	/* Search the hash table */
-	dummy.firstTuple = NULL;	/* flag to reference inputslot */
-	entry = (TupleHashEntry) hash_search(hashtable->hashtab,
-										 &dummy,
-										 HASH_FIND,
-										 NULL);
-
-	CurTupleHashTable = saveCurHT;
-
+	key.firstTuple = NULL;	/* flag to reference inputslot */
+	entry = tuplehash_lookup(hashtable->hashtab, key);
 	MemoryContextSwitchTo(oldContext);
-
 	return entry;
 }
 
@@ -474,16 +454,14 @@ FindTupleHashEntry(TupleHashTable hashtable, TupleTableSlot *slot,
  * Also, the caller must select an appropriate memory context for running
  * the hash functions. (dynahash.c doesn't change CurrentMemoryContext.)
  */
-static uint32
-TupleHashTableHash(const void *key, Size keysize)
+static uint32 TupleHashTableHash(struct tuplehash_hash *tb, const MinimalTuple tuple)
 {
-	MinimalTuple tuple = ((const TupleHashEntryData *) key)->firstTuple;
-	TupleTableSlot *slot;
-	TupleHashTable hashtable = CurTupleHashTable;
+	TupleHashTable hashtable = tb->private;
 	int			numCols = hashtable->numCols;
 	AttrNumber *keyColIdx = hashtable->keyColIdx;
-	FmgrInfo   *hashfunctions;
 	uint32		hashkey = 0;
+	TupleTableSlot *slot;
+	FmgrInfo   *hashfunctions;
 	int			i;
 
 	if (tuple == NULL)
@@ -495,8 +473,7 @@ TupleHashTableHash(const void *key, Size keysize)
 	else
 	{
 		/* Process a tuple already stored in the table */
-		/* (this case never actually occurs in current dynahash.c code) */
-		slot = hashtable->tableslot;
+		slot = hashtable->tableslot1;
 		ExecStoreMinimalTuple(tuple, slot, false);
 		hashfunctions = hashtable->tab_hash_funcs;
 	}
@@ -537,28 +514,25 @@ TupleHashTableHash(const void *key, Size keysize)
  * the compare functions.  (dynahash.c doesn't change CurrentMemoryContext.)
  */
 static int
-TupleHashTableMatch(const void *key1, const void *key2, Size keysize)
+TupleHashTableMatch(struct tuplehash_hash *tb, const MinimalTuple tuple1, const MinimalTuple tuple2)
 {
-	MinimalTuple tuple1 = ((const TupleHashEntryData *) key1)->firstTuple;
-
-#ifdef USE_ASSERT_CHECKING
-	MinimalTuple tuple2 = ((const TupleHashEntryData *) key2)->firstTuple;
-#endif
 	TupleTableSlot *slot1;
 	TupleTableSlot *slot2;
-	TupleHashTable hashtable = CurTupleHashTable;
+	TupleHashTable hashtable = tb->private;
 
-	/*
-	 * We assume that dynahash.c will only ever call us with the first
-	 * argument being an actual table entry, and the second argument being
-	 * LookupTupleHashEntry's dummy TupleHashEntryData.  The other direction
-	 * could be supported too, but is not currently used by dynahash.c.
-	 */
 	Assert(tuple1 != NULL);
-	slot1 = hashtable->tableslot;
+	slot1 = hashtable->tableslot1;
 	ExecStoreMinimalTuple(tuple1, slot1, false);
-	Assert(tuple2 == NULL);
-	slot2 = hashtable->inputslot;
+
+	if (tuple2 != NULL)
+	{
+		slot2 = hashtable->tableslot2;
+		ExecStoreMinimalTuple(tuple2, slot2, false);
+	}
+	else
+	{
+		slot2 = hashtable->inputslot;
+	}
 
 	/* For crosstype comparisons, the inputslot must be first */
 	if (execTuplesMatch(slot2,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index b3187e6..31f25ae 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -440,14 +440,7 @@ typedef struct AggStatePerPhaseData
  * distinct set of GROUP BY column values.  We compute the hash key from
  * the GROUP BY columns.
  */
-typedef struct AggHashEntryData *AggHashEntry;
-
-typedef struct AggHashEntryData
-{
-	TupleHashEntryData shared;	/* common header for hash table entries */
-	/* per-aggregate transition status array */
-	AggStatePerGroupData pergroup[FLEXIBLE_ARRAY_MEMBER];
-}	AggHashEntryData;
+typedef TupleHashEntryData *AggHashEntry;
 
 static void initialize_phase(AggState *aggstate, int newphase);
 static TupleTableSlot *fetch_input_tuple(AggState *aggstate);
@@ -1658,8 +1651,7 @@ build_hash_table(AggState *aggstate)
 	Assert(node->aggstrategy == AGG_HASHED);
 	Assert(node->numGroups > 0);
 
-	entrysize = offsetof(AggHashEntryData, pergroup) +
-		aggstate->numaggs * sizeof(AggStatePerGroupData);
+	entrysize = sizeof(AggHashEntry);
 
 	aggstate->hashtable = BuildTupleHashTable(node->numCols,
 											  node->grpColIdx,
@@ -1730,7 +1722,7 @@ hash_agg_entry_size(int numAggs)
 	Size		entrysize;
 
 	/* This must match build_hash_table */
-	entrysize = offsetof(AggHashEntryData, pergroup) +
+	entrysize = sizeof(TupleHashEntryData) +
 		numAggs * sizeof(AggStatePerGroupData);
 	entrysize = MAXALIGN(entrysize);
 	/* Account for hashtable overhead (assuming fill factor = 1) */
@@ -1755,7 +1747,10 @@ lookup_hash_entry(AggState *aggstate, TupleTableSlot *inputslot)
 	/* if first time through, initialize hashslot by cloning input slot */
 	if (hashslot->tts_tupleDescriptor == NULL)
 	{
-		ExecSetSlotDescriptor(hashslot, inputslot->tts_tupleDescriptor);
+		MemoryContext oldContext = MemoryContextSwitchTo(hashslot->tts_mcxt);
+		/* get rid of constraints */
+		ExecSetSlotDescriptor(hashslot, CreateTupleDescCopy(inputslot->tts_tupleDescriptor));
+		MemoryContextSwitchTo(oldContext);
 		/* Make sure all unused columns are NULLs */
 		ExecStoreAllNullTuple(hashslot);
 	}
@@ -1777,8 +1772,10 @@ lookup_hash_entry(AggState *aggstate, TupleTableSlot *inputslot)
 
 	if (isnew)
 	{
+		entry->additional = MemoryContextAlloc(aggstate->hashtable->tablecxt,
+											   sizeof(AggStatePerGroupData) * aggstate->numtrans);
 		/* initialize aggregates for new tuple group */
-		initialize_aggregates(aggstate, entry->pergroup, 0);
+		initialize_aggregates(aggstate, entry->additional, 0);
 	}
 
 	return entry;
@@ -2203,9 +2200,9 @@ agg_fill_hash_table(AggState *aggstate)
 
 		/* Advance the aggregates */
 		if (DO_AGGSPLIT_COMBINE(aggstate->aggsplit))
-			combine_aggregates(aggstate, entry->pergroup);
+			combine_aggregates(aggstate, entry->additional);
 		else
-			advance_aggregates(aggstate, entry->pergroup);
+			advance_aggregates(aggstate, entry->additional);
 
 		/* Reset per-input-tuple context after each tuple */
 		ResetExprContext(tmpcontext);
@@ -2246,7 +2243,7 @@ agg_retrieve_hash_table(AggState *aggstate)
 		/*
 		 * Find the next entry in the hash table
 		 */
-		entry = (AggHashEntry) ScanTupleHashTable(&aggstate->hashiter);
+		entry = (AggHashEntry) ScanTupleHashTable(aggstate->hashtable, &aggstate->hashiter);
 		if (entry == NULL)
 		{
 			/* No more entries in hashtable, so done */
@@ -2267,11 +2264,11 @@ agg_retrieve_hash_table(AggState *aggstate)
 		 * Store the copied first input tuple in the tuple table slot reserved
 		 * for it, so that it can be used in ExecProject.
 		 */
-		ExecStoreMinimalTuple(entry->shared.firstTuple,
+		ExecStoreMinimalTuple(entry->key.firstTuple,
 							  firstSlot,
 							  false);
 
-		pergroup = entry->pergroup;
+		pergroup = entry->additional;
 
 		finalize_aggregates(aggstate, peragg, pergroup, 0);
 
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 2d81d46..ce393e1 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -71,13 +71,9 @@ typedef struct SetOpStatePerGroupData
  * representative tuple and the duplicate counts for each distinct set
  * of grouping columns.  We compute the hash key from the grouping columns.
  */
-typedef struct SetOpHashEntryData *SetOpHashEntry;
+typedef TupleHashEntryData SetOpHashEntryData;
 
-typedef struct SetOpHashEntryData
-{
-	TupleHashEntryData shared;	/* common header for hash table entries */
-	SetOpStatePerGroupData pergroup;
-}	SetOpHashEntryData;
+typedef SetOpHashEntryData *SetOpHashEntry;
 
 
 static TupleTableSlot *setop_retrieve_direct(SetOpState *setopstate);
@@ -388,10 +384,14 @@ setop_fill_hash_table(SetOpState *setopstate)
 
 			/* If new tuple group, initialize counts */
 			if (isnew)
-				initialize_counts(&entry->pergroup);
+			{
+				entry->additional = MemoryContextAlloc(setopstate->hashtable->tablecxt,
+													   sizeof(SetOpStatePerGroupData));
+				initialize_counts(entry->additional);
+			}
 
 			/* Advance the counts */
-			advance_counts(&entry->pergroup, flag);
+			advance_counts(entry->additional, flag);
 		}
 		else
 		{
@@ -404,7 +404,7 @@ setop_fill_hash_table(SetOpState *setopstate)
 
 			/* Advance the counts if entry is already present */
 			if (entry)
-				advance_counts(&entry->pergroup, flag);
+				advance_counts(entry->additional, flag);
 		}
 
 		/* Must reset temp context after each hashtable lookup */
@@ -438,7 +438,7 @@ setop_retrieve_hash_table(SetOpState *setopstate)
 		/*
 		 * Find the next entry in the hash table
 		 */
-		entry = (SetOpHashEntry) ScanTupleHashTable(&setopstate->hashiter);
+		entry = (SetOpHashEntry) ScanTupleHashTable(setopstate->hashtable, &setopstate->hashiter);
 		if (entry == NULL)
 		{
 			/* No more entries in hashtable, so done */
@@ -450,12 +450,12 @@ setop_retrieve_hash_table(SetOpState *setopstate)
 		 * See if we should emit any copies of this tuple, and if so return
 		 * the first copy.
 		 */
-		set_output_count(setopstate, &entry->pergroup);
+		set_output_count(setopstate, entry->additional);
 
 		if (setopstate->numOutput > 0)
 		{
 			setopstate->numOutput--;
-			return ExecStoreMinimalTuple(entry->shared.firstTuple,
+			return ExecStoreMinimalTuple(entry->key.firstTuple,
 										 resultTupleSlot,
 										 false);
 		}
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index e503494..df9f04f 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -626,10 +626,10 @@ findPartialMatch(TupleHashTable hashtable, TupleTableSlot *slot,
 	TupleHashEntry entry;
 
 	InitTupleHashIterator(hashtable, &hashiter);
-	while ((entry = ScanTupleHashTable(&hashiter)) != NULL)
+	while ((entry = ScanTupleHashTable(hashtable, &hashiter)) != NULL)
 	{
-		ExecStoreMinimalTuple(entry->firstTuple, hashtable->tableslot, false);
-		if (!execTuplesUnequal(slot, hashtable->tableslot,
+		ExecStoreMinimalTuple(entry->key.firstTuple, hashtable->tableslot1, false);
+		if (!execTuplesUnequal(slot, hashtable->tableslot1,
 							   numCols, keyColIdx,
 							   eqfunctions,
 							   hashtable->tempcxt))
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 39521ed..0addcbf 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -95,6 +95,18 @@ extern PGDLLIMPORT ExecutorEnd_hook_type ExecutorEnd_hook;
 typedef bool (*ExecutorCheckPerms_hook_type) (List *, bool);
 extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
 
+/*
+ * Define parameters necessary to generate the tuple hash table interface
+ * functions. It seems better to do this in executor.h, rather than
+ * execnodes.h, which is more widely included.
+ */
+#define SH_PREFIX tuplehash
+#define SH_KEYTYPE TupleHashEntryKey
+#define SH_KEY key
+#define SH_CONTAINS TupleHashEntryData
+#define SH_SCOPE extern
+#define SH_DECLARE
+#include "lib/simplehash.h"
 
 /*
  * prototypes from functions in execAmi.c
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e7fd7bd..6fd9ff9 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -496,16 +496,26 @@ typedef struct ExecAuxRowMark
 typedef struct TupleHashEntryData *TupleHashEntry;
 typedef struct TupleHashTableData *TupleHashTable;
 
+typedef struct TupleHashEntryKey
+{
+	MinimalTuple firstTuple;	/* copy of first tuple in this group */
+} TupleHashEntryKey;
+
 typedef struct TupleHashEntryData
 {
-	/* firstTuple must be the first field in this struct! */
-	MinimalTuple firstTuple;	/* copy of first tuple in this group */
-	/* there may be additional data beyond the end of this struct */
-} TupleHashEntryData;			/* VARIABLE LENGTH STRUCT */
+	TupleHashEntryKey key;
+	void		*additional;
+	uint32		status;
+	uint32		hash;
+} TupleHashEntryData;
+
+/* forward declare, to avoid including hash related code here */
+struct tuplehash_hash;
+typedef uint32 TupleHashIterator;
 
 typedef struct TupleHashTableData
 {
-	HTAB	   *hashtab;		/* underlying dynahash table */
+	struct tuplehash_hash *hashtab;	/* underlying hash table */
 	int			numCols;		/* number of columns in lookup key */
 	AttrNumber *keyColIdx;		/* attr numbers of key columns */
 	FmgrInfo   *tab_hash_funcs; /* hash functions for table datatype(s) */
@@ -513,31 +523,27 @@ typedef struct TupleHashTableData
 	MemoryContext tablecxt;		/* memory context containing table */
 	MemoryContext tempcxt;		/* context for function evaluations */
 	Size		entrysize;		/* actual size to make each hash entry */
-	TupleTableSlot *tableslot;	/* slot for referencing table entries */
+	TupleTableSlot *tableslot1;	/* slot for referencing table entries */
+	TupleTableSlot *tableslot2;	/* slot for referencing table entries */
 	/* The following fields are set transiently for each table search: */
 	TupleTableSlot *inputslot;	/* current input tuple's slot */
 	FmgrInfo   *in_hash_funcs;	/* hash functions for input datatype(s) */
 	FmgrInfo   *cur_eq_funcs;	/* equality functions for input vs. table */
 }	TupleHashTableData;
 
-typedef HASH_SEQ_STATUS TupleHashIterator;
-
 /*
  * Use InitTupleHashIterator/TermTupleHashIterator for a read/write scan.
  * Use ResetTupleHashIterator if the table can be frozen (in this case no
  * explicit scan termination is needed).
  */
 #define InitTupleHashIterator(htable, iter) \
-	hash_seq_init(iter, (htable)->hashtab)
+	tuplehash_start_iterate(htable->hashtab, iter)
 #define TermTupleHashIterator(iter) \
-	hash_seq_term(iter)
+	(void)0
 #define ResetTupleHashIterator(htable, iter) \
-	do { \
-		hash_freeze((htable)->hashtab); \
-		hash_seq_init(iter, (htable)->hashtab); \
-	} while (0)
-#define ScanTupleHashTable(iter) \
-	((TupleHashEntry) hash_seq_search(iter))
+	InitTupleHashIterator(htable, iter)
+#define ScanTupleHashTable(htable, iter) \
+	tuplehash_iterate(htable->hashtab, iter)
 
 
 /* ----------------------------------------------------------------
-- 
2.8.1
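
A note on the execGrouping/nodeAgg changes: simplehash entries are fixed
size and get memcpy'd into the new array on resize, so the old
variable-length AggHashEntryData (with its trailing pergroup array) can no
longer double as the hash entry. Per-group transition state therefore moves
behind the entry's `additional` pointer, allocated in the table's memory
context, roughly:

	if (isnew)
	{
		/* per-group transition state now lives outside the hash entry */
		entry->additional = MemoryContextAlloc(aggstate->hashtable->tablecxt,
											   sizeof(AggStatePerGroupData) * aggstate->numtrans);
		initialize_aggregates(aggstate, entry->additional, 0);
	}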
