On 01/28/2014 04:12 PM, Alexander Korotkov wrote:
>3. A binary heap would be a better data structure to buffer the rechecked
>values. A Red-Black tree allows random insertions and deletions, but in
>this case you need to insert arbitrary values but only remove the minimum
>item. That's exactly what a binary heap excels at. We have a nice binary
>heap implementation in the backend that you can use, see
>src/backend/lib/binaryheap.c.
>
Hmm. For me binary heap would be a better data structure for KNN-GiST at
all :-)

I decided to give this a shot, replacing the red-black tree in GiST with the binary heap we have in lib/binaryheap.c. It made the GiST code somewhat simpler, as the binaryheap interface is simpler than the red-black tree one. Unfortunately, performance was somewhat worse. That was quite surprising, as insertions and deletions are both O(log N) in both data structures, but the red-black tree implementation is more complicated.

I implemented another data structure called a Pairing Heap. It's also a fairly simple data structure, but insertions are O(1) instead of O(log N). It also performs fairly well in practice.

With that, I got a small but measurable improvement. To test, I created a table like this:

create table gisttest (id integer, p point);
insert into gisttest select id, point(random(), random()) from generate_series(1, 1000000) id;
create index i_gisttest on gisttest using gist (p);

And I ran this query with pgbench:

select id from gisttest order by p <-> '(0,0)' limit 1000;

With unpatched master, I got about 650 TPS, and with the patch 720 TPS. That's a nice little improvement, but perhaps more importantly, the pairing heap implementation consumes less memory. To measure that, I put a MemoryContextStats(so->queueCtx) call into gistendscan. With the above query, but without the "limit" clause, on master I got:

GiST scan context: 2109752 total in 10 blocks; 2088456 free (24998 chunks); 21296 used

And with the patch:

GiST scan context: 1061160 total in 9 blocks; 1040088 free (12502 chunks); 21072 used

That's 2MB vs 1MB. While that's not much in absolute terms, it'd be nice to reduce that memory consumption, as there is no hard upper bound on how much might be needed. If the GiST tree is really disorganized for some reason, a query might need a lot more.


So all in all, I quite like this patch, even though it doesn't do anything too phenomenal. It adds a some code, in the form of the new pairing heap implementation, but it makes the GiST code a little bit simpler. And it gives a small performance gain, and reduces memory usage a bit.

- Heikki

diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 7a8692b..52b2c53 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -18,6 +18,7 @@
 #include "access/relscan.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "lib/pairingheap.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -243,8 +244,6 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	GISTPageOpaque opaque;
 	OffsetNumber maxoff;
 	OffsetNumber i;
-	GISTSearchTreeItem *tmpItem = so->tmpTreeItem;
-	bool		isNew;
 	MemoryContext oldcxt;
 
 	Assert(!GISTSearchItemIsHeap(*pageItem));
@@ -275,18 +274,15 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		oldcxt = MemoryContextSwitchTo(so->queueCxt);
 
 		/* Create new GISTSearchItem for the right sibling index page */
-		item = palloc(sizeof(GISTSearchItem));
-		item->next = NULL;
+		item = palloc(SizeOfGISTSearchItem(scan->numberOfOrderBys));
 		item->blkno = opaque->rightlink;
 		item->data.parentlsn = pageItem->data.parentlsn;
 
 		/* Insert it into the queue using same distances as for this page */
-		tmpItem->head = item;
-		tmpItem->lastHeap = NULL;
-		memcpy(tmpItem->distances, myDistances,
+		memcpy(item->distances, myDistances,
 			   sizeof(double) * scan->numberOfOrderBys);
 
-		(void) rb_insert(so->queue, (RBNode *) tmpItem, &isNew);
+		pairingheap_add(so->queue, &item->fbNode);
 
 		MemoryContextSwitchTo(oldcxt);
 	}
@@ -348,8 +344,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 			oldcxt = MemoryContextSwitchTo(so->queueCxt);
 
 			/* Create new GISTSearchItem for this item */
-			item = palloc(sizeof(GISTSearchItem));
-			item->next = NULL;
+			item = palloc(SizeOfGISTSearchItem(scan->numberOfOrderBys));
 
 			if (GistPageIsLeaf(page))
 			{
@@ -372,12 +367,10 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 			}
 
 			/* Insert it into the queue using new distance data */
-			tmpItem->head = item;
-			tmpItem->lastHeap = GISTSearchItemIsHeap(*item) ? item : NULL;
-			memcpy(tmpItem->distances, so->distances,
+			memcpy(item->distances, so->distances,
 				   sizeof(double) * scan->numberOfOrderBys);
 
-			(void) rb_insert(so->queue, (RBNode *) tmpItem, &isNew);
+			pairingheap_add(so->queue, &item->fbNode);
 
 			MemoryContextSwitchTo(oldcxt);
 		}
@@ -390,44 +383,24 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
  * Extract next item (in order) from search queue
  *
  * Returns a GISTSearchItem or NULL.  Caller must pfree item when done with it.
- *
- * NOTE: on successful return, so->curTreeItem is the GISTSearchTreeItem that
- * contained the result item.  Callers can use so->curTreeItem->distances as
- * the distances value for the item.
  */
 static GISTSearchItem *
 getNextGISTSearchItem(GISTScanOpaque so)
 {
-	for (;;)
-	{
-		GISTSearchItem *item;
-
-		/* Update curTreeItem if we don't have one */
-		if (so->curTreeItem == NULL)
-		{
-			so->curTreeItem = (GISTSearchTreeItem *) rb_leftmost(so->queue);
-			/* Done when tree is empty */
-			if (so->curTreeItem == NULL)
-				break;
-		}
+	GISTSearchItem *item;
 
-		item = so->curTreeItem->head;
-		if (item != NULL)
-		{
-			/* Delink item from chain */
-			so->curTreeItem->head = item->next;
-			if (item == so->curTreeItem->lastHeap)
-				so->curTreeItem->lastHeap = NULL;
-			/* Return item; caller is responsible to pfree it */
-			return item;
-		}
-
-		/* curTreeItem is exhausted, so remove it from rbtree */
-		rb_delete(so->queue, (RBNode *) so->curTreeItem);
-		so->curTreeItem = NULL;
+	if (!pairingheap_empty(so->queue))
+	{
+		item = (GISTSearchItem *) pairingheap_remove_first(so->queue);
+	}
+	else
+	{
+		/* Done when both heaps are empty */
+		item = NULL;
 	}
 
-	return NULL;
+	/* Return item; caller is responsible to pfree it */
+	return item;
 }
 
 /*
@@ -458,7 +431,7 @@ getNextNearest(IndexScanDesc scan)
 			/* visit an index page, extract its items into queue */
 			CHECK_FOR_INTERRUPTS();
 
-			gistScanPage(scan, item, so->curTreeItem->distances, NULL, NULL);
+			gistScanPage(scan, item, item->distances, NULL, NULL);
 		}
 
 		pfree(item);
@@ -491,7 +464,6 @@ gistgettuple(PG_FUNCTION_ARGS)
 		pgstat_count_index_scan(scan->indexRelation);
 
 		so->firstCall = false;
-		so->curTreeItem = NULL;
 		so->curPageData = so->nPageData = 0;
 
 		fakeItem.blkno = GIST_ROOT_BLKNO;
@@ -534,7 +506,7 @@ gistgettuple(PG_FUNCTION_ARGS)
 				 * this page, we fall out of the inner "do" and loop around to
 				 * return them.
 				 */
-				gistScanPage(scan, item, so->curTreeItem->distances, NULL, NULL);
+				gistScanPage(scan, item, item->distances, NULL, NULL);
 
 				pfree(item);
 			} while (so->nPageData == 0);
@@ -560,7 +532,6 @@ gistgetbitmap(PG_FUNCTION_ARGS)
 	pgstat_count_index_scan(scan->indexRelation);
 
 	/* Begin the scan by processing the root page */
-	so->curTreeItem = NULL;
 	so->curPageData = so->nPageData = 0;
 
 	fakeItem.blkno = GIST_ROOT_BLKNO;
@@ -580,7 +551,7 @@ gistgetbitmap(PG_FUNCTION_ARGS)
 
 		CHECK_FOR_INTERRUPTS();
 
-		gistScanPage(scan, item, so->curTreeItem->distances, tbm, &ntids);
+		gistScanPage(scan, item, item->distances, tbm, &ntids);
 
 		pfree(item);
 	}
diff --git a/src/backend/access/gist/gistscan.c b/src/backend/access/gist/gistscan.c
index 8360b16..52c39c5 100644
--- a/src/backend/access/gist/gistscan.c
+++ b/src/backend/access/gist/gistscan.c
@@ -22,14 +22,13 @@
 
 
 /*
- * RBTree support functions for the GISTSearchTreeItem queue
+ * Pairing heap comparison function for the GISTSearchItem queue
  */
-
 static int
-GISTSearchTreeItemComparator(const RBNode *a, const RBNode *b, void *arg)
+pairingheap_GISTSearchItem_cmp(const pairingheap_item *a, const pairingheap_item *b, void *arg)
 {
-	const GISTSearchTreeItem *sa = (const GISTSearchTreeItem *) a;
-	const GISTSearchTreeItem *sb = (const GISTSearchTreeItem *) b;
+	const GISTSearchItem *sa = (const GISTSearchItem *) a;
+	const GISTSearchItem *sb = (const GISTSearchItem *) b;
 	IndexScanDesc scan = (IndexScanDesc) arg;
 	int			i;
 
@@ -40,56 +39,13 @@ GISTSearchTreeItemComparator(const RBNode *a, const RBNode *b, void *arg)
 			return (sa->distances[i] > sb->distances[i]) ? 1 : -1;
 	}
 
-	return 0;
-}
-
-static void
-GISTSearchTreeItemCombiner(RBNode *existing, const RBNode *newrb, void *arg)
-{
-	GISTSearchTreeItem *scurrent = (GISTSearchTreeItem *) existing;
-	const GISTSearchTreeItem *snew = (const GISTSearchTreeItem *) newrb;
-	GISTSearchItem *newitem = snew->head;
-
-	/* snew should have just one item in its chain */
-	Assert(newitem && newitem->next == NULL);
-
-	/*
-	 * If new item is heap tuple, it goes to front of chain; otherwise insert
-	 * it before the first index-page item, so that index pages are visited in
-	 * LIFO order, ensuring depth-first search of index pages.  See comments
-	 * in gist_private.h.
-	 */
-	if (GISTSearchItemIsHeap(*newitem))
-	{
-		newitem->next = scurrent->head;
-		scurrent->head = newitem;
-		if (scurrent->lastHeap == NULL)
-			scurrent->lastHeap = newitem;
-	}
-	else if (scurrent->lastHeap == NULL)
-	{
-		newitem->next = scurrent->head;
-		scurrent->head = newitem;
-	}
-	else
-	{
-		newitem->next = scurrent->lastHeap->next;
-		scurrent->lastHeap->next = newitem;
-	}
-}
-
-static RBNode *
-GISTSearchTreeItemAllocator(void *arg)
-{
-	IndexScanDesc scan = (IndexScanDesc) arg;
-
-	return palloc(GSTIHDRSZ + sizeof(double) * scan->numberOfOrderBys);
-}
+	/* Heap items go before inner pages, to ensure a depth-first search */
+	if (GISTSearchItemIsHeap(*sa) && !GISTSearchItemIsHeap(*sb))
+		return -1;
+	if (!GISTSearchItemIsHeap(*sa) && GISTSearchItemIsHeap(*sb))
+		return 1;
 
-static void
-GISTSearchTreeItemDeleter(RBNode *rb, void *arg)
-{
-	pfree(rb);
+	return 0;
 }
 
 
@@ -127,7 +83,6 @@ gistbeginscan(PG_FUNCTION_ARGS)
 	so->queueCxt = giststate->scanCxt;	/* see gistrescan */
 
 	/* workspaces with size dependent on numberOfOrderBys: */
-	so->tmpTreeItem = palloc(GSTIHDRSZ + sizeof(double) * scan->numberOfOrderBys);
 	so->distances = palloc(sizeof(double) * scan->numberOfOrderBys);
 	so->qual_ok = true;			/* in case there are zero keys */
 
@@ -188,15 +143,9 @@ gistrescan(PG_FUNCTION_ARGS)
 
 	/* create new, empty RBTree for search queue */
 	oldCxt = MemoryContextSwitchTo(so->queueCxt);
-	so->queue = rb_create(GSTIHDRSZ + sizeof(double) * scan->numberOfOrderBys,
-						  GISTSearchTreeItemComparator,
-						  GISTSearchTreeItemCombiner,
-						  GISTSearchTreeItemAllocator,
-						  GISTSearchTreeItemDeleter,
-						  scan);
+	so->queue = pairingheap_allocate(pairingheap_GISTSearchItem_cmp, scan);
 	MemoryContextSwitchTo(oldCxt);
 
-	so->curTreeItem = NULL;
 	so->firstCall = true;
 
 	/* Update scan key, if a new one is given */
@@ -327,6 +276,7 @@ gistendscan(PG_FUNCTION_ARGS)
 	 * freeGISTstate is enough to clean up everything made by gistbeginscan,
 	 * as well as the queueCxt if there is a separate context for it.
 	 */
+	MemoryContextStats(so->queueCxt);
 	freeGISTstate(so->giststate);
 
 	PG_RETURN_VOID();
diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 327a1bc..b24ece6 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/lib
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = ilist.o binaryheap.o stringinfo.o
+OBJS = ilist.o binaryheap.o pairingheap.o stringinfo.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/pairingheap.c b/src/backend/lib/pairingheap.c
new file mode 100644
index 0000000..32eeba6
--- /dev/null
+++ b/src/backend/lib/pairingheap.c
@@ -0,0 +1,179 @@
+/*-------------------------------------------------------------------------
+ *
+ * pairingheap.c
+ *	  A Pairing Heap implementaion
+ *
+ * Portions Copyright (c) 2012-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/lib/pairingheap.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "lib/pairingheap.h"
+
+/*
+ * pairingheap_allocate
+ *
+ * Returns a pointer to a newly-allocated heap, with the heap property
+ * defined by the given comparator function, which will be invoked with the
+ * additional argument specified by 'arg'.
+ */
+pairingheap *
+pairingheap_allocate(pairingheap_comparator compare, void *arg)
+{
+	pairingheap *heap;
+
+	heap = (pairingheap *) palloc(sizeof(pairingheap));
+	heap->ph_compare = compare;
+	heap->ph_arg = arg;
+
+	heap->ph_root = NULL;
+
+	return heap;
+}
+
+/*
+ * pairingheap_free
+ *
+ * Releases memory used by the given pairingheap.
+ *
+ * Note: The items in the heap are not released!
+ */
+void
+pairingheap_free(pairingheap *heap)
+{
+	pfree(heap);
+}
+
+
+/* A helper function to merge two subheaps into one. */
+static pairingheap_item *
+merge(pairingheap *heap, pairingheap_item *a, pairingheap_item *b)
+{
+	if (a == NULL)
+		return b;
+	if (b == NULL)
+		return a;
+
+	/* Put the larger of the items as a child of the smaller one */
+	if (heap->ph_compare(a, b, heap->ph_arg) < 0)
+	{
+		b->next_sibling = a->first_child;
+		a->first_child = b;
+		return a;
+	}
+	else
+	{
+		a->next_sibling = b->first_child;
+		b->first_child = a;
+		return b;
+	}
+}
+
+/*
+ * pairingheap_add
+ *
+ * Adds the given datum to the heap in O(1) time.
+ */
+void
+pairingheap_add(pairingheap *heap, pairingheap_item *d)
+{
+	d->first_child = NULL;
+
+	/* Link the new item as a new tree */
+	heap->ph_root = merge(heap, heap->ph_root, d);
+}
+
+/*
+ * pairingheap_first
+ *
+ * Returns a pointer to the first (root, topmost) node in the heap
+ * without modifying the heap. The caller must ensure that this
+ * routine is not used on an empty heap. Always O(1).
+ */
+pairingheap_item *
+pairingheap_first(pairingheap *heap)
+{
+	Assert(!pairingheap_empty(heap));
+	return heap->ph_root;
+}
+
+/*
+ * pairingheap_remove_first
+ *
+ * Removes the first (root, topmost) node in the heap and returns a
+ * pointer to it after rebalancing the heap. The caller must ensure
+ * that this routine is not used on an empty heap. O(log n) amortized.
+ */
+pairingheap_item *
+pairingheap_remove_first(pairingheap *heap)
+{
+	pairingheap_item *result;
+	pairingheap_item *children;
+	pairingheap_item *item, *next;
+	pairingheap_item *l = NULL;
+	pairingheap_item *newroot;
+
+	Assert(!pairingheap_empty(heap));
+
+	/* Remove the smallest root. */
+	result = heap->ph_root;
+	children = result->first_child;
+
+	/*
+	 * In the trivial case that the heap became empty, or the root had only
+	 * a single child, we're done.
+	 */
+	if (children == NULL || children->next_sibling == NULL)
+	{
+		heap->ph_root = children;
+		return result;
+	}
+
+	/* Walk the remaining subheaps from left to right, merging in pairs */
+	next = children;
+	for (;;)
+	{
+		item = next;
+		if (item == NULL)
+			break;
+		if (item->next_sibling == NULL)
+		{
+			/* last odd item at the end of list */
+			item->next_sibling = l;
+			l = item;
+			break;
+		}
+		else
+		{
+			next = item->next_sibling->next_sibling;
+
+			item = merge(heap, item, item->next_sibling);
+			item->next_sibling = l;
+			l = item;
+		}
+	}
+
+	/*
+	 * Ok, 'l' now contains the pairs in reverse order. Now merge them into
+	 * a single heap.
+	 */
+	newroot = l;
+	next = l->next_sibling;
+	while (next)
+	{
+		item = next;
+		next = item->next_sibling;
+
+		newroot = merge(heap, newroot, item);
+	}
+	heap->ph_root = newroot;
+
+	return result;
+}
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 2cbc918..7cf87bf 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -18,9 +18,9 @@
 #include "access/itup.h"
 #include "access/xlogreader.h"
 #include "fmgr.h"
+#include "lib/pairingheap.h"
 #include "storage/bufmgr.h"
 #include "storage/buffile.h"
-#include "utils/rbtree.h"
 #include "utils/hsearch.h"
 
 /*
@@ -123,7 +123,7 @@ typedef struct GISTSearchHeapItem
 /* Unvisited item, either index page or heap tuple */
 typedef struct GISTSearchItem
 {
-	struct GISTSearchItem *next;	/* list link */
+	pairingheap_item fbNode;
 	BlockNumber blkno;			/* index page number, or InvalidBlockNumber */
 	union
 	{
@@ -131,24 +131,12 @@ typedef struct GISTSearchItem
 		/* we must store parentlsn to detect whether a split occurred */
 		GISTSearchHeapItem heap;	/* heap info, if heap tuple */
 	}			data;
+	double		distances[1];	/* array with numberOfOrderBys entries */
 } GISTSearchItem;
 
 #define GISTSearchItemIsHeap(item)	((item).blkno == InvalidBlockNumber)
 
-/*
- * Within a GISTSearchTreeItem's chain, heap items always appear before
- * index-page items, since we want to visit heap items first.  lastHeap points
- * to the last heap item in the chain, or is NULL if there are none.
- */
-typedef struct GISTSearchTreeItem
-{
-	RBNode		rbnode;			/* this is an RBTree item */
-	GISTSearchItem *head;		/* first chain member */
-	GISTSearchItem *lastHeap;	/* last heap-tuple member, if any */
-	double		distances[1];	/* array with numberOfOrderBys entries */
-} GISTSearchTreeItem;
-
-#define GSTIHDRSZ offsetof(GISTSearchTreeItem, distances)
+#define SizeOfGISTSearchItem(n_distances) (offsetof(GISTSearchItem, distances) + sizeof(double) * (n_distances))
 
 /*
  * GISTScanOpaqueData: private state for a scan of a GiST index
@@ -156,15 +144,12 @@ typedef struct GISTSearchTreeItem
 typedef struct GISTScanOpaqueData
 {
 	GISTSTATE  *giststate;		/* index information, see above */
-	RBTree	   *queue;			/* queue of unvisited items */
+	pairingheap *queue;		/* queue of unvisited items */
 	MemoryContext queueCxt;		/* context holding the queue */
 	bool		qual_ok;		/* false if qual can never be satisfied */
 	bool		firstCall;		/* true until first gistgettuple call */
 
-	GISTSearchTreeItem *curTreeItem;	/* current queue item, if any */
-
 	/* pre-allocated workspace arrays */
-	GISTSearchTreeItem *tmpTreeItem;	/* workspace to pass to rb_insert */
 	double	   *distances;		/* output area for gistindex_keytest */
 
 	/* In a non-ordered search, returnable heap items are stored here: */
diff --git a/src/include/lib/pairingheap.h b/src/include/lib/pairingheap.h
new file mode 100644
index 0000000..3038401
--- /dev/null
+++ b/src/include/lib/pairingheap.h
@@ -0,0 +1,49 @@
+/*
+ * pairingheap.h
+ *
+ * A Pairing Heap implementation
+ *
+ * Portions Copyright (c) 2012-2014, PostgreSQL Global Development Group
+ *
+ * src/include/lib/pairingheap.h
+ */
+
+#ifndef PAIRINGHEAP_H
+#define PAIRINGHEAP_H
+
+/*
+ * This represents an element stored in the heap. You can embed this in
+ * a larger struct containing the actual data you're storing.
+ */
+typedef struct pairingheap_item
+{
+	struct pairingheap_item *first_child;
+	struct pairingheap_item *next_sibling;
+} pairingheap_item;
+
+/*
+ * For a max-heap, the comparator must return <0 iff a < b, 0 iff a == b,
+ * and >0 iff a > b.  For a min-heap, the conditions are reversed.
+ */
+typedef int (*pairingheap_comparator) (const pairingheap_item *a, const pairingheap_item *b, void *arg);
+
+/*
+ * A pairing heap.
+ */
+typedef struct pairingheap
+{
+	pairingheap_comparator ph_compare;	/* comparison function */
+	void	   *ph_arg;					/* opaque argument to ph_compare */
+	pairingheap_item *ph_root;			/* current root of the heap */
+} pairingheap;
+
+extern pairingheap *pairingheap_allocate(pairingheap_comparator compare,
+					void *arg);
+extern void pairingheap_free(pairingheap *heap);
+extern void pairingheap_add(pairingheap *heap, pairingheap_item *d);
+extern pairingheap_item *pairingheap_first(pairingheap *heap);
+extern pairingheap_item *pairingheap_remove_first(pairingheap *heap);
+
+#define pairingheap_empty(h)			((h)->ph_root == NULL)
+
+#endif   /* PAIRINGHEAP_H */
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to