Hi,

attached is a patch to address this problem, and the one reported by
Dilip.  I ran a lot of TPC-H and other benchmarks, and so far this
addresses all the performance issues, often being noticeably faster than
with the dynahash code.

Comments?


On 2017-03-03 11:23:00 +0530, Kuntal Ghosh wrote:
> On Fri, Mar 3, 2017 at 8:41 AM, Robert Haas <robertmh...@gmail.com> wrote:
> > On Fri, Mar 3, 2017 at 1:22 AM, Andres Freund <and...@anarazel.de> wrote:
> >> the resulting hash-values aren't actually meaningfully influenced by the
> >> IV. Because we just xor with the IV, most hash-values that without the IV
> >> would have fallen into a single hash-bucket, fall into a single
> >> hash-bucket afterwards as well; just somewhere else in the hash-range.
> >
> > Wow, OK.  I had kind of assumed (without looking) that setting the
> > hash IV did something a little more useful than that.  Maybe we should
> > do something like struct blah { int iv; int hv; }; newhv =
> > hash_any(&blah, sizeof(blah)).

The hash invocations are already noticeable performance-wise, so I'm a
bit hesitant to go there.  I'd rather introduce a decent 'hash_combine'
function or such.
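
For illustration, a boost-style combiner along these lines (just a
sketch, using PG's uint32; not something attached here):

    /*
     * Boost-style hash_combine sketch: mixes the second value into the
     * first nonlinearly, so inputs that collide without the IV don't
     * all keep colliding with it, unlike a plain XOR.
     */
    static inline uint32
    hash_combine(uint32 a, uint32 b)
    {
        a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
        return a;
    }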


> Sounds good. I've seen a post from Thomas Munro suggesting some
> alternative approach for combining hash values in execGrouping.c[1].

Yea, I played with that too - it's a bit better, but it doesn't fully
address the problem either.  It seems to generally reduce collisions for
multi-column keys a bit, but not that much.


> >> In addition to that it seems quite worthwhile to provide an iterator
> >> that's not vulnerable to this.  An approach that I am, seemingly
> >> successfully, testing is to iterate the hashtable in multiple (in my
> >> case 23, because why not) passes, accessing only every nth element. That
> >> allows the data to be inserted in a lot less "dense" fashion.  But
> >> that's more an optimization, so I'll just push something like the patch
> >> mentioned in the thread already.
> >>
> >> Makes some sense?
> >
> > Yep.
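
Just to illustrate the iteration order (a sketch in simplehash.h style,
with src standing in for the source table; nothing like this is in the
attached patch):

    /*
     * Drain the source table in NUM_PASSES interleaved passes, visiting
     * every NUM_PASSES'th bucket per pass, so the destination table is
     * not filled in (nearly) hash-value order.  23 is arbitrary.
     */
    #define NUM_PASSES 23

    for (uint32 pass = 0; pass < NUM_PASSES; pass++)
    {
        for (uint64 i = pass; i < src->size; i += NUM_PASSES)
        {
            SH_ELEMENT_TYPE *entry = &src->data[i];

            if (entry->status == SH_STATUS_IN_USE)
            {
                /* re-insert the entry into the destination table here */
            }
        }
    }
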
> >
> Yes, it makes sense. Quadratic probing is another approach, but it
> would require an extra shift op every time we want to find the next or
> prev location during a collision.

That's not really the big problem with quadratic probing.  The issue is
that it leads to further random, unpredictable memory accesses in case of
collisions, whereas linear probing's accesses are easy to predict.
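
To make that concrete (a sketch for comparison only; simplehash's
SH_NEXT does essentially the linear form, and "probe" here is just a
hypothetical counter of collisions so far):

    /* linear probing: the next candidate slot is adjacent, so collision
     * chains stay sequential in memory and are easy to prefetch */
    curelem = (curelem + 1) & tb->sizemask;

    /* quadratic probing (triangular increments): the stride grows with
     * every probe, so successive probes hit unrelated cache lines */
    curelem = (startelem + (probe * probe + probe) / 2) & tb->sizemask;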

Greetings,

Andres Freund
From 832b12ec232b270df86a50207b6ae6e6108e21f4 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Wed, 23 Nov 2016 00:23:42 -0800
Subject: [PATCH] Make simplehash.h grow hashtable in additional cases.

Increase the size when either the distance between the actual and optimal
slot grows too large, or when too many subsequent entries would have
to be moved.

This addresses reports that simplehash performed, sometimes
considerably, worse than dynahash.

The reason turned out to be that insertions into the hashtable were,
due to the use of parallel query, in effect done from another
hashtable, in hash-value order.  If the target hashtable, due to
mis-estimation, was sized a lot smaller than the source table(s), that
led to very imbalanced tables; a lot of entries in many close-by
buckets from the source tables were inserted into a single, wider
bucket in the target table.  As the growth factor was computed solely
based on the fillfactor, the performance of the table decreased
further and further.

b81b5a96f424531b was an attempt to address this problem for hash
aggregates (but not for bitmap scans), but it turns out that the
current method of mixing hash values often actually leaves neighboring
hash-values close to each other, just in a different value range.  It
might be worth revisiting that independently of this.

To address that problem, resize tables in two additional cases: firstly,
when the optimal position for an entry would be far from the actual
position; secondly, when many entries would have to be moved to make
space for the new entry (while satisfying the robin hood property).

In all my tests the new code now, even with parallelism, performs at
least as well as the old code, and in several scenarios significantly
better.

Reported-By: Dilip Kumar, Robert Haas, Kuntal Ghosh
Discussion:
    https://postgr.es/m/CAFiTN-vagvuAydKG9VnWcoK=adahxmoa4ztrmnsvibbootn...@mail.gmail.com
    https://postgr.es/m/CAGz5QC+=fntygzmltbunekt6uawzfxjbkb5+7owm-n9dwvx...@mail.gmail.com
---
 src/include/lib/simplehash.h | 66 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 61 insertions(+), 5 deletions(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 74f768249a..1ffd0b04ee 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -157,13 +157,24 @@ SH_SCOPE void SH_STAT(SH_TYPE *tb);
 
 #include "utils/memutils.h"
 
-/* conservative fillfactor for a robin hood table, might want to adjust */
-#define SH_FILLFACTOR (0.8)
-/* increase fillfactor if we otherwise would error out */
-#define SH_MAX_FILLFACTOR (0.95)
 /* max data array size,we allow up to PG_UINT32_MAX buckets, including 0 */
 #define SH_MAX_SIZE (((uint64) PG_UINT32_MAX) + 1)
 
+/* normal fillfactor, unless already close to maximum */
+#ifndef SH_FILLFACTOR
+#define SH_FILLFACTOR (0.9)
+#endif
+/* increase fillfactor if we otherwise would error out */
+#define SH_MAX_FILLFACTOR (0.98)
+/* grow if the distance between actual and optimal location exceeds this */
+#ifndef SH_GROW_MAX_DIB
+#define SH_GROW_MAX_DIB 25
+#endif
+/* grow if inserting would require moving more than this many elements */
+#ifndef SH_GROW_MAX_MOVE
+#define SH_GROW_MAX_MOVE 150
+#endif
+
 #ifdef SH_STORE_HASH
 #define SH_COMPARE_KEYS(tb, ahash, akey, b) (ahash == SH_GET_HASH(tb, b) && SH_EQUAL(tb, b->SH_KEY, akey))
 #else
@@ -466,12 +477,18 @@ SH_INSERT(SH_TYPE *tb, SH_KEY_TYPE key, bool *found)
 	uint32		startelem;
 	uint32		curelem;
 	SH_ELEMENT_TYPE *data;
-	uint32		insertdist = 0;
+	uint32		insertdist;
+
+restart:
+	insertdist = 0;
 
 	/*
 	 * We do the grow check even if the key is actually present, to avoid
 	 * doing the check inside the loop. This also lets us avoid having to
 	 * re-find our position in the hashtable after resizing.
+	 *
+	 * Note that this is also reached when resizing the table due to
+	 * SH_GROW_MAX_DIB / SH_GROW_MAX_MOVE.
 	 */
 	if (unlikely(tb->members >= tb->grow_threshold))
 	{
@@ -536,6 +553,7 @@ SH_INSERT(SH_TYPE *tb, SH_KEY_TYPE key, bool *found)
 			SH_ELEMENT_TYPE *lastentry = entry;
 			uint32		emptyelem = curelem;
 			uint32		moveelem;
+			int32		emptydist = 0;
 
 			/* find next empty bucket */
 			while (true)
@@ -550,6 +568,25 @@ SH_INSERT(SH_TYPE *tb, SH_KEY_TYPE key, bool *found)
 					lastentry = emptyentry;
 					break;
 				}
+
+				/*
+				 * To avoid negative consequences from overly imbalanced
+				 * hashtables, grow the hashtable if collisions would require
+				 * us to move a lot of entries.  The most likely cause of such
+				 * imbalance is filling a (currently) small table, from a
+				 * currently big one, in hash-table order.
+				 */
+				if (++emptydist > SH_GROW_MAX_MOVE)
+				{
+					ereport(DEBUG1,
+							(errmsg("increasing hashtable size due to max move distance"),
+							 errdetail("size %f/members %f: factor %.2f",
+									   (double)tb->size, (double)tb->members,
+									   (double) tb->members / tb->size),
+							 errhidestmt(true)));
+					tb->grow_threshold = 0;
+					goto restart;
+				}
 			}
 
 			/* shift forward, starting at last occupied element */
@@ -580,11 +617,30 @@ SH_INSERT(SH_TYPE *tb, SH_KEY_TYPE key, bool *found)
 #endif
 			entry->status = SH_STATUS_IN_USE;
 			*found = false;
+
 			return entry;
 		}
 
 		curelem = SH_NEXT(tb, curelem, startelem);
 		insertdist++;
+
+		/*
+		 * To avoid negative consequences from overly imbalanced hashtables,
+		 * grow the hashtable if collisions lead to large runs. The most
+		 * likely cause of such imbalance is filling a (currently) small
+		 * table, from a currently big one, in hash-table order.
+		 */
+		if (insertdist > SH_GROW_MAX_DIB)
+		{
+			ereport(DEBUG1,
+					(errmsg("increasing hashtable size due to max insertion distance"),
+					 errdetail("size %f/members %f: factor %.2f",
+							   (double)tb->size, (double)tb->members,
+							   (double) tb->members / tb->size),
+					 errhidestmt(true)));
+			tb->grow_threshold = 0;
+			goto restart;
+		}
 	}
 }
 
-- 
2.11.0.22.g8d7a455.dirty
