Hi Mithun,
On 03/26/2017 01:56 AM, Mithun Cy wrote:
Thanks, Amit for the review.
On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
I think one-dimensional patch has fewer places to touch, so that looks
better to me. However, I think there is still hard coding and
assumptions in code which we should try to improve.
Great! I will continue with the 1-dimensional 'spares' improvement.
I ran some performance scenarios on the patch to see if the increased
'spares' allocation had an impact. I haven't found any regressions in
that regard.
Attached patch contains some small fixes, mainly to the documentation -
on top of v7.
Best regards,
Jesper
From 5545e48ab7136f17b3d471e0ee679a6db6040865 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.peder...@redhat.com>
Date: Mon, 27 Mar 2017 14:15:00 -0400
Subject: [PATCH] Small fixes
---
src/backend/access/hash/README | 50 +++++++++++++++++++-------------------
src/backend/access/hash/hashpage.c | 26 ++++++++++----------
src/backend/access/hash/hashutil.c | 24 +++++++++---------
3 files changed, 50 insertions(+), 50 deletions(-)
diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 6721ee1..ca46de7 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -58,50 +58,50 @@ rules to support a variable number of overflow pages while not having to
move primary bucket pages around after they are created.
Primary bucket pages (henceforth just "bucket pages") are allocated in
-power-of-2 groups, called "split points" in the code. That means on every new
-split point we double the existing number of buckets. And, it seems bad to
-allocate huge chunks of bucket pages all at once and we take ages to consume it.
-To avoid this exponential growth of index size, we did a trick to breakup
-allocation of buckets at splitpoint into 4 equal phases. If 2^x is the total
-buckets need to be allocated at a splitpoint (from now on we shall call this
-as splitpoint group), then we allocate 1/4th (2^(x - 2)) of total buckets at
-each phase of splitpoint group. Next quarter of allocation will only happen if
+power-of-2 groups, called "split points" in the code. That means at every new
+split point we double the existing number of buckets. Allocating huge chunks
+of bucket pages all at once isn't optimal, and it would take ages to consume them.
+To avoid this exponential growth of index size, we use a trick to break up the
+allocation of buckets at a split point into 4 equal phases. If 2^x is the total
+number of buckets to be allocated at a split point (from now on we shall call this
+a split point group), then we allocate 1/4th (2^(x - 2)) of the total buckets in
+each phase of the split point group. The next quarter of the allocation only happens if
buckets of the previous phase have already been consumed. Since for bucket numbers
-< 4 we cannot further divide it in to multiple phases, the first splitpoint
+< 4 we cannot divide the allocation further into multiple phases, the first split point
group 0's allocation is done as follows: {1, 1, 1, 1} = 4 buckets in total; the
numbers in curly braces indicate the number of buckets allocated within each phase
-of splitpoint group 0. In next splitpoint group 1 the allocation phases will
-be as follow {1, 1, 1, 1} = 8 buckets in total. And, for splitpoint group 2
+of split point group 0. In the next split point group 1 the allocation phases will
+be as follows: {1, 1, 1, 1} = 8 buckets in total. And for split point groups 2
and 3 the allocation phases will be {2, 2, 2, 2} = 16 buckets in total and
-{4, 4, 4, 4} = 32 buckets in total. We can see that at each splitpoint group
-we double the total number of buckets from previous group but in a incremental
-phase. The bucket pages allocated within one phase of a splitpoint group will
+{4, 4, 4, 4} = 32 buckets in total. We can see that each split point group
+doubles the total number of buckets of the previous group, but in incremental
+phases. The bucket pages allocated within one phase of a split point group will
appear consecutively in the index. This addressing scheme allows the physical
location of a bucket page to be computed from the bucket number relatively
easily, using only a small amount of control information. If we look at the
-function _hash_spareindex for a given bucket number we first compute splitpoint
-group it belongs to and then the phase with in it to which the bucket belongs
-to. Adding them we get the global splitpoint phase number S to which the
+function _hash_spareindex, for a given bucket number we first compute the split point
+group it belongs to and then the phase within that group to which the bucket
+belongs. Adding them we get the global split point phase number S to which the
bucket belongs, and then simply add "hashm_spares[S] + 1" (where hashm_spares[] is
an array stored in the metapage) to the given bucket number to compute its
physical address. The hashm_spares[S] can be interpreted as the total number
of overflow pages that have been allocated before the bucket pages of
-splitpoint phase S. The hashm_spares[0] is always 0, so that buckets 0 and 1
-(which belong to splitpoint group 0's phase 1 and phase 2 respectively) always
+split point phase S. The hashm_spares[0] is always 0, so that buckets 0 and 1
+(which belong to split point group 0's phase 1 and phase 2 respectively) always
appear at block numbers 1 and 2, just after the meta page. We always have
hashm_spares[N] <= hashm_spares[N+1], since the latter count includes the
former. The difference between the two represents the number of overflow
-pages appearing between the bucket page groups of splitpoints phase N and N+1.
+pages appearing between the bucket page groups of split point phases N and N+1.
(Note: the above describes what happens when filling an initially minimally
sized hash index. In practice, we try to estimate the required index size
-and allocate a suitable number of splitpoints phases immediately, to avoid
+and allocate a suitable number of split point phases immediately, to avoid
expensive re-splitting during initial index build.)
-When S splitpoints exist altogether, the array entries hashm_spares[0]
-through hashm_spares[S] are valid; hashm_spares[S] records the current
+When S split point phases exist altogether, the array entries hashm_spares[0]
+through hashm_spares[S] are all valid; hashm_spares[S] records the current
total number of overflow pages. New overflow pages are created as needed
at the end of the index, and recorded by incrementing hashm_spares[S].
-When it is time to create a new splitpoint phase's worth of bucket pages, we
+When it is time to create a new split point phase's worth of bucket pages, we
copy hashm_spares[S] into hashm_spares[S+1] and increment S (which is
stored in the hashm_ovflpoint field of the meta page). This has the
effect of reserving the correct number of bucket pages at the end of the
@@ -116,7 +116,7 @@ We have to allow the case "greater than" because it's possible that during
an index extension we crash after allocating filesystem space and before
updating the metapage. Note that on filesystems that allow "holes" in
files, it's entirely likely that pages before the logical EOF are not yet
-allocated: when we allocate a new splitpoint phase's worth of bucket pages, we
+allocated: when we allocate a new split point phase's worth of bucket pages, we
physically zero the last such page to force the EOF up, and the first such
page will be used immediately, but the intervening pages are not written
until needed.
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 9b63414..fcb9711 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -507,9 +507,9 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,
/*
* Choose the number of initial bucket pages to match the fill factor
- * given the estimated number of tuples. We round up the result to total
- * the number of buckets which has to be allocated before using its
- * _hashm_spares index slot, however, and always force at least 2 bucket
+ * given the estimated number of tuples. We round up the result to the
+ * total number of buckets which have to be allocated before using its
+ * _hashm_spares index slot. However, we always force at least 2 bucket
* pages. The upper limit is determined by considerations explained in
* _hash_expandtable().
*/
@@ -567,14 +567,14 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,
*/
metap->hashm_maxbucket = num_buckets - 1;
- /* set hishmask, which should be sufficient to cover num_buckets. */
+ /* set highmask, which should be sufficient to cover num_buckets. */
metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1;
metap->hashm_lowmask = (metap->hashm_highmask >> 1);
MemSet(metap->hashm_spares, 0, sizeof(metap->hashm_spares));
MemSet(metap->hashm_mapp, 0, sizeof(metap->hashm_mapp));
- /* Set up mapping for one spare page after the initial splitpoints */
+ /* Set up mapping for one spare page after the initial split point */
metap->hashm_spares[spare_index] = 1;
metap->hashm_ovflpoint = spare_index;
metap->hashm_firstfree = 0;
@@ -770,7 +770,7 @@ restart_expand:
* Note: it is safe to compute the new bucket's blkno here, even though we
* may still need to update the BUCKET_TO_BLKNO mapping. This is because
* the current value of hashm_spares[hashm_ovflpoint] correctly shows
- * where we are going to put a new splitpoint's worth of buckets.
+ * where we are going to put a new split point's worth of buckets.
*/
start_nblkno = BUCKET_TO_BLKNO(metap, new_bucket);
@@ -787,11 +787,11 @@ restart_expand:
Assert(spare_ndx == metap->hashm_ovflpoint + 1);
/*
- * The number of buckets in the new splitpoint group is equal to the
+ * The number of buckets in the new split point group is equal to the
* total number already in existence, i.e. new_bucket. But we do not
- * allocate them at once. Each splitpoint group will have 4 slots, we
+ * allocate them at once. Each split point group will have 4 slots, we
* distribute the buckets equally among them. So we allocate only one
- * fourth of total buckets in new splitpoint group at a time to consume
+ * fourth of the total buckets in the new split point group at a time to consume
* one phase after another. We treat allocation of buckets as a
* separate WAL-logged action. Even if we fail after this operation, we
* won't leak bucket pages; rather, the next split will consume this
@@ -976,14 +976,14 @@ fail:
/*
- * _hash_alloc_buckets -- allocate a new splitpoint's worth of bucket pages
+ * _hash_alloc_buckets -- allocate a new split point's worth of bucket pages
*
* This does not need to initialize the new bucket pages; we'll do that as
* each one is used by _hash_expandtable(). But we have to extend the logical
- * EOF to the end of the splitpoint; this keeps smgr's idea of the EOF in
+ * EOF to the end of the split point; this keeps smgr's idea of the EOF in
* sync with ours, so that we don't get complaints from smgr.
*
- * We do this by writing a page of zeroes at the end of the splitpoint range.
+ * We do this by writing a page of zeroes at the end of the split point range.
* We expect that the filesystem will ensure that the intervening pages read
* as zeroes too. On many filesystems this "hole" will not be allocated
* immediately, which means that the index file may end up more fragmented
@@ -993,7 +993,7 @@ fail:
* XXX It's annoying that this code is executed with the metapage lock held.
* We need to interlock against _hash_addovflpage() adding a new overflow page
* concurrently, but it'd likely be better to use LockRelationForExtension
- * for the purpose. OTOH, adding a splitpoint is a very infrequent operation,
+ * for the purpose. OTOH, adding a split point is a very infrequent operation,
* so it may not be worth worrying about.
*
* Returns TRUE if successful, or FALSE if allocation failed due to
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index 98975e1..4b89fa3 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -151,7 +151,7 @@ _hash_log2(uint32 num)
#define SPLITPOINT_PHASES_PER_GRP 4
#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
-#define Buckets_First_Split_Group 4
+#define BUCKETS_FIRST_SPLIT_GROUP 4
#define TOTAL_SPLITPOINT_PHASES_BEFORE_GROUP(sp_g) (sp_g > 0 ? (sp_g << 2) : 0)
@@ -159,7 +159,7 @@ _hash_log2(uint32 num)
* This is just a trick to save a division operation. If you look into the
* bitmap of the 0-based bucket_num, the 2nd and 3rd most significant bits indicate
* which phase of allocation the bucket_num belongs to within the group. This
- * is because at every splitpoint group we allocate (2 ^ x) buckets and we have
+ * is because at every split point group we allocate (2 ^ x) buckets and we have
* divided the allocation process into 4 equal phases. This macro returns value
* from 0 to 3.
*/
@@ -168,14 +168,14 @@ _hash_log2(uint32 num)
(bucket_num))
/*
- * At splitpoint group 0 we have 2 ^ (0 + 2) = 4 buckets, then at splitpoint
+ * At split point group 0 we have 2 ^ (0 + 2) = 4 buckets, then at split point
* group 1 we have 2 ^ (1 + 2) = 8 total buckets. As the doubling continues at
- * splitpoint group "x" we will have 2 ^ (x + 2) total buckets. Total buckets
- * before x splitpoint group will be 2 ^ (x + 1). At each phase of allocation
- * within splitpoint group we add 2 ^ (x - 1) buckets, as we have to divide the
+ * split point group "x" we will have 2 ^ (x + 2) total buckets. The total buckets
+ * before split point group x will be 2 ^ (x + 1). At each phase of allocation
+ * within split point group x we add 2 ^ (x - 1) buckets, as we have to divide the
* task of allocation of 2 ^ (x + 1) buckets among 4 phases.
*
- * Also, splitpoint group of a given bucket can be taken as
+ * Also, the split point group of a given bucket can be taken as
* (_hash_log2(bucket) - 2). We subtract 2 because each group has
* 2 ^ (x + 2) buckets.
*/
@@ -183,11 +183,11 @@ _hash_log2(uint32 num)
#define BUCKETS_WITHIN_SP_GRP(sp_g, nphase) \
((nphase) * ((sp_g == 0) ? 1 : (1 << (sp_g - 1))))
#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) \
- ((num_bucket <= Buckets_First_Split_Group) ? 0 : \
+ ((num_bucket <= BUCKETS_FIRST_SPLIT_GROUP) ? 0 : \
(_hash_log2(num_bucket) - 2))
/*
- * _hash_spareindex -- returns spare index / global splitpoint phase of the
+ * _hash_spareindex -- returns spare index / global split point phase of the
* bucket
*/
uint32
@@ -204,7 +204,7 @@ _hash_spareindex(uint32 num_bucket)
/*
* _hash_get_totalbuckets -- returns total number of buckets allocated till
- * the given splitpoint phase.
+ * the given split point phase.
*/
uint32
_hash_get_totalbuckets(uint32 splitpoint_phase)
@@ -218,8 +218,8 @@ _hash_get_totalbuckets(uint32 splitpoint_phase)
splitpoint_group = (splitpoint_phase / SPLITPOINT_PHASES_PER_GRP);
/*
- * total_buckets = total number of buckets before its splitpoint group +
- * total buckets within its splitpoint group until given splitpoint_phase.
+ * total_buckets = total number of buckets before its split point group +
+ * total buckets within its split point group up to the given splitpoint_phase.
*/
return BUCKETS_BEFORE_SP_GRP(splitpoint_group) +
BUCKETS_WITHIN_SP_GRP(splitpoint_group,
--
2.7.4
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers