Re: Optimizing pg_trgm makesign() (was Re: [HACKERS] WIP: Fast GiST index build)

Tom Lane Fri, 01 Jul 2011 20:04:28 -0700

Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
> I couldn't resist looking into this, and came up with the attached 
> patch. I tested this with:


> CREATE TABLE words (word text);
> COPY words FROM '/usr/share/dict/words';
> CREATE INDEX i_words ON words USING gist (word gist_trgm_ops);

> And then ran "REINDEX INDEX i_words" a few times with and without the 
> patch. Without the patch, reindex takes about 4.7 seconds. With the 
> patch, 3.7 seconds. That's a worthwhile gain on its own, but becomes 
> even more important with Alexander's fast GiST build patch, which calls 
> the penalty function more.

I tested this on HPPA (big-endian, picky about alignment) to verify that
you had that code path right.  It's right, but on that platform the
speedup in the "reindex dict/words" test is only about 6%.  I'm afraid
that the benefit is a lot more machine-specific than we could wish.

I tweaked the patch a bit further (don't pessimize the boundary case
where there's exactly 4n+1 trigrams, avoid forcing trg1 into memory,
improve the comment) and attach that version below.  This is a little
bit faster but I still wonder if it's worth the price of adding an
obscure dependency on the header size.

> I used the attached showsign-debug.patch to verify that the patched 
> makesign function produces the same results as the existing code. I 
> haven't tested the big-endian code, however.

That didn't look terribly convincing.  I added a direct validation that
old and new code give the same results, a la

        if (ISARRKEY(newval))
        {
                BITVEC          sign;
+               BITVEC          osign;
  
                makesign(sign, newval);
+               origmakesign(osign, newval);
+               Assert(memcmp(sign, osign, sizeof(BITVEC)) == 0);
  
                if (ISALLTRUE(origval))
                        *penalty = ((float) (SIGLENBIT - sizebitvec(sign))) / 
(float) (SIGLENBIT + 1);

and ran the regression tests and the dict/words example with that.

                        regards, tom lane

diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index b328a09f41fee50beb96a28835e15ef835222cd6..7cea6d68fdcde71e0fb033682d77c21978c3406b 100644
*** a/contrib/pg_trgm/trgm_gist.c
--- b/contrib/pg_trgm/trgm_gist.c
*************** gtrgm_out(PG_FUNCTION_ARGS)
*** 84,100 ****
  static void
  makesign(BITVECP sign, TRGM *a)
  {
! 	int4		k,
! 				len = ARRNELEM(a);
  	trgm	   *ptr = GETARR(a);
! 	int4		tmp = 0;
  
  	MemSet((void *) sign, 0, sizeof(BITVEC));
  	SETBIT(sign, SIGLENBIT);	/* set last unused bit */
! 	for (k = 0; k < len; k++)
  	{
! 		CPTRGM(((char *) &tmp), ptr + k);
! 		HASH(sign, tmp);
  	}
  }
  
--- 84,178 ----
  static void
  makesign(BITVECP sign, TRGM *a)
  {
! 	int4		len = ARRNELEM(a);
  	trgm	   *ptr = GETARR(a);
! 	char	   *p;
! 	char	   *endptr;
! 	uint32		w1,
! 				w2,
! 				w3;
! 	uint32		trg0 = 0,
! 				trg1,
! 				trg2,
! 				trg3,
! 				trg4;
! 	uint32	   *p32;
  
  	MemSet((void *) sign, 0, sizeof(BITVEC));
  	SETBIT(sign, SIGLENBIT);	/* set last unused bit */
! 
! 	if (len <= 0)
! 		return;
! 
! 	/*----------
! 	 * We have to extract each trigram into a uint32, and calculate the HASH.
! 	 * This would be a lot easier if the trigrams were aligned on 4-byte
! 	 * boundaries, but they're not.  The simple way would be to copy each
! 	 * trigram byte-by-byte, but that is quite slow, and this function is a
! 	 * hotspot in penalty calculations.
! 	 *
! 	 * The first trigram in the array doesn't begin at a 4-byte boundary, as
! 	 * the flags byte comes first; but the next one does.  So we fetch the
! 	 * first trigram as a special case, and after that each four trigrams fall
! 	 * onto 4-byte words like this:
! 	 *
! 	 *  w1   w2   w3
! 	 * AAAB BBCC CDDD
! 	 *
! 	 * As long as there's at least four trigrams left to process, we fetch
! 	 * the next three words and extract the trigrams from them with bit
! 	 * operations, per the above diagram.  The last few trigrams are handled
! 	 * one at a time with byte-by-byte fetching.
! 	 *
! 	 * Note that this code yields different results on big-endian and
! 	 * little-endian machines, because the bytes of each trigram are loaded
! 	 * into a uint32 in memory order and left-justified.  That's probably
! 	 * undesirable, but changing this behavior would break existing indexes.
! 	 *----------
! 	 */
! 	endptr = (char *) (ptr + len);
! 	p32 = (uint32 *) (((char *) ptr) - 1);
! 
! 	/* Fetch and extract the initial word */
! 	w1 = *(p32++);
! #ifdef WORDS_BIGENDIAN
! 	trg1 = w1 << 8;
! #else
! 	trg1 = w1 >> 8;
! #endif
! 	HASH(sign, trg1);
! 
! 	while ((char *) p32 <= endptr - 3 * sizeof(uint32))
  	{
! 		w1 = *(p32++);
! 		w2 = *(p32++);
! 		w3 = *(p32++);
! 
! #ifdef WORDS_BIGENDIAN
! 		trg1 = w1 & 0xFFFFFF00;
! 		trg2 = (w1 << 24) | ((w2 & 0xFFFF0000) >> 8);
! 		trg3 = ((w2 & 0x0000FFFF) << 16) | ((w3 & 0xFF000000) >> 16);
! 		trg4 = w3 << 8;
! #else
! 		trg1 = w1 & 0x00FFFFFF;
! 		trg2 = (w1 >> 24) | ((w2 & 0x0000FFFF) << 8);
! 		trg3 = ((w2 & 0xFFFF0000) >> 16) | ((w3 & 0x000000FF) << 16);
! 		trg4 = w3 >> 8;
! #endif
! 
! 		HASH(sign, trg1);
! 		HASH(sign, trg2);
! 		HASH(sign, trg3);
! 		HASH(sign, trg4);
! 	}
! 
! 	/* Handle the remaining 0-3 trigrams the slow way */
! 	p = (char *) p32;
! 	while (p < endptr)
! 	{
! 		CPTRGM(((char *) &trg0), p);
! 		HASH(sign, trg0);
! 		p += 3;
  	}
  }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Optimizing pg_trgm makesign() (was Re: [HACKERS] WIP: Fast GiST index build)

Reply via email to