new thread [was: WIP Patch: Add a function that returns binary JSONB as a bytea]

> I wrote:
> > We can also shave a
> > few percent by having pg_utf8_verifystr use SSE2 for the ascii path. I
> > can look into this.
>
> Here's a patch for that. If the input is mostly ascii, I'd expect that
> part of the flame graph to shrink by 40-50% and give a small boost
> overall.

Here is an updated patch using the new USE_SSE2 symbol. The style is
different from the last one in that each stanza has platform-specific
code. I wanted to try it this way because is_valid_ascii() is already
written in SIMD-ish style using general purpose registers and bit
twiddling, so it seemed natural to see the two side-by-side. Sometimes
they can share the same comment. If we think this is bad for
readability, I can go back to one block each, but that way leads to
duplication of code and it's difficult to see what's different for
each platform, IMO.

--
John Naylor
EDB: http://www.enterprisedb.com
From 69d56a21192ed2f03bc08f078cfff7ba5cb0d80b Mon Sep 17 00:00:00 2001
From: John Naylor <john.naylor@postgresql.org>
Date: Fri, 5 Aug 2022 13:30:47 +0700
Subject: [PATCH v2] Use SSE2 in is_valid_ascii where available.

This is ~40% faster than portable C on contemporary Intel hardware.
---
 src/common/wchar.c                       | 18 +++------
 src/include/mb/pg_wchar.h                | 50 ++++++++++++++++++++++--
 src/test/regress/expected/conversion.out |  3 +-
 src/test/regress/sql/conversion.sql      |  3 +-
 4 files changed, 57 insertions(+), 17 deletions(-)

diff --git a/src/common/wchar.c b/src/common/wchar.c
index 1e6e198bf2..a305e0e66b 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -1918,26 +1918,20 @@ pg_utf8_verifystr(const unsigned char *s, int len)
 	const int	orig_len = len;
 	uint32		state = BGN;
 
-/*
- * Sixteen seems to give the best balance of performance across different
- * byte distributions.
- */
-#define STRIDE_LENGTH 16
-
-	if (len >= STRIDE_LENGTH)
+	if (len >= ASCII_CHECK_LEN)
 	{
-		while (len >= STRIDE_LENGTH)
+		while (len >= ASCII_CHECK_LEN)
 		{
 			/*
 			 * If the chunk is all ASCII, we can skip the full UTF-8 check,
 			 * but we must first check for a non-END state, which means the
 			 * previous chunk ended in the middle of a multibyte sequence.
 			 */
-			if (state != END || !is_valid_ascii(s, STRIDE_LENGTH))
-				utf8_advance(s, &state, STRIDE_LENGTH);
+			if (state != END || !is_valid_ascii(s, ASCII_CHECK_LEN))
+				utf8_advance(s, &state, ASCII_CHECK_LEN);
 
-			s += STRIDE_LENGTH;
-			len -= STRIDE_LENGTH;
+			s += ASCII_CHECK_LEN;
+			len -= ASCII_CHECK_LEN;
 		}
 
 		/* The error state persists, so we only need to check for it here. */
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index 011b0b3abd..cbb1dfe978 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -19,6 +19,8 @@
 #ifndef PG_WCHAR_H
 #define PG_WCHAR_H
 
+#include "port/simd.h"
+
 /*
  * The pg_wchar type
  */
@@ -704,25 +706,57 @@ extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len);
  * Verify a chunk of bytes for valid ASCII.
  *
  * Returns false if the input contains any zero bytes or bytes with the
- * high-bit set. Input len must be a multiple of 8.
+ * high-bit set. Input len must be a multiple of the chunk size (8 or 16).
+ * Since the chunk size is platform-specific, we provide the ASCII_CHECK_LEN
+ * convenience macro for callers to pass for len.
  */
 static inline bool
 is_valid_ascii(const unsigned char *s, int len)
 {
 	const unsigned char *const s_end = s + len;
+
+#ifdef USE_SSE2
+	__m128i		chunk,
+				error_cum = _mm_setzero_si128();
+#else
 	uint64		chunk,
 				highbit_cum = UINT64CONST(0),
 				zero_cum = UINT64CONST(0x8080808080808080);
+#endif
+
+	/*
+	 * With two chunks, gcc can unroll the loop. Even if the compiler can
+	 * unroll a longer loop, it's not worth it because callers might have to
+	 * use a byte-wise algorithm if we return false.
+	 */
+#ifdef USE_SSE2
+#define ASCII_CHECK_LEN (2 * sizeof(__m128i))
+#else
+#define ASCII_CHECK_LEN (2 * sizeof(uint64))
+#endif
 
 	Assert(len % sizeof(chunk) == 0);
 
 	while (s < s_end)
 	{
+#ifdef USE_SSE2
+		chunk = _mm_loadu_si128((const __m128i *) s);
+#else
 		memcpy(&chunk, s, sizeof(chunk));
+#endif
+
+		/* Capture any zero bytes in this chunk. */
+#ifdef USE_SSE2
+
+		/*
+		 * Set all bits in each lane of the error accumulator where input
+		 * bytes are zero.
+		 */
+		error_cum = _mm_or_si128(error_cum,
+								 _mm_cmpeq_epi8(chunk, _mm_setzero_si128()));
+#else
 
 		/*
-		 * Capture any zero bytes in this chunk.
-		 *
 		 * First, add 0x7f to each byte. This sets the high bit in each byte,
 		 * unless it was a zero. If any resulting high bits are zero, the
 		 * corresponding high bits in the zero accumulator will be cleared.
@@ -734,13 +768,22 @@ is_valid_ascii(const unsigned char *s, int len)
 		 * because we check for those separately.
 		 */
 		zero_cum &= (chunk + UINT64CONST(0x7f7f7f7f7f7f7f7f));
+#endif
 
 		/* Capture all set bits in this chunk. */
+#ifdef USE_SSE2
+		error_cum = _mm_or_si128(error_cum, chunk);
+#else
 		highbit_cum |= chunk;
+#endif
 
 		s += sizeof(chunk);
 	}
 
+#ifdef USE_SSE2
+	/* Check if any lanes in the error accumulator got set. */
+	return _mm_movemask_epi8(error_cum) == 0;
+#else
 	/* Check if any high bits in the high bit accumulator got set. */
 	if (highbit_cum & UINT64CONST(0x8080808080808080))
 		return false;
@@ -750,6 +793,7 @@ is_valid_ascii(const unsigned char *s, int len)
 		return false;
 
 	return true;
+#endif
 }
 
 #endif							/* PG_WCHAR_H */
diff --git a/src/test/regress/expected/conversion.out b/src/test/regress/expected/conversion.out
index 442e7aff2b..434dc4d93c 100644
--- a/src/test/regress/expected/conversion.out
+++ b/src/test/regress/expected/conversion.out
@@ -140,7 +140,8 @@ select description, (test_conv(inbytes, 'utf8', 'utf8')).* from utf8_verificatio
 -- will contain all 4 bytes if they are present, so various
 -- expressions below add 3 ASCII bytes to the end to ensure
 -- consistent error messages.
--- The number 64 below needs to be at least the value of STRIDE_LENGTH in wchar.c.
+-- The number 64 below needs to equal or a multiple of the largest
+-- possible value of ASCII_CHECK_LEN in mb/pg_wchar.h.
 -- Test multibyte verification in fast path
 with test_bytes as (
   select
diff --git a/src/test/regress/sql/conversion.sql b/src/test/regress/sql/conversion.sql
index 9a65fca91f..27ef069eaf 100644
--- a/src/test/regress/sql/conversion.sql
+++ b/src/test/regress/sql/conversion.sql
@@ -121,7 +121,8 @@ select description, (test_conv(inbytes, 'utf8', 'utf8')).* from utf8_verificatio
 -- will contain all 4 bytes if they are present, so various
 -- expressions below add 3 ASCII bytes to the end to ensure
 -- consistent error messages.
--- The number 64 below needs to be at least the value of STRIDE_LENGTH in wchar.c.
+-- The number 64 below needs to equal or a multiple of the largest
+-- possible value of ASCII_CHECK_LEN in mb/pg_wchar.h.
 
 -- Test multibyte verification in fast path
 with test_bytes as (
-- 
2.36.1

Reply via email to