new thread [was: WIP Patch: Add a function that returns binary JSONB as a bytea]
> I wrote: > > We can also shave a > > few percent by having pg_utf8_verifystr use SSE2 for the ascii path. I > > can look into this. > > Here's a patch for that. If the input is mostly ascii, I'd expect that > part of the flame graph to shrink by 40-50% and give a small boost > overall. Here is an updated patch using the new USE_SSE2 symbol. The style is different from the last one in that each stanza has platform-specific code. I wanted to try it this way because is_valid_ascii() is already written in SIMD-ish style using general purpose registers and bit twiddling, so it seemed natural to see the two side-by-side. Sometimes they can share the same comment. If we think this is bad for readability, I can go back to one block each, but that way leads to duplication of code and it's difficult to see what's different for each platform, IMO. -- John Naylor EDB: http://www.enterprisedb.com
From 69d56a21192ed2f03bc08f078cfff7ba5cb0d80b Mon Sep 17 00:00:00 2001 From: John Naylor <john.naylor@postgresql.org> Date: Fri, 5 Aug 2022 13:30:47 +0700 Subject: [PATCH v2] Use SSE2 in is_valid_ascii where available. This is ~40% faster than portable C on contemporary Intel hardware. --- src/common/wchar.c | 18 +++------ src/include/mb/pg_wchar.h | 50 ++++++++++++++++++++++-- src/test/regress/expected/conversion.out | 3 +- src/test/regress/sql/conversion.sql | 3 +- 4 files changed, 57 insertions(+), 17 deletions(-) diff --git a/src/common/wchar.c b/src/common/wchar.c index 1e6e198bf2..a305e0e66b 100644 --- a/src/common/wchar.c +++ b/src/common/wchar.c @@ -1918,26 +1918,20 @@ pg_utf8_verifystr(const unsigned char *s, int len) const int orig_len = len; uint32 state = BGN; -/* - * Sixteen seems to give the best balance of performance across different - * byte distributions. - */ -#define STRIDE_LENGTH 16 - - if (len >= STRIDE_LENGTH) + if (len >= ASCII_CHECK_LEN) { - while (len >= STRIDE_LENGTH) + while (len >= ASCII_CHECK_LEN) { /* * If the chunk is all ASCII, we can skip the full UTF-8 check, * but we must first check for a non-END state, which means the * previous chunk ended in the middle of a multibyte sequence. */ - if (state != END || !is_valid_ascii(s, STRIDE_LENGTH)) - utf8_advance(s, &state, STRIDE_LENGTH); + if (state != END || !is_valid_ascii(s, ASCII_CHECK_LEN)) + utf8_advance(s, &state, ASCII_CHECK_LEN); - s += STRIDE_LENGTH; - len -= STRIDE_LENGTH; + s += ASCII_CHECK_LEN; + len -= ASCII_CHECK_LEN; } /* The error state persists, so we only need to check for it here. */ diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h index 011b0b3abd..cbb1dfe978 100644 --- a/src/include/mb/pg_wchar.h +++ b/src/include/mb/pg_wchar.h @@ -19,6 +19,8 @@ #ifndef PG_WCHAR_H #define PG_WCHAR_H +#include "port/simd.h" + /* * The pg_wchar type */ @@ -704,25 +706,57 @@ extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len); * Verify a chunk of bytes for valid ASCII. * * Returns false if the input contains any zero bytes or bytes with the - * high-bit set. Input len must be a multiple of 8. + * high-bit set. Input len must be a multiple of the chunk size (8 or 16). + * Since the chunk size is platform-specific, we provide the ASCII_CHECK_LEN + * convenience macro for callers to pass for len. */ static inline bool is_valid_ascii(const unsigned char *s, int len) { const unsigned char *const s_end = s + len; + +#ifdef USE_SSE2 + __m128i chunk, + error_cum = _mm_setzero_si128(); +#else uint64 chunk, highbit_cum = UINT64CONST(0), zero_cum = UINT64CONST(0x8080808080808080); +#endif + + /* + * With two chunks, gcc can unroll the loop. Even if the compiler can + * unroll a longer loop, it's not worth it because callers might have to + * use a byte-wise algorithm if we return false. + */ +#ifdef USE_SSE2 +#define ASCII_CHECK_LEN (2 * sizeof(__m128i)) +#else +#define ASCII_CHECK_LEN (2 * sizeof(uint64)) +#endif Assert(len % sizeof(chunk) == 0); while (s < s_end) { +#ifdef USE_SSE2 + chunk = _mm_loadu_si128((const __m128i *) s); +#else memcpy(&chunk, s, sizeof(chunk)); +#endif + + /* Capture any zero bytes in this chunk. */ +#ifdef USE_SSE2 + + /* + * Set all bits in each lane of the error accumulator where input + * bytes are zero. + */ + error_cum = _mm_or_si128(error_cum, + _mm_cmpeq_epi8(chunk, _mm_setzero_si128())); +#else /* - * Capture any zero bytes in this chunk. - * * First, add 0x7f to each byte. This sets the high bit in each byte, * unless it was a zero. If any resulting high bits are zero, the * corresponding high bits in the zero accumulator will be cleared. @@ -734,13 +768,22 @@ is_valid_ascii(const unsigned char *s, int len) * because we check for those separately. */ zero_cum &= (chunk + UINT64CONST(0x7f7f7f7f7f7f7f7f)); +#endif /* Capture all set bits in this chunk. */ +#ifdef USE_SSE2 + error_cum = _mm_or_si128(error_cum, chunk); +#else highbit_cum |= chunk; +#endif s += sizeof(chunk); } +#ifdef USE_SSE2 + /* Check if any lanes in the error accumulator got set. */ + return _mm_movemask_epi8(error_cum) == 0; +#else /* Check if any high bits in the high bit accumulator got set. */ if (highbit_cum & UINT64CONST(0x8080808080808080)) return false; @@ -750,6 +793,7 @@ is_valid_ascii(const unsigned char *s, int len) return false; return true; +#endif } #endif /* PG_WCHAR_H */ diff --git a/src/test/regress/expected/conversion.out b/src/test/regress/expected/conversion.out index 442e7aff2b..434dc4d93c 100644 --- a/src/test/regress/expected/conversion.out +++ b/src/test/regress/expected/conversion.out @@ -140,7 +140,8 @@ select description, (test_conv(inbytes, 'utf8', 'utf8')).* from utf8_verificatio -- will contain all 4 bytes if they are present, so various -- expressions below add 3 ASCII bytes to the end to ensure -- consistent error messages. --- The number 64 below needs to be at least the value of STRIDE_LENGTH in wchar.c. +-- The number 64 below needs to equal or a multiple of the largest +-- possible value of ASCII_CHECK_LEN in mb/pg_wchar.h. -- Test multibyte verification in fast path with test_bytes as ( select diff --git a/src/test/regress/sql/conversion.sql b/src/test/regress/sql/conversion.sql index 9a65fca91f..27ef069eaf 100644 --- a/src/test/regress/sql/conversion.sql +++ b/src/test/regress/sql/conversion.sql @@ -121,7 +121,8 @@ select description, (test_conv(inbytes, 'utf8', 'utf8')).* from utf8_verificatio -- will contain all 4 bytes if they are present, so various -- expressions below add 3 ASCII bytes to the end to ensure -- consistent error messages. --- The number 64 below needs to be at least the value of STRIDE_LENGTH in wchar.c. +-- The number 64 below needs to equal or a multiple of the largest +-- possible value of ASCII_CHECK_LEN in mb/pg_wchar.h. -- Test multibyte verification in fast path with test_bytes as ( -- 2.36.1