I wrote:

> Naively, the shift-based DFA requires 64-bit integers to encode the
transitions, but I recently came across an idea from Dougall Johnson of
using the Z3 SMT solver to pack the transitions into 32-bit integers [1].
That halves the size of the transition table for free. I adapted that
effort to the existing conventions in v22 and arrived at the attached
python script.
> [...]
> I'll include something like the attached text file diff in the next
patch. Some comments are now outdated, but this is good enough for
demonstration.

Attached is v23 incorporating the 32-bit transition table, with the
necessary comment adjustments.

--
John Naylor
EDB: http://www.enterprisedb.com
From d62f21bf7256d283c83c81b07e49fc33e270e4e3 Mon Sep 17 00:00:00 2001
From: John Naylor <john.naylor@2ndquadrant.com>
Date: Wed, 28 Jul 2021 13:53:06 -0400
Subject: [PATCH v23] Add fast paths for validating UTF-8 text

Our previous validator is a traditional one that performs comparisons
and branching on one byte at a time. It's useful in that we always know
exactly how many bytes we have validated, but that precision comes at
a cost. Input validation can show up prominently in profiles of COPY
FROM, and future improvements to COPY FROM such as parallelism or
faster line parsing will put more pressure on input validation. Hence,
supplement it with one of two fast paths, depending on the platform:

On machines that support SSE 4.2, use an algorithm described in the
paper "Validating UTF-8 In Less Than One Instruction Per Byte" by
John Keiser and Daniel Lemire. The authors have made available an
open source implementation within the simdjson library (Apache 2.0
license). The lookup tables and some naming conventions were adopted
from this library, but the code was written from scratch.

On other hardware, use a "shift-based" DFA.

Both implementations are highly optimized for blocks of ASCII text,
are relatively free of branches and thus robust against all kinds of
byte patterns, and delay error checking to the very end. With these
algorithms, UTF-8 validation is anywhere from two to seven times
faster, depending on platform and the distribution of byte sequences
in the input.

The previous coding in pg_utf8_verifystr() is retained for short
strings and for when the fast path returns an error.

Review, performance testing, and additional hacking by: Heikki
Linnakangas, Vladimir Sitnikov, Amit Khandekar, Thomas Munro, and
Greg Stark

Discussion:
https://www.postgresql.org/message-id/CAFBsxsEV_SzH%2BOLyCiyon%3DiwggSyMh_eF6A3LU2tiWf3Cy2ZQg%40mail.gmail.com
---
 config/c-compiler.m4                     |  28 +-
 configure                                | 112 ++++++--
 configure.ac                             |  61 +++-
 src/Makefile.global.in                   |   3 +
 src/common/wchar.c                       |  35 +++
 src/include/mb/pg_wchar.h                |   7 +
 src/include/pg_config.h.in               |   9 +
 src/include/port/pg_sse42_utils.h        | 163 +++++++++++
 src/include/port/pg_utf8.h               | 103 +++++++
 src/port/Makefile                        |   6 +
 src/port/pg_utf8_fallback.c              | 231 +++++++++++++++
 src/port/pg_utf8_sse42.c                 | 349 +++++++++++++++++++++++
 src/port/pg_utf8_sse42_choose.c          |  68 +++++
 src/test/regress/expected/conversion.out | 170 +++++++++++
 src/test/regress/sql/conversion.sql      | 134 +++++++++
 src/tools/msvc/Mkvcbuild.pm              |   4 +
 src/tools/msvc/Solution.pm               |   3 +
 17 files changed, 1442 insertions(+), 44 deletions(-)
 create mode 100644 src/include/port/pg_sse42_utils.h
 create mode 100644 src/include/port/pg_utf8.h
 create mode 100644 src/port/pg_utf8_fallback.c
 create mode 100644 src/port/pg_utf8_sse42.c
 create mode 100644 src/port/pg_utf8_sse42_choose.c

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 780e906ecc..49d592a53c 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -591,36 +591,46 @@ if test x"$pgac_cv_gcc_atomic_int64_cas" = x"yes"; then
   AC_DEFINE(HAVE_GCC__ATOMIC_INT64_CAS, 1, [Define to 1 if you have __atomic_compare_exchange_n(int64 *, int64 *, int64).])
 fi])# PGAC_HAVE_GCC__ATOMIC_INT64_CAS
 
-# PGAC_SSE42_CRC32_INTRINSICS
+# PGAC_SSE42_INTRINSICS
 # ---------------------------
 # Check if the compiler supports the x86 CRC instructions added in SSE 4.2,
 # using the _mm_crc32_u8 and _mm_crc32_u32 intrinsic functions. (We don't
 # test the 8-byte variant, _mm_crc32_u64, but it is assumed to be present if
 # the other ones are, on x86-64 platforms)
 #
+# Also, check for support of x86 instructions added in SSSE3 and SSE4.1,
+# in particular _mm_alignr_epi8, _mm_shuffle_epi8, and _mm_testz_si128.
+# We might be able to assume these are understood by the compiler if CRC
+# intrinsics are, but it's better to document our reliance on them here.
+#
+# We don't test for SSE2 intrinsics, as they are assumed to be present if
+# SSE 4.2 intrinsics are.
+#
 # An optional compiler flag can be passed as argument (e.g. -msse4.2). If the
-# intrinsics are supported, sets pgac_sse42_crc32_intrinsics, and CFLAGS_SSE42.
-AC_DEFUN([PGAC_SSE42_CRC32_INTRINSICS],
-[define([Ac_cachevar], [AS_TR_SH([pgac_cv_sse42_crc32_intrinsics_$1])])dnl
-AC_CACHE_CHECK([for _mm_crc32_u8 and _mm_crc32_u32 with CFLAGS=$1], [Ac_cachevar],
+# intrinsics are supported, sets pgac_sse42_intrinsics, and CFLAGS_SSE42.
+AC_DEFUN([PGAC_SSE42_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_sse42_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for _mm_crc32_u8, _mm_crc32_u32, _mm_alignr_epi8, _mm_shuffle_epi8, and _mm_testz_si128 with CFLAGS=$1], [Ac_cachevar],
 [pgac_save_CFLAGS=$CFLAGS
 CFLAGS="$pgac_save_CFLAGS $1"
 AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <nmmintrin.h>],
   [unsigned int crc = 0;
    crc = _mm_crc32_u8(crc, 0);
    crc = _mm_crc32_u32(crc, 0);
+   __m128i vec = _mm_set1_epi8(crc);
+   vec = _mm_shuffle_epi8(vec,
+         _mm_alignr_epi8(vec, vec, 1));
    /* return computed value, to prevent the above being optimized away */
-   return crc == 0;])],
+   return _mm_testz_si128(vec, vec);])],
   [Ac_cachevar=yes],
   [Ac_cachevar=no])
 CFLAGS="$pgac_save_CFLAGS"])
 if test x"$Ac_cachevar" = x"yes"; then
   CFLAGS_SSE42="$1"
-  pgac_sse42_crc32_intrinsics=yes
+  pgac_sse42_intrinsics=yes
 fi
 undefine([Ac_cachevar])dnl
-])# PGAC_SSE42_CRC32_INTRINSICS
-
+])# PGAC_SSE42_INTRINSICS
 
 # PGAC_ARMV8_CRC32C_INTRINSICS
 # ----------------------------
diff --git a/configure b/configure
index 7542fe30a1..fec872fad6 100755
--- a/configure
+++ b/configure
@@ -645,6 +645,7 @@ XGETTEXT
 MSGMERGE
 MSGFMT_FLAGS
 MSGFMT
+PG_UTF8_OBJS
 PG_CRC32C_OBJS
 CFLAGS_ARMV8_CRC32C
 CFLAGS_SSE42
@@ -18079,14 +18080,14 @@ $as_echo "#define HAVE__CPUID 1" >>confdefs.h
 
 fi
 
-# Check for Intel SSE 4.2 intrinsics to do CRC calculations.
+# Check for Intel SSE 4.2 intrinsics.
 #
-# First check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used
+# First check if these intrinsics can be used
 # with the default compiler flags. If not, check if adding the -msse4.2
 # flag helps. CFLAGS_SSE42 is set to -msse4.2 if that's required.
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for _mm_crc32_u8 and _mm_crc32_u32 with CFLAGS=" >&5
-$as_echo_n "checking for _mm_crc32_u8 and _mm_crc32_u32 with CFLAGS=... " >&6; }
-if ${pgac_cv_sse42_crc32_intrinsics_+:} false; then :
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for for _mm_crc32_u8, _mm_crc32_u32, _mm_alignr_epi8, _mm_shuffle_epi8, and _mm_testz_si128 with CFLAGS=" >&5
+$as_echo_n "checking for for _mm_crc32_u8, _mm_crc32_u32, _mm_alignr_epi8, _mm_shuffle_epi8, and _mm_testz_si128 with CFLAGS=... " >&6; }
+if ${pgac_cv_sse42_intrinsics_+:} false; then :
   $as_echo_n "(cached) " >&6
 else
   pgac_save_CFLAGS=$CFLAGS
@@ -18100,32 +18101,35 @@ main ()
 unsigned int crc = 0;
    crc = _mm_crc32_u8(crc, 0);
    crc = _mm_crc32_u32(crc, 0);
+   __m128i vec = _mm_set1_epi8(crc);
+   vec = _mm_shuffle_epi8(vec,
+         _mm_alignr_epi8(vec, vec, 1));
    /* return computed value, to prevent the above being optimized away */
-   return crc == 0;
+   return _mm_testz_si128(vec, vec);
   ;
   return 0;
 }
 _ACEOF
 if ac_fn_c_try_link "$LINENO"; then :
-  pgac_cv_sse42_crc32_intrinsics_=yes
+  pgac_cv_sse42_intrinsics_=yes
 else
-  pgac_cv_sse42_crc32_intrinsics_=no
+  pgac_cv_sse42_intrinsics_=no
 fi
 rm -f core conftest.err conftest.$ac_objext \
     conftest$ac_exeext conftest.$ac_ext
 CFLAGS="$pgac_save_CFLAGS"
 fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_sse42_crc32_intrinsics_" >&5
-$as_echo "$pgac_cv_sse42_crc32_intrinsics_" >&6; }
-if test x"$pgac_cv_sse42_crc32_intrinsics_" = x"yes"; then
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_sse42_intrinsics_" >&5
+$as_echo "$pgac_cv_sse42_intrinsics_" >&6; }
+if test x"$pgac_cv_sse42_intrinsics_" = x"yes"; then
   CFLAGS_SSE42=""
-  pgac_sse42_crc32_intrinsics=yes
+  pgac_sse42_intrinsics=yes
 fi
 
-if test x"$pgac_sse42_crc32_intrinsics" != x"yes"; then
-  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for _mm_crc32_u8 and _mm_crc32_u32 with CFLAGS=-msse4.2" >&5
-$as_echo_n "checking for _mm_crc32_u8 and _mm_crc32_u32 with CFLAGS=-msse4.2... " >&6; }
-if ${pgac_cv_sse42_crc32_intrinsics__msse4_2+:} false; then :
+if test x"$pgac_sse42_intrinsics" != x"yes"; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for for _mm_crc32_u8, _mm_crc32_u32, _mm_alignr_epi8, _mm_shuffle_epi8, and _mm_testz_si128 with CFLAGS=-msse4.2" >&5
+$as_echo_n "checking for for _mm_crc32_u8, _mm_crc32_u32, _mm_alignr_epi8, _mm_shuffle_epi8, and _mm_testz_si128 with CFLAGS=-msse4.2... " >&6; }
+if ${pgac_cv_sse42_intrinsics__msse4_2+:} false; then :
   $as_echo_n "(cached) " >&6
 else
   pgac_save_CFLAGS=$CFLAGS
@@ -18139,26 +18143,29 @@ main ()
 unsigned int crc = 0;
    crc = _mm_crc32_u8(crc, 0);
    crc = _mm_crc32_u32(crc, 0);
+   __m128i vec = _mm_set1_epi8(crc);
+   vec = _mm_shuffle_epi8(vec,
+         _mm_alignr_epi8(vec, vec, 1));
    /* return computed value, to prevent the above being optimized away */
-   return crc == 0;
+   return _mm_testz_si128(vec, vec);
   ;
   return 0;
 }
 _ACEOF
 if ac_fn_c_try_link "$LINENO"; then :
-  pgac_cv_sse42_crc32_intrinsics__msse4_2=yes
+  pgac_cv_sse42_intrinsics__msse4_2=yes
 else
-  pgac_cv_sse42_crc32_intrinsics__msse4_2=no
+  pgac_cv_sse42_intrinsics__msse4_2=no
 fi
 rm -f core conftest.err conftest.$ac_objext \
     conftest$ac_exeext conftest.$ac_ext
 CFLAGS="$pgac_save_CFLAGS"
 fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_sse42_crc32_intrinsics__msse4_2" >&5
-$as_echo "$pgac_cv_sse42_crc32_intrinsics__msse4_2" >&6; }
-if test x"$pgac_cv_sse42_crc32_intrinsics__msse4_2" = x"yes"; then
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_sse42_intrinsics__msse4_2" >&5
+$as_echo "$pgac_cv_sse42_intrinsics__msse4_2" >&6; }
+if test x"$pgac_cv_sse42_intrinsics__msse4_2" = x"yes"; then
   CFLAGS_SSE42="-msse4.2"
-  pgac_sse42_crc32_intrinsics=yes
+  pgac_sse42_intrinsics=yes
 fi
 
 fi
@@ -18293,12 +18300,12 @@ fi
 # in the template or configure command line.
 if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" && test x"$USE_SSE42_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_ARMV8_CRC32C" = x"" && test x"$USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK" = x""; then
   # Use Intel SSE 4.2 if available.
-  if test x"$pgac_sse42_crc32_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then
+  if test x"$pgac_sse42_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then
     USE_SSE42_CRC32C=1
   else
     # Intel SSE 4.2, with runtime check? The CPUID instruction is needed for
     # the runtime check.
-    if test x"$pgac_sse42_crc32_intrinsics" = x"yes" && (test x"$pgac_cv__get_cpuid" = x"yes" || test x"$pgac_cv__cpuid" = x"yes"); then
+    if test x"$pgac_sse42_intrinsics" = x"yes" && (test x"$pgac_cv__get_cpuid" = x"yes" || test x"$pgac_cv__cpuid" = x"yes"); then
       USE_SSE42_CRC32C_WITH_RUNTIME_CHECK=1
     else
       # Use ARM CRC Extension if available.
@@ -18312,7 +18319,7 @@ if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" &&
           # fall back to slicing-by-8 algorithm, which doesn't require any
           # special CPU support.
           USE_SLICING_BY_8_CRC32C=1
-	fi
+        fi
       fi
     fi
   fi
@@ -18365,6 +18372,59 @@ $as_echo "slicing-by-8" >&6; }
 fi
 
 
+# Select UTF-8 validator implementation.
+#
+# If we are targeting a processor that has SSE 4.2 instructions, we can use
+# those to validate UTF-8 characters. If we're not targeting such
+# a processor, but we can nevertheless produce code that uses the SSE
+# intrinsics, perhaps with some extra CFLAGS, compile both implementations and
+# select which one to use at runtime, depending on whether SSE 4.2 is supported
+# by the processor we're running on.
+#
+# You can override this logic by setting the appropriate USE_*_UTF8 flag to 1
+# in the template or configure command line.
+if test x"$USE_SSE42_UTF8" = x"" && test x"$USE_SSE42_UTF8_WITH_RUNTIME_CHECK" = x"" && test x"$USE_FALLBACK_UTF8" = x""; then
+  if test x"$pgac_sse42_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then
+    USE_SSE42_UTF8=1
+  else
+    if test x"$pgac_sse42_intrinsics" = x"yes"; then
+      USE_SSE42_UTF8_WITH_RUNTIME_CHECK=1
+    else
+      # fall back to an algorithm which doesn't require any special
+      # CPU support.
+      USE_FALLBACK_UTF8=1
+    fi
+  fi
+fi
+
+# Set PG_UTF8_OBJS appropriately depending on the selected implementation.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking which UTF-8 validator to use" >&5
+$as_echo_n "checking which UTF-8 validator to use... " >&6; }
+if test x"$USE_SSE42_UTF8" = x"1"; then
+
+$as_echo "#define USE_SSE42_UTF8 1" >>confdefs.h
+
+  PG_UTF8_OBJS="pg_utf8_sse42.o"
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: SSE 4.2" >&5
+$as_echo "SSE 4.2" >&6; }
+else
+  if test x"$USE_SSE42_UTF8_WITH_RUNTIME_CHECK" = x"1"; then
+
+$as_echo "#define USE_SSE42_UTF8_WITH_RUNTIME_CHECK 1" >>confdefs.h
+
+    PG_UTF8_OBJS="pg_utf8_sse42.o pg_utf8_fallback.o pg_utf8_sse42_choose.o"
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: SSE 4.2 with runtime check" >&5
+$as_echo "SSE 4.2 with runtime check" >&6; }
+  else
+
+$as_echo "#define USE_FALLBACK_UTF8 1" >>confdefs.h
+
+    PG_UTF8_OBJS="pg_utf8_fallback.o"
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: fallback" >&5
+$as_echo "fallback" >&6; }
+  fi
+fi
+
 
 # Select semaphore implementation type.
 if test "$PORTNAME" != "win32"; then
diff --git a/configure.ac b/configure.ac
index ed3cdb9a8e..cbd2d9a27b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2098,14 +2098,14 @@ if test x"$pgac_cv__cpuid" = x"yes"; then
   AC_DEFINE(HAVE__CPUID, 1, [Define to 1 if you have __cpuid.])
 fi
 
-# Check for Intel SSE 4.2 intrinsics to do CRC calculations.
+# Check for Intel SSE 4.2 intrinsics.
 #
-# First check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used
-# with the default compiler flags. If not, check if adding the -msse4.2
+# First check if these intrinsics can be used with the default
+# compiler flags. If not, check if adding the -msse4.2
 # flag helps. CFLAGS_SSE42 is set to -msse4.2 if that's required.
-PGAC_SSE42_CRC32_INTRINSICS([])
-if test x"$pgac_sse42_crc32_intrinsics" != x"yes"; then
-  PGAC_SSE42_CRC32_INTRINSICS([-msse4.2])
+PGAC_SSE42_INTRINSICS([])
+if test x"$pgac_sse42_intrinsics" != x"yes"; then
+  PGAC_SSE42_INTRINSICS([-msse4.2])
 fi
 AC_SUBST(CFLAGS_SSE42)
 
@@ -2146,12 +2146,12 @@ AC_SUBST(CFLAGS_ARMV8_CRC32C)
 # in the template or configure command line.
 if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" && test x"$USE_SSE42_CRC32C_WITH_RUNTIME_CHECK" = x"" && test x"$USE_ARMV8_CRC32C" = x"" && test x"$USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK" = x""; then
   # Use Intel SSE 4.2 if available.
-  if test x"$pgac_sse42_crc32_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then
+  if test x"$pgac_sse42_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then
     USE_SSE42_CRC32C=1
   else
     # Intel SSE 4.2, with runtime check? The CPUID instruction is needed for
     # the runtime check.
-    if test x"$pgac_sse42_crc32_intrinsics" = x"yes" && (test x"$pgac_cv__get_cpuid" = x"yes" || test x"$pgac_cv__cpuid" = x"yes"); then
+    if test x"$pgac_sse42_intrinsics" = x"yes" && (test x"$pgac_cv__get_cpuid" = x"yes" || test x"$pgac_cv__cpuid" = x"yes"); then
       USE_SSE42_CRC32C_WITH_RUNTIME_CHECK=1
     else
       # Use ARM CRC Extension if available.
@@ -2165,7 +2165,7 @@ if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" &&
           # fall back to slicing-by-8 algorithm, which doesn't require any
           # special CPU support.
           USE_SLICING_BY_8_CRC32C=1
-	fi
+        fi
       fi
     fi
   fi
@@ -2202,6 +2202,49 @@ else
 fi
 AC_SUBST(PG_CRC32C_OBJS)
 
+# Select UTF-8 validator implementation.
+#
+# If we are targeting a processor that has SSE 4.2 instructions, we can use
+# those to validate UTF-8 characters. If we're not targeting such
+# a processor, but we can nevertheless produce code that uses the SSE
+# intrinsics, perhaps with some extra CFLAGS, compile both implementations and
+# select which one to use at runtime, depending on whether SSE 4.2 is supported
+# by the processor we're running on.
+#
+# You can override this logic by setting the appropriate USE_*_UTF8 flag to 1
+# in the template or configure command line.
+if test x"$USE_SSE42_UTF8" = x"" && test x"$USE_SSE42_UTF8_WITH_RUNTIME_CHECK" = x"" && test x"$USE_FALLBACK_UTF8" = x""; then
+  if test x"$pgac_sse42_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then
+    USE_SSE42_UTF8=1
+  else
+    if test x"$pgac_sse42_intrinsics" = x"yes"; then
+      USE_SSE42_UTF8_WITH_RUNTIME_CHECK=1
+    else
+      # fall back to an algorithm which doesn't require any special
+      # CPU support.
+      USE_FALLBACK_UTF8=1
+    fi
+  fi
+fi
+
+# Set PG_UTF8_OBJS appropriately depending on the selected implementation.
+AC_MSG_CHECKING([which UTF-8 validator to use])
+if test x"$USE_SSE42_UTF8" = x"1"; then
+  AC_DEFINE(USE_SSE42_UTF8, 1, [Define to 1 to use Intel SSE 4.2 instructions for UTF-8 validation.])
+  PG_UTF8_OBJS="pg_utf8_sse42.o"
+  AC_MSG_RESULT(SSE 4.2)
+else
+  if test x"$USE_SSE42_UTF8_WITH_RUNTIME_CHECK" = x"1"; then
+    AC_DEFINE(USE_SSE42_UTF8_WITH_RUNTIME_CHECK, 1, [Define to 1 to use Intel SSE 4.2 instructions for UTF-8 validation, with a runtime check.])
+    PG_UTF8_OBJS="pg_utf8_sse42.o pg_utf8_fallback.o pg_utf8_sse42_choose.o"
+    AC_MSG_RESULT(SSE 4.2 with runtime check)
+  else
+    AC_DEFINE(USE_FALLBACK_UTF8, 1, [Define to 1 to use the fallback UTF-8 validator.])
+    PG_UTF8_OBJS="pg_utf8_fallback.o"
+    AC_MSG_RESULT(fallback)
+  fi
+fi
+AC_SUBST(PG_UTF8_OBJS)
 
 # Select semaphore implementation type.
 if test "$PORTNAME" != "win32"; then
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 6e2f224cc4..a23441b36d 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -740,6 +740,9 @@ LIBOBJS = @LIBOBJS@
 # files needed for the chosen CRC-32C implementation
 PG_CRC32C_OBJS = @PG_CRC32C_OBJS@
 
+# files needed for the chosen UTF-8 validation implementation
+PG_UTF8_OBJS = @PG_UTF8_OBJS@
+
 LIBS := -lpgcommon -lpgport $(LIBS)
 
 # to make ws2_32.lib the last library
diff --git a/src/common/wchar.c b/src/common/wchar.c
index 0636b8765b..586bfee7cc 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -13,6 +13,7 @@
 #include "c.h"
 
 #include "mb/pg_wchar.h"
+#include "port/pg_utf8.h"
 
 
 /*
@@ -1761,7 +1762,41 @@ static int
 pg_utf8_verifystr(const unsigned char *s, int len)
 {
 	const unsigned char *start = s;
+	int			non_error_bytes;
 
+	/*
+	 * For all but the shortest strings, dispatch to an optimized
+	 * platform-specific implementation in src/port. The threshold is set to
+	 * the width of the widest SIMD register we support in
+	 * src/include/port/pg_sse42_utils.h.
+	 */
+	if (len >= 16)
+	{
+		non_error_bytes = UTF8_VERIFYSTR_FAST(s, len);
+		s += non_error_bytes;
+		len -= non_error_bytes;
+
+		/*
+		 * The fast path is optimized for the valid case, so it's possible it
+		 * returned in the middle of a multibyte sequence, since that wouldn't
+		 * have raised an error. Before checking the remaining bytes, walk
+		 * backwards to find the last byte that could have been the start of a
+		 * valid sequence.
+		 */
+		while (s > start)
+		{
+			s--;
+			len++;
+
+			if (!IS_HIGHBIT_SET(*s) ||
+				IS_UTF8_2B_LEAD(*s) ||
+				IS_UTF8_3B_LEAD(*s) ||
+				IS_UTF8_4B_LEAD(*s))
+				break;
+		}
+	}
+
+	/* check remaining bytes */
 	while (len > 0)
 	{
 		int			l;
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index d93ccac263..045bbbcb7e 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -29,6 +29,13 @@ typedef unsigned int pg_wchar;
  */
 #define MAX_MULTIBYTE_CHAR_LEN	4
 
+/*
+ * UTF-8 macros
+ */
+#define IS_UTF8_2B_LEAD(c) (((c) & 0xe0) == 0xc0)
+#define IS_UTF8_3B_LEAD(c) (((c) & 0xf0) == 0xe0)
+#define IS_UTF8_4B_LEAD(c) (((c) & 0xf8) == 0xf0)
+
 /*
  * various definitions for EUC
  */
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 15ffdd895a..58fd420831 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -904,6 +904,9 @@
 /* Define to 1 to build with BSD Authentication support. (--with-bsd-auth) */
 #undef USE_BSD_AUTH
 
+/* Define to 1 to use the fallback UTF-8 validator. */
+#undef USE_FALLBACK_UTF8
+
 /* Define to build with ICU support. (--with-icu) */
 #undef USE_ICU
 
@@ -941,6 +944,12 @@
 /* Define to 1 to use Intel SSE 4.2 CRC instructions with a runtime check. */
 #undef USE_SSE42_CRC32C_WITH_RUNTIME_CHECK
 
+/* Define to 1 to use Intel SSE 4.2 instructions for UTF-8 validation. */
+#undef USE_SSE42_UTF8
+
+/* Define to 1 to use Intel SSE 4.2 instructions for UTF-8 validation, with a
+   runtime check. */
+#undef USE_SSE42_UTF8_WITH_RUNTIME_CHECK
+
 /* Define to build with systemd support. (--with-systemd) */
 #undef USE_SYSTEMD
 
diff --git a/src/include/port/pg_sse42_utils.h b/src/include/port/pg_sse42_utils.h
new file mode 100644
index 0000000000..deafb3e5f8
--- /dev/null
+++ b/src/include/port/pg_sse42_utils.h
@@ -0,0 +1,163 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_sse42_utils.h
+ *	  Convenience functions to wrap SSE 4.2 intrinsics.
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/port/pg_sse42_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_SSE42_UTILS
+#define PG_SSE42_UTILS
+
+#include <nmmintrin.h>
+
+
+/* assign the arguments to the lanes in the register */
+#define vset(...)       _mm_setr_epi8(__VA_ARGS__)
+
+/* return a zeroed register */
+static inline const __m128i
+vzero()
+{
+	return _mm_setzero_si128();
+}
+
+/* perform an unaligned load from memory into a register */
+static inline const __m128i
+vload(const unsigned char *raw_input)
+{
+	return _mm_loadu_si128((const __m128i *) raw_input);
+}
+
+/* return a vector with each 8-bit lane populated with the input scalar */
+static inline __m128i
+splat(char byte)
+{
+	return _mm_set1_epi8(byte);
+}
+
+/* return false if a register is zero, true otherwise */
+static inline bool
+to_bool(const __m128i v)
+{
+	/*
+	 * _mm_testz_si128 returns 1 if the bitwise AND of the two arguments is
+	 * zero. Zero is the only value whose bitwise AND with itself is zero.
+	 */
+	return !_mm_testz_si128(v, v);
+}
+
+/* vector version of IS_HIGHBIT_SET() */
+static inline bool
+is_highbit_set(const __m128i v)
+{
+	return _mm_movemask_epi8(v) != 0;
+}
+
+/* bitwise vector operations */
+
+static inline __m128i
+bitwise_and(const __m128i v1, const __m128i v2)
+{
+	return _mm_and_si128(v1, v2);
+}
+
+static inline __m128i
+bitwise_or(const __m128i v1, const __m128i v2)
+{
+	return _mm_or_si128(v1, v2);
+}
+
+static inline __m128i
+bitwise_xor(const __m128i v1, const __m128i v2)
+{
+	return _mm_xor_si128(v1, v2);
+}
+
+/* perform signed greater-than on all 8-bit lanes */
+static inline __m128i
+greater_than(const __m128i v1, const __m128i v2)
+{
+	return _mm_cmpgt_epi8(v1, v2);
+}
+
+/* set bits in the error vector where bytes in the input are zero */
+static inline void
+check_for_zeros(const __m128i v, __m128i * error)
+{
+	const		__m128i cmp = _mm_cmpeq_epi8(v, vzero());
+
+	*error = bitwise_or(*error, cmp);
+}
+
+/*
+ * Do unsigned subtraction, but instead of wrapping around
+ * on overflow, stop at zero. Useful for emulating unsigned
+ * comparison.
+ */
+static inline __m128i
+saturating_sub(const __m128i v1, const __m128i v2)
+{
+	return _mm_subs_epu8(v1, v2);
+}
+
+/*
+ * Shift right each 8-bit lane
+ *
+ * There is no intrinsic to do this on 8-bit lanes, so shift
+ * right in each 16-bit lane then apply a mask in each 8-bit
+ * lane shifted the same amount.
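+ * The mask clears any bits that were shifted in from the neighboring byte.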
+ */
+static inline __m128i
+shift_right(const __m128i v, const int n)
+{
+	const		__m128i shift16 = _mm_srli_epi16(v, n);
+	const		__m128i mask = splat(0xFF >> n);
+
+	return bitwise_and(shift16, mask);
+}
+
+/*
+ * Shift entire 'input' register right by N 8-bit lanes, and
+ * replace the first N lanes with the last N lanes from the
+ * 'prev' register. Could be stated in C thusly:
+ *
+ * ((prev << 128) | input) >> (N * 8)
+ *
+ * The third argument to the intrinsic must be a numeric constant, so
+ * we must have separate functions for different shift amounts.
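+ *
+ * For example, prev1([p0 .. p15], [i0 .. i15]) yields [p15, i0 .. i14].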
+ */
+static inline __m128i
+prev1(__m128i prev, __m128i input)
+{
+	return _mm_alignr_epi8(input, prev, sizeof(__m128i) - 1);
+}
+
+static inline __m128i
+prev2(__m128i prev, __m128i input)
+{
+	return _mm_alignr_epi8(input, prev, sizeof(__m128i) - 2);
+}
+
+static inline __m128i
+prev3(__m128i prev, __m128i input)
+{
+	return _mm_alignr_epi8(input, prev, sizeof(__m128i) - 3);
+}
+
+/*
+ * For each 8-bit lane in the input, use that value as an index
+ * into the lookup vector as if it were a 16-element byte array.
+ */
+static inline __m128i
+lookup(const __m128i input, const __m128i lookup)
+{
+	return _mm_shuffle_epi8(lookup, input);
+}
+
+#endif							/* PG_SSE42_UTILS */
diff --git a/src/include/port/pg_utf8.h b/src/include/port/pg_utf8.h
new file mode 100644
index 0000000000..76b6ebf3f2
--- /dev/null
+++ b/src/include/port/pg_utf8.h
@@ -0,0 +1,103 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_utf8.h
+ *	  Routines for fast validation of UTF-8 text.
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/port/pg_utf8.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_UTF8_H
+#define PG_UTF8_H
+
+
+#if defined(USE_SSE42_UTF8)
+/* Use SSE 4.2 instructions. */
+#define UTF8_VERIFYSTR_FAST(s, len) \
+	pg_validate_utf8_sse42((s), (len))
+
+extern int	pg_validate_utf8_sse42(const unsigned char *s, int len);
+
+#elif defined(USE_SSE42_UTF8_WITH_RUNTIME_CHECK)
+/* Use SSE 4.2 instructions, but perform a runtime check first. */
+#define UTF8_VERIFYSTR_FAST(s, len) \
+	pg_validate_utf8((s), (len))
+
+extern int	pg_validate_utf8_fallback(const unsigned char *s, int len);
+extern int	(*pg_validate_utf8) (const unsigned char *s, int len);
+extern int	pg_validate_utf8_sse42(const unsigned char *s, int len);
+
+#else
+/* Use a portable implementation */
+#define UTF8_VERIFYSTR_FAST(s, len) \
+	pg_validate_utf8_fallback((s), (len))
+
+extern int	pg_validate_utf8_fallback(const unsigned char *s, int len);
+
+#endif							/* USE_SSE42_UTF8 */
+
+/* The following are visible in all builds. */
+
+/*
+ * Verify a chunk of bytes for valid ASCII including a zero-byte check.
+ * This is here in case non-UTF8 encodings want to use it.
+ * WIP: Is there a better place for it?
+ */
+static inline bool
+is_valid_ascii(const unsigned char *s, int len)
+{
+	uint64		chunk,
+				highbit_cum = UINT64CONST(0),
+				zero_cum = UINT64CONST(0x8080808080808080);
+
+	Assert(len % sizeof(chunk) == 0);
+
+	while (len >= sizeof(chunk))
+	{
+		memcpy(&chunk, s, sizeof(chunk));
+
+		/*
+		 * Capture any zero bytes in this chunk.
+		 *
+		 * First, add 0x7f to each byte. This sets the high bit in each byte,
+		 * unless it was a zero. We will check later that none of the bytes in
+		 * the chunk had the high bit set, in which case the max value each
+		 * byte can have after the addition is 0x7f + 0x7f = 0xfe, and we
+		 * don't need to worry about carrying over to the next byte.
+		 *
+		 * If any resulting high bits are zero, the corresponding high bits in
+		 * the zero accumulator will be cleared.
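+		 *
+		 * For example, a NUL byte gives 0x00 + 0x7f = 0x7f, leaving the high
+		 * bit clear, while any other byte without its high bit set gives at
+		 * least 0x01 + 0x7f = 0x80.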
+		 */
+		zero_cum &= (chunk + UINT64CONST(0x7f7f7f7f7f7f7f7f));
+
+		/* Capture any set bits in this chunk. */
+		highbit_cum |= chunk;
+
+		s += sizeof(chunk);
+		len -= sizeof(chunk);
+	}
+
+	/* Check if any high bits in the high bit accumulator got set. */
+	if (highbit_cum & UINT64CONST(0x8080808080808080))
+		return false;
+
+	/*
+	 * Check if any high bits in the zero accumulator got cleared.
+	 *
+	 * XXX: As noted above, the zero check is only valid if the chunk had no
+	 * high bits set. However, the compiler may perform these two checks in
+	 * any order. That's okay because if any high bits were set, we would
+	 * return false regardless, so invalid results from the zero check don't
+	 * matter.
+	 */
+	if (zero_cum != UINT64CONST(0x8080808080808080))
+		return false;
+
+	return true;
+}
+
+#endif							/* PG_UTF8_H */
diff --git a/src/port/Makefile b/src/port/Makefile
index 52dbf5783f..04838b0ab2 100644
--- a/src/port/Makefile
+++ b/src/port/Makefile
@@ -40,6 +40,7 @@ LIBS += $(PTHREAD_LIBS)
 OBJS = \
 	$(LIBOBJS) \
 	$(PG_CRC32C_OBJS) \
+	$(PG_UTF8_OBJS) \
 	bsearch_arg.o \
 	chklocale.o \
 	erand48.o \
@@ -89,6 +90,11 @@ libpgport.a: $(OBJS)
 thread.o: CFLAGS+=$(PTHREAD_CFLAGS)
 thread_shlib.o: CFLAGS+=$(PTHREAD_CFLAGS)
 
+# all versions of pg_utf8_sse42.o need CFLAGS_SSE42
+pg_utf8_sse42.o: CFLAGS+=$(CFLAGS_SSE42)
+pg_utf8_sse42_shlib.o: CFLAGS+=$(CFLAGS_SSE42)
+pg_utf8_sse42_srv.o: CFLAGS+=$(CFLAGS_SSE42)
+
 # all versions of pg_crc32c_sse42.o need CFLAGS_SSE42
 pg_crc32c_sse42.o: CFLAGS+=$(CFLAGS_SSE42)
 pg_crc32c_sse42_shlib.o: CFLAGS+=$(CFLAGS_SSE42)
diff --git a/src/port/pg_utf8_fallback.c b/src/port/pg_utf8_fallback.c
new file mode 100644
index 0000000000..4291dd516e
--- /dev/null
+++ b/src/port/pg_utf8_fallback.c
@@ -0,0 +1,231 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_utf8_fallback.c
+ *	  Validate UTF-8 using plain C.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/port/pg_utf8_fallback.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "c.h"
+
+#include "port/pg_utf8.h"
+
+
+/*
+ * This determines how far to advance the string pointer on each iteration.
+ * Sixteen seems to give the best balance of performance across different
+ * byte distributions.
+ */
+#define STRIDE_LENGTH 16
+
+
+/*
+ * The fallback UTF-8 validator uses a "shift-based" DFA as described by Per
+ * Vognsen:
+ *
+ * https://gist.github.com/pervognsen/218ea17743e1442e59bb60d29b1aa725
+ *
+ * In a traditional table-driven DFA, the input byte and current state are
+ * used to compute an index into an array of state transitions. Since the
+ * load address is dependent on earlier work, the CPU is not kept busy.
+ *
+ * In a shift-based DFA, the input byte is an index into an array of integers
+ * that encode the state transitions. To compute the next state, you simply
+ * right-shift that integer by the current state and apply a mask. In
+ * this scheme, loads only depend on the input byte, so there is better
+ * pipelining.
+ *
+ * In the most straightforward implementation, a shift-based DFA for UTF-8
+ * requires 64-bit integers to encode the transitions, but with an SMT
+ * solver it's possible to find state numbers such that the transitions fit
+ * within 32-bit integers, as Dougall Johnson demonstrated:
+ *
+ * https://gist.github.com/dougallj/166e326de6ad4cf2c94be97a204c025f
+ *
+ * The naming convention for states and transitions was adopted from a
+ * UTF-8 to UTF-16/32 transcoder, which uses a traditional DFA:
+ *
+ * https://github.com/BobSteagall/utf_utils/blob/master/src/utf_utils.cpp
+ *
+ * ILL  ASC  CR1  CR2  CR3  L2A  L3A  L3B  L3C  L4A  L4B  L4C CLASS / STATE
+ * ==========================================================================
+ * err, END, err, err, err, CS1, P3A, CS2, P3B, P4A, CS3, P4B,      | BGN/END
+ * err, err, err, err, err, err, err, err, err, err, err, err,      | ERR
+ *                                                                  |
+ * err, err, END, END, END, err, err, err, err, err, err, err,      | CS1
+ * err, err, CS1, CS1, CS1, err, err, err, err, err, err, err,      | CS2
+ * err, err, CS2, CS2, CS2, err, err, err, err, err, err, err,      | CS3
+ *                                                                  |
+ * err, err, err, err, CS1, err, err, err, err, err, err, err,      | P3A
+ * err, err, CS1, CS1, err, err, err, err, err, err, err, err,      | P3B
+ *                                                                  |
+ * err, err, err, CS2, CS2, err, err, err, err, err, err, err,      | P4A
+ * err, err, CS2, err, err, err, err, err, err, err, err, err,      | P4B
+ */
+
+/* Error */
+#define	ERR  0
+/* Begin */
+#define	BGN 11
+/* Continuation states, expect 1/2/3 continuation bytes */
+#define	CS1 16
+#define	CS2  1
+#define	CS3  5
+/* Leading byte was E0/ED, expect 1 more continuation byte */
+#define	P3A  6
+#define	P3B 20
+/* Leading byte was F0/F4, expect 2 more continuation bytes */
+#define	P4A 25
+#define	P4B 30
+/* Begin and End are the same state */
+#define	END BGN
+
+/* the encoded state transitions for the lookup table */
+
+/* ASCII */
+#define ASC (END << BGN)
+/* 2-byte lead */
+#define L2A (CS1 << BGN)
+/* 3-byte lead */
+#define L3A (P3A << BGN)
+#define L3B (CS2 << BGN)
+#define L3C (P3B << BGN)
+/* 4-byte lead */
+#define L4A (P4A << BGN)
+#define L4B (CS3 << BGN)
+#define L4C (P4B << BGN)
+/* continuation byte */
+#define CR1 (END << CS1) | (CS1 << CS2) | (CS2 << CS3) | (CS1 << P3B) | (CS2 << P4B)
+#define CR2 (END << CS1) | (CS1 << CS2) | (CS2 << CS3) | (CS1 << P3B) | (CS2 << P4A)
+#define CR3 (END << CS1) | (CS1 << CS2) | (CS2 << CS3) | (CS1 << P3A) | (CS2 << P4A)
+/* invalid byte */
+#define ILL ERR
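+
+/*
+ * For example, ASC is (END << BGN), so reading an ASCII byte in state BGN
+ * gives (ASC >> BGN) & 31 == END, leaving the DFA at a code point boundary.
+ * Likewise, the CR1 word shifted right by CS1 and masked with 31 yields END:
+ * one more continuation byte completes the sequence.
+ */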
+
+static const uint32 Utf8Transition[256] =
+{
+	/* ASCII */
+
+	ILL, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+	ASC, ASC, ASC, ASC, ASC, ASC, ASC, ASC,
+
+	/* continuation bytes */
+
+	/* 80..8F */
+	CR1, CR1, CR1, CR1, CR1, CR1, CR1, CR1,
+	CR1, CR1, CR1, CR1, CR1, CR1, CR1, CR1,
+
+	/* 90..9F */
+	CR2, CR2, CR2, CR2, CR2, CR2, CR2, CR2,
+	CR2, CR2, CR2, CR2, CR2, CR2, CR2, CR2,
+
+	/* A0..BF */
+	CR3, CR3, CR3, CR3, CR3, CR3, CR3, CR3,
+	CR3, CR3, CR3, CR3, CR3, CR3, CR3, CR3,
+	CR3, CR3, CR3, CR3, CR3, CR3, CR3, CR3,
+	CR3, CR3, CR3, CR3, CR3, CR3, CR3, CR3,
+
+	/* leading bytes */
+
+	/* C0..DF */
+	ILL, ILL, L2A, L2A, L2A, L2A, L2A, L2A,
+	L2A, L2A, L2A, L2A, L2A, L2A, L2A, L2A,
+	L2A, L2A, L2A, L2A, L2A, L2A, L2A, L2A,
+	L2A, L2A, L2A, L2A, L2A, L2A, L2A, L2A,
+
+	/* E0..EF */
+	L3A, L3B, L3B, L3B, L3B, L3B, L3B, L3B,
+	L3B, L3B, L3B, L3B, L3B, L3C, L3B, L3B,
+
+	/* F0..FF */
+	L4A, L4B, L4B, L4B, L4C, ILL, ILL, ILL,
+	ILL, ILL, ILL, ILL, ILL, ILL, ILL, ILL
+};
+
+static inline void
+utf8_advance(const unsigned char *s, uint32 *state, int len)
+{
+	/* Note: We deliberately don't check the state within the loop. */
+	while (len > 0)
+	{
+		/*
+		 * It's important that the mask value is 31: in most instruction sets,
+		 * a shift amount applied to a 32-bit operand is taken mod 32, so the
+		 * compiler should elide the mask operation.
+		 */
+		*state = Utf8Transition[*s++] >> (*state & 31);
+		len--;
+	}
+
+	*state &= 31;
+}
+
+/*
+ * Returns zero on error, or the string length if no errors were detected.
+ *
+ * In the error case, the caller must start over from the beginning and verify
+ * one byte at a time.
+ *
+ * In the non-error case, it's still possible we ended in the middle of an
+ * incomplete multibyte sequence, so the caller is responsible for adjusting
+ * the returned result to make sure it represents the end of the last valid
+ * byte sequence.
+ *
+ * See also the comment in common/wchar.c under "multibyte sequence
+ * validators".
+ */
+int
+pg_validate_utf8_fallback(const unsigned char *s, int len)
+{
+	const int	orig_len = len;
+	uint32		state = BGN;
+
+	while (len >= STRIDE_LENGTH)
+	{
+		/*
+		 * If the chunk is all ASCII, we can skip the full UTF-8 check, but we
+		 * must first check for a non-END state, which means the previous
+		 * chunk ended in the middle of a multibyte sequence.
+		 */
+		if (state != END || !is_valid_ascii(s, STRIDE_LENGTH))
+			utf8_advance(s, &state, STRIDE_LENGTH);
+
+		s += STRIDE_LENGTH;
+		len -= STRIDE_LENGTH;
+	}
+
+	/* check remaining bytes */
+	utf8_advance(s, &state, len);
+
+	/*
+	 * If we saw an error at any time, the final state will be error, in which
+	 * case we let the caller handle it. We treat all other states as success.
+	 */
+	if (state == ERR)
+		return 0;
+	else
+		return orig_len;
+}
diff --git a/src/port/pg_utf8_sse42.c b/src/port/pg_utf8_sse42.c
new file mode 100644
index 0000000000..8dd86911de
--- /dev/null
+++ b/src/port/pg_utf8_sse42.c
@@ -0,0 +1,349 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_utf8_sse42.c
+ *	  Validate UTF-8 using SSE 4.2 instructions.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/port/pg_utf8_sse42.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "c.h"
+
+#include "port/pg_sse42_utils.h"
+#include "port/pg_utf8.h"
+
+
+/*
+ * This module is based on the paper "Validating UTF-8 In Less Than One
+ * Instruction Per Byte" by John Keiser and Daniel Lemire:
+ *
+ * https://arxiv.org/pdf/2010.03090.pdf
+ *
+ * The authors provide an implementation of this algorithm in the simdjson
+ * library (Apache 2.0 license):
+ *
+ * https://github.com/simdjson/simdjson
+ *
+ * The PG code was written from scratch, but with some naming conventions
+ * adapted from the Westmere implementation of simdjson. The constants and
+ * lookup tables were taken directly from simdjson with some cosmetic
+ * rearrangements.
+ *
+ * The core of the lookup algorithm is a two-part process:
+ *
+ * 1. Classify 2-byte sequences. All 2-byte errors can be found by looking at
+ * the first three nibbles of each overlapping 2-byte sequence, using three
+ * separate lookup tables. The interesting bytes are either definite errors
+ * or two continuation bytes in a row. The latter may be valid depending on
+ * what came before.
+ *
+ * 2. Find starts of possible 3- and 4-byte sequences.
+ *
+ * Combining the above results allows us to verify any UTF-8 sequence.
+ */
+
+
+#define MAX_CONT 0xBF
+#define MAX_2B_LEAD 0xDF
+#define MAX_3B_LEAD 0xEF
+
+/* lookup tables for classifying two-byte sequences */
+
+/*
+ * 11______ 0_______
+ * 11______ 11______
+ */
+#define TOO_SHORT		(1 << 0)
+
+/* 0_______ 10______ */
+#define TOO_LONG		(1 << 1)
+
+/* 1100000_ 10______ */
+#define OVERLONG_2		(1 << 2)
+
+/* 11100000 100_____ */
+#define OVERLONG_3		(1 << 3)
+
+/* The following two symbols intentionally share the same value. */
+
+/* 11110000 1000____ */
+#define OVERLONG_4		(1 << 4)
+
+/*
+ * 11110101 1000____
+ * 1111011_ 1000____
+ * 11111___ 1000____
+ */
+#define TOO_LARGE_1000	(1 << 4)
+
+/*
+ * 11110100 1001____
+ * 11110100 101_____
+ * 11110101 1001____
+ * 11110101 101_____
+ * 1111011_ 1001____
+ * 1111011_ 101_____
+ * 11111___ 1001____
+ * 11111___ 101_____
+ */
+#define TOO_LARGE		(1 << 5)
+
+/* 11101101 101_____ */
+#define SURROGATE		(1 << 6)
+
+/*
+ * 10______ 10______
+ *
+ * The cast here is to silence warnings about implicit conversion
+ * from 'int' to 'char'. It's fine that this is a negative value,
+ * because we only care about the pattern of bits.
+ */
+#define TWO_CONTS ((char) (1 << 7))
+
+/* These all have ____ in byte 1 */
+#define CARRY (TOO_SHORT | TOO_LONG | TWO_CONTS)
+
+/*
+ * table for categorizing bits in the high nibble of
+ * the first byte of a 2-byte sequence
+ */
+#define BYTE_1_HIGH_TABLE \
+	/* 0_______ ________ <ASCII in byte 1> */ \
+	TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, \
+	TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, \
+	/* 10______ ________ <continuation in byte 1> */ \
+	TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS, \
+	/* 1100____ ________ <two byte lead in byte 1> */ \
+	TOO_SHORT | OVERLONG_2, \
+	/* 1101____ ________ <two byte lead in byte 1> */ \
+	TOO_SHORT, \
+	/* 1110____ ________ <three byte lead in byte 1> */ \
+	TOO_SHORT | OVERLONG_3 | SURROGATE, \
+	/* 1111____ ________ <four+ byte lead in byte 1> */ \
+	TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
+
+/*
+ * table for categorizing bits in the low nibble of
+ * the first byte of a 2-byte sequence
+ */
+#define BYTE_1_LOW_TABLE \
+	/* ____0000 ________ */ \
+	CARRY | OVERLONG_2 | OVERLONG_3 | OVERLONG_4, \
+	/* ____0001 ________ */ \
+	CARRY | OVERLONG_2, \
+	/* ____001_ ________ */ \
+	CARRY, \
+	CARRY, \
+	/* ____0100 ________ */ \
+	CARRY | TOO_LARGE, \
+	/* ____0101 ________ */ \
+	CARRY | TOO_LARGE | TOO_LARGE_1000, \
+	/* ____011_ ________ */ \
+	CARRY | TOO_LARGE | TOO_LARGE_1000, \
+	CARRY | TOO_LARGE | TOO_LARGE_1000, \
+	/* ____1___ ________ */ \
+	CARRY | TOO_LARGE | TOO_LARGE_1000, \
+	CARRY | TOO_LARGE | TOO_LARGE_1000, \
+	CARRY | TOO_LARGE | TOO_LARGE_1000, \
+	CARRY | TOO_LARGE | TOO_LARGE_1000, \
+	CARRY | TOO_LARGE | TOO_LARGE_1000, \
+	/* ____1101 ________ */ \
+	CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE, \
+	/* ____111_ ________ */ \
+	CARRY | TOO_LARGE | TOO_LARGE_1000, \
+	CARRY | TOO_LARGE | TOO_LARGE_1000
+
+/*
+ * table for categorizing bits in the high nibble of
+ * the second byte of a 2-byte sequence
+ */
+#define BYTE_2_HIGH_TABLE \
+	/* ________ 0_______ <ASCII in byte 2> */ \
+	TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, \
+	TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, \
+	/* ________ 1000____ */ \
+	TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4, \
+	/* ________ 1001____ */ \
+	TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE, \
+	/* ________ 101_____ */ \
+	TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE, \
+	TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE, \
+	/* ________ 11______ */ \
+	TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
+
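+/*
+ * Worked example: for the overlong sequence 0xC0 0x80, the byte-1 high
+ * nibble (0xC) looks up TOO_SHORT | OVERLONG_2, the byte-1 low nibble (0x0)
+ * looks up CARRY | OVERLONG_2 | OVERLONG_3 | OVERLONG_4, and the byte-2 high
+ * nibble (0x8) looks up TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 |
+ * TOO_LARGE_1000 | OVERLONG_4. ANDing the three lookups leaves OVERLONG_2
+ * set, flagging the error.
+ */
+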
+
+/*
+ * Return a vector with lanes non-zero where we have either errors, or
+ * two or more continuations in a row.
+ */
+static inline __m128i
+check_special_cases(const __m128i prev, const __m128i input)
+{
+	const		__m128i byte_1_hi_table = vset(BYTE_1_HIGH_TABLE);
+	const		__m128i byte_1_lo_table = vset(BYTE_1_LOW_TABLE);
+	const		__m128i byte_2_hi_table = vset(BYTE_2_HIGH_TABLE);
+
+	/*
+	 * To classify the first byte in each chunk we need to have the last byte
+	 * from the previous chunk.
+	 */
+	const		__m128i input_shift1 = prev1(prev, input);
+
+	/* put the relevant nibbles into their own bytes in their own registers */
+	const		__m128i byte_1_hi = shift_right(input_shift1, 4);
+	const		__m128i byte_1_lo = bitwise_and(input_shift1, splat(0x0F));
+	const		__m128i byte_2_hi = shift_right(input, 4);
+
+	/* lookup the possible errors for each set of nibbles */
+	const		__m128i lookup_1_hi = lookup(byte_1_hi, byte_1_hi_table);
+	const		__m128i lookup_1_lo = lookup(byte_1_lo, byte_1_lo_table);
+	const		__m128i lookup_2_hi = lookup(byte_2_hi, byte_2_hi_table);
+
+	/*
+	 * AND all the lookups together. At this point, non-zero lanes in the
+	 * returned vector represent:
+	 *
+	 * 1. invalid 2-byte sequences
+	 *
+	 * 2. the second continuation byte of a 3- or 4-byte character
+	 *
+	 * 3. the third continuation byte of a 4-byte character
+	 */
+	const		__m128i temp = bitwise_and(lookup_1_hi, lookup_1_lo);
+
+	return bitwise_and(temp, lookup_2_hi);
+}
+
+/*
+ * Return a vector with lanes set to TWO_CONTS where we expect to find two
+ * continuations in a row, namely after 3- and 4-byte leads.
+ */
+static inline __m128i
+check_multibyte_lengths(const __m128i prev, const __m128i input)
+{
+	/*
+	 * Populate registers that contain the input shifted right by 2 and 3
+	 * bytes, filling in the left lanes with the previous input.
+	 */
+	const		__m128i input_shift2 = prev2(prev, input);
+	const		__m128i input_shift3 = prev3(prev, input);
+
+	/*
+	 * Constants for comparison. Any 3-byte lead is greater than
+	 * MAX_2B_LEAD, etc.
+	 */
+	const		__m128i max_lead2 = splat(MAX_2B_LEAD);
+	const		__m128i max_lead3 = splat(MAX_3B_LEAD);
+
+	/*
+	 * Look in the shifted registers for 3- or 4-byte leads. There is no
+	 * unsigned comparison, so we use saturating subtraction followed by
+	 * signed comparison with zero. Any non-zero bytes in the result represent
+	 * valid leads.
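+	 *
+	 * For example, a 3-byte lead such as 0xE2 gives 0xE2 - 0xDF = 0x03,
+	 * while anything up to 0xDF saturates to zero.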
+	 */
+	const		__m128i is_third_byte = saturating_sub(input_shift2, max_lead2);
+	const		__m128i is_fourth_byte = saturating_sub(input_shift3, max_lead3);
+
+	/* OR them together for easier comparison */
+	const		__m128i temp = bitwise_or(is_third_byte, is_fourth_byte);
+
+	/*
+	 * Set all bits in each 8-bit lane if the result is greater than zero.
+	 * Signed arithmetic is okay because the values are small.
+	 */
+	const		__m128i must23 = greater_than(temp, vzero());
+
+	/*
+	 * We want to compare with the result of check_special_cases() so apply a
+	 * mask to return only the set bits corresponding to the "two
+	 * continuations" case.
+	 */
+	return bitwise_and(must23, splat(TWO_CONTS));
+}
+
+/* set bits in the error vector where we find invalid UTF-8 input */
+static inline void
+check_utf8_bytes(const __m128i prev, const __m128i input, __m128i * error)
+{
+	const		__m128i special_cases = check_special_cases(prev, input);
+	const		__m128i set_two_conts = check_multibyte_lengths(prev, input);
+
+	/*
+	 * If the expected and seen continuation patterns agree and no other
+	 * errors were found, this will be zero.
+	 */
+	const		__m128i result = bitwise_xor(special_cases, set_two_conts);
+
+	*error = bitwise_or(*error, result);
+}
+
+/* return non-zero if the input terminates with an incomplete code point */
+static inline __m128i
+is_incomplete(const __m128i v)
+{
+	const		__m128i max_array =
+	vset(0xFF, 0xFF, 0xFF, 0xFF,
+		 0xFF, 0xFF, 0xFF, 0xFF,
+		 0xFF, 0xFF, 0xFF, 0xFF,
+		 0xFF, MAX_3B_LEAD, MAX_2B_LEAD, MAX_CONT);
+
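+	/*
+	 * A non-zero lane means the corresponding maximum was exceeded: the last
+	 * byte starts some multibyte sequence (> MAX_CONT), the second-to-last
+	 * starts a 3- or 4-byte sequence (> MAX_2B_LEAD), or the third-to-last
+	 * starts a 4-byte sequence (> MAX_3B_LEAD).
+	 */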
+	return saturating_sub(v, max_array);
+}
+
+/*
+ * Returns zero on error, or the number of bytes processed if no errors were
+ * detected.
+ *
+ * In the error case, the caller must start over at the beginning and verify
+ * one byte at a time.
+ *
+ * In the non-error case, it's still possible we ended in the middle of an
+ * incomplete multibyte sequence, so the caller is responsible for adjusting
+ * the returned result to make sure it represents the end of the last valid
+ * byte sequence.
+ *
+ * See also the comment in common/wchar.c under "multibyte sequence
+ * validators".
+ */
+int
+pg_validate_utf8_sse42(const unsigned char *s, int len)
+{
+	const unsigned char *start = s;
+	__m128i		error_cum = vzero();
+	__m128i		prev = vzero();
+	__m128i		prev_incomplete = vzero();
+	__m128i		input;
+
+	while (len >= sizeof(input))
+	{
+		input = vload(s);
+		check_for_zeros(input, &error_cum);
+
+		/*
+		 * If the chunk is all ASCII, we can skip the full UTF-8 check, but we
+		 * must still check the previous chunk for incomplete multibyte
+		 * sequences at the end. We only update prev_incomplete if the chunk
+		 * contains non-ASCII.
+		 */
+		if (is_highbit_set(input))
+		{
+			check_utf8_bytes(prev, input, &error_cum);
+			prev_incomplete = is_incomplete(input);
+		}
+		else
+			error_cum = bitwise_or(error_cum, prev_incomplete);
+
+		prev = input;
+		s += sizeof(input);
+		len -= sizeof(input);
+	}
+
+	/* If we saw an error during the loop, let the caller handle it. */
+	if (to_bool(error_cum))
+		return 0;
+	else
+		return s - start;
+}
diff --git a/src/port/pg_utf8_sse42_choose.c b/src/port/pg_utf8_sse42_choose.c
new file mode 100644
index 0000000000..973fe69225
--- /dev/null
+++ b/src/port/pg_utf8_sse42_choose.c
@@ -0,0 +1,68 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_utf8_sse42_choose.c
+ *	  Choose between SSE 4.2 and fallback implementation.
+ *
+ * On first call, checks if the CPU we're running on supports SSE 4.2.
+ * If it does, use SSE instructions for UTF-8 validation. Otherwise,
+ * fall back to the pure C implementation.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/port/pg_utf8_sse42_choose.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "c.h"
+
+#ifdef HAVE__GET_CPUID
+#include <cpuid.h>
+#endif
+
+#ifdef HAVE__CPUID
+#include <intrin.h>
+#endif
+
+#include "port/pg_utf8.h"
+
+static bool
+pg_utf8_sse42_available(void)
+{
+	unsigned int exx[4] = {0, 0, 0, 0};
+
+#if defined(HAVE__GET_CPUID)
+	__get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
+#elif defined(HAVE__CPUID)
+	__cpuid(exx, 1);
+#else
+
+	/*
+	 * XXX The equivalent check for CRC throws an error here because it
+	 * detects CPUID presence at configure time. This is to avoid indirecting
+	 * through a function pointer, but that's not important for UTF-8.
+	 */
+	return false;
+#endif							/* HAVE__GET_CPUID */
+	return (exx[2] & (1 << 20)) != 0;	/* SSE 4.2 */
+}
+
+/*
+ * This is called only on the first invocation. It replaces the function pointer
+ * so that subsequent calls are routed directly to the chosen implementation.
+ */
+static int
+pg_validate_utf8_choose(const unsigned char *s, int len)
+{
+	if (pg_utf8_sse42_available())
+		pg_validate_utf8 = pg_validate_utf8_sse42;
+	else
+		pg_validate_utf8 = pg_validate_utf8_fallback;
+
+	return pg_validate_utf8(s, len);
+}
+
+int			(*pg_validate_utf8) (const unsigned char *s, int len) = pg_validate_utf8_choose;
diff --git a/src/test/regress/expected/conversion.out b/src/test/regress/expected/conversion.out
index 04fdcba496..201876a495 100644
--- a/src/test/regress/expected/conversion.out
+++ b/src/test/regress/expected/conversion.out
@@ -72,6 +72,176 @@ $$;
 --
 -- UTF-8
 --
+-- The description column must be unique.
+CREATE TABLE utf8_verification_inputs (inbytes bytea, description text PRIMARY KEY);
+insert into utf8_verification_inputs  values
+  ('\x66006f',	'NUL byte'),
+  ('\xaf',		'bare continuation'),
+  ('\xc5',		'missing second byte in 2-byte char'),
+  ('\xc080',	'smallest 2-byte overlong'),
+  ('\xc1bf',	'largest 2-byte overlong'),
+  ('\xc280',	'next 2-byte after overlongs'),
+  ('\xdfbf',	'largest 2-byte'),
+  ('\xe9af',	'missing third byte in 3-byte char'),
+  ('\xe08080',	'smallest 3-byte overlong'),
+  ('\xe09fbf',	'largest 3-byte overlong'),
+  ('\xe0a080',	'next 3-byte after overlong'),
+  ('\xed9fbf',	'last before surrogates'),
+  ('\xeda080',	'smallest surrogate'),
+  ('\xedbfbf',	'largest surrogate'),
+  ('\xee8080',	'next after surrogates'),
+  ('\xefbfbf',	'largest 3-byte'),
+  ('\xf1afbf',	'missing fourth byte in 4-byte char'),
+  ('\xf0808080',	'smallest 4-byte overlong'),
+  ('\xf08fbfbf',	'largest 4-byte overlong'),
+  ('\xf0908080',	'next 4-byte after overlong'),
+  ('\xf48fbfbf',	'largest 4-byte'),
+  ('\xf4908080',	'smallest too large'),
+  ('\xfa9a9a8a8a',	'5-byte');
+-- Test UTF-8 verification
+select description, (test_conv(inbytes, 'utf8', 'utf8')).* from utf8_verification_inputs;
+            description             |   result   |   errorat    |                             error                              
+------------------------------------+------------+--------------+----------------------------------------------------------------
+ NUL byte                           | \x66       | \x006f       | invalid byte sequence for encoding "UTF8": 0x00
+ bare continuation                  | \x         | \xaf         | invalid byte sequence for encoding "UTF8": 0xaf
+ missing second byte in 2-byte char | \x         | \xc5         | invalid byte sequence for encoding "UTF8": 0xc5
+ smallest 2-byte overlong           | \x         | \xc080       | invalid byte sequence for encoding "UTF8": 0xc0 0x80
+ largest 2-byte overlong            | \x         | \xc1bf       | invalid byte sequence for encoding "UTF8": 0xc1 0xbf
+ next 2-byte after overlongs        | \xc280     |              | 
+ largest 2-byte                     | \xdfbf     |              | 
+ missing third byte in 3-byte char  | \x         | \xe9af       | invalid byte sequence for encoding "UTF8": 0xe9 0xaf
+ smallest 3-byte overlong           | \x         | \xe08080     | invalid byte sequence for encoding "UTF8": 0xe0 0x80 0x80
+ largest 3-byte overlong            | \x         | \xe09fbf     | invalid byte sequence for encoding "UTF8": 0xe0 0x9f 0xbf
+ next 3-byte after overlong         | \xe0a080   |              | 
+ last before surrogates             | \xed9fbf   |              | 
+ smallest surrogate                 | \x         | \xeda080     | invalid byte sequence for encoding "UTF8": 0xed 0xa0 0x80
+ largest surrogate                  | \x         | \xedbfbf     | invalid byte sequence for encoding "UTF8": 0xed 0xbf 0xbf
+ next after surrogates              | \xee8080   |              | 
+ largest 3-byte                     | \xefbfbf   |              | 
+ missing fourth byte in 4-byte char | \x         | \xf1afbf     | invalid byte sequence for encoding "UTF8": 0xf1 0xaf 0xbf
+ smallest 4-byte overlong           | \x         | \xf0808080   | invalid byte sequence for encoding "UTF8": 0xf0 0x80 0x80 0x80
+ largest 4-byte overlong            | \x         | \xf08fbfbf   | invalid byte sequence for encoding "UTF8": 0xf0 0x8f 0xbf 0xbf
+ next 4-byte after overlong         | \xf0908080 |              | 
+ largest 4-byte                     | \xf48fbfbf |              | 
+ smallest too large                 | \x         | \xf4908080   | invalid byte sequence for encoding "UTF8": 0xf4 0x90 0x80 0x80
+ 5-byte                             | \x         | \xfa9a9a8a8a | invalid byte sequence for encoding "UTF8": 0xfa
+(23 rows)
+
+-- Test UTF-8 verification with ASCII padding appended to provide
+-- coverage for algorithms that work on multiple bytes at a time.
+-- The error message for a sequence starting with a 4-byte lead
+-- will contain all 4 bytes if they are present, so various
+-- expressions below add 3 ASCII bytes to the end to ensure
+-- consistent error messages.
+-- The number 64 below needs to be the width of the widest SIMD
+-- register we could possibly support in the foreseeable future.
+-- Test multibyte verification in fast path
+with test_bytes as (
+  select
+    inbytes,
+    description,
+    (test_conv(inbytes || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
+  from utf8_verification_inputs
+), test_padded as (
+  select
+    description,
+    (test_conv(inbytes || repeat('.', 64)::bytea, 'utf8', 'utf8')).error
+  from test_bytes
+)
+select
+  description,
+  b.error as orig_error,
+  p.error as error_after_padding
+from test_padded p
+join test_bytes b
+using (description)
+where p.error is distinct from b.error
+order by description;
+ description | orig_error | error_after_padding 
+-------------+------------+---------------------
+(0 rows)
+
+-- Test ASCII verification in fast path where incomplete
+-- UTF-8 sequences fall at the end of the preceding chunk.
+with test_bytes as (
+  select
+    inbytes,
+    description,
+    (test_conv(inbytes || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
+  from utf8_verification_inputs
+), test_padded as (
+  select
+    description,
+    (test_conv(repeat('.', 64 - length(inbytes))::bytea || inbytes || repeat('.', 64)::bytea, 'utf8', 'utf8')).error
+  from test_bytes
+)
+select
+  description,
+  b.error as orig_error,
+  p.error as error_after_padding
+from test_padded p
+join test_bytes b
+using (description)
+where p.error is distinct from b.error
+order by description;
+ description | orig_error | error_after_padding 
+-------------+------------+---------------------
+(0 rows)
+
+-- Test cases where UTF-8 sequences within short text
+-- come after the fast path returns.
+with test_bytes as (
+  select
+    inbytes,
+    description,
+    (test_conv(inbytes || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
+  from utf8_verification_inputs
+), test_padded as (
+  select
+    description,
+    (test_conv(repeat('.', 64)::bytea || inbytes || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
+  from test_bytes
+)
+select
+  description,
+  b.error as orig_error,
+  p.error as error_after_padding
+from test_padded p
+join test_bytes b
+using (description)
+where p.error is distinct from b.error
+order by description;
+ description | orig_error | error_after_padding 
+-------------+------------+---------------------
+(0 rows)
+
+-- Test cases where incomplete UTF-8 sequences fall at the
+-- end of the part checked by the fast path.
+with test_bytes as (
+  select
+    inbytes,
+    description,
+    (test_conv(inbytes || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
+  from utf8_verification_inputs
+), test_padded as (
+  select
+    description,
+    (test_conv(repeat('.', 64 - length(inbytes))::bytea || inbytes || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
+  from test_bytes
+)
+select
+  description,
+  b.error as orig_error,
+  p.error as error_after_padding
+from test_padded p
+join test_bytes b
+using (description)
+where p.error is distinct from b.error
+order by description;
+ description | orig_error | error_after_padding 
+-------------+------------+---------------------
+(0 rows)
+
 CREATE TABLE utf8_inputs (inbytes bytea, description text);
 insert into utf8_inputs  values
   ('\x666f6f',		'valid, pure ASCII'),
diff --git a/src/test/regress/sql/conversion.sql b/src/test/regress/sql/conversion.sql
index 8358682432..6510a88b1b 100644
--- a/src/test/regress/sql/conversion.sql
+++ b/src/test/regress/sql/conversion.sql
@@ -74,6 +74,140 @@ $$;
 --
 -- UTF-8
 --
+-- The description column must be unique.
+CREATE TABLE utf8_verification_inputs (inbytes bytea, description text PRIMARY KEY);
+insert into utf8_verification_inputs  values
+  ('\x66006f',	'NUL byte'),
+  ('\xaf',		'bare continuation'),
+  ('\xc5',		'missing second byte in 2-byte char'),
+  ('\xc080',	'smallest 2-byte overlong'),
+  ('\xc1bf',	'largest 2-byte overlong'),
+  ('\xc280',	'next 2-byte after overlongs'),
+  ('\xdfbf',	'largest 2-byte'),
+  ('\xe9af',	'missing third byte in 3-byte char'),
+  ('\xe08080',	'smallest 3-byte overlong'),
+  ('\xe09fbf',	'largest 3-byte overlong'),
+  ('\xe0a080',	'next 3-byte after overlong'),
+  ('\xed9fbf',	'last before surrogates'),
+  ('\xeda080',	'smallest surrogate'),
+  ('\xedbfbf',	'largest surrogate'),
+  ('\xee8080',	'next after surrogates'),
+  ('\xefbfbf',	'largest 3-byte'),
+  ('\xf1afbf',	'missing fourth byte in 4-byte char'),
+  ('\xf0808080',	'smallest 4-byte overlong'),
+  ('\xf08fbfbf',	'largest 4-byte overlong'),
+  ('\xf0908080',	'next 4-byte after overlong'),
+  ('\xf48fbfbf',	'largest 4-byte'),
+  ('\xf4908080',	'smallest too large'),
+  ('\xfa9a9a8a8a',	'5-byte');
+
+-- Test UTF-8 verification
+select description, (test_conv(inbytes, 'utf8', 'utf8')).* from utf8_verification_inputs;
+
+-- Test UTF-8 verification with ASCII padding appended to provide
+-- coverage for algorithms that work on multiple bytes at a time.
+-- The error message for a sequence starting with a 4-byte lead
+-- will contain all 4 bytes if they are present, so various
+-- expressions below add 3 ASCII bytes to the end to ensure
+-- consistent error messages.
+-- The number 64 below needs to be the width of the widest SIMD
+-- register we could possibly support in the foreseeable future.
+
+-- Test multibyte verification in fast path
+with test_bytes as (
+  select
+    inbytes,
+    description,
+    (test_conv(inbytes || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
+  from utf8_verification_inputs
+), test_padded as (
+  select
+    description,
+    (test_conv(inbytes || repeat('.', 64)::bytea, 'utf8', 'utf8')).error
+  from test_bytes
+)
+select
+  description,
+  b.error as orig_error,
+  p.error as error_after_padding
+from test_padded p
+join test_bytes b
+using (description)
+where p.error is distinct from b.error
+order by description;
+
+-- Test ASCII verification in fast path where incomplete
+-- UTF-8 sequences fall at the end of the preceding chunk.
+with test_bytes as (
+  select
+    inbytes,
+    description,
+    (test_conv(inbytes || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
+  from utf8_verification_inputs
+), test_padded as (
+  select
+    description,
+    (test_conv(repeat('.', 64 - length(inbytes))::bytea || inbytes || repeat('.', 64)::bytea, 'utf8', 'utf8')).error
+  from test_bytes
+)
+select
+  description,
+  b.error as orig_error,
+  p.error as error_after_padding
+from test_padded p
+join test_bytes b
+using (description)
+where p.error is distinct from b.error
+order by description;
+
+-- Test cases where UTF-8 sequences within short text
+-- come after the fast path returns.
+with test_bytes as (
+  select
+    inbytes,
+    description,
+    (test_conv(inbytes || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
+  from utf8_verification_inputs
+), test_padded as (
+  select
+    description,
+    (test_conv(repeat('.', 64)::bytea || inbytes || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
+  from test_bytes
+)
+select
+  description,
+  b.error as orig_error,
+  p.error as error_after_padding
+from test_padded p
+join test_bytes b
+using (description)
+where p.error is distinct from b.error
+order by description;
+
+-- Test cases where incomplete UTF-8 sequences fall at the
+-- end of the part checked by the fast path.
+with test_bytes as (
+  select
+    inbytes,
+    description,
+    (test_conv(inbytes || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
+  from utf8_verification_inputs
+), test_padded as (
+  select
+    description,
+    (test_conv(repeat('.', 64 - length(inbytes))::bytea || inbytes || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
+  from test_bytes
+)
+select
+  description,
+  b.error as orig_error,
+  p.error as error_after_padding
+from test_padded p
+join test_bytes b
+using (description)
+where p.error is distinct from b.error
+order by description;
+
 CREATE TABLE utf8_inputs (inbytes bytea, description text);
 insert into utf8_inputs  values
   ('\x666f6f',		'valid, pure ASCII'),
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 3cb46832ab..638f784eb5 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -116,10 +116,14 @@ sub mkvcbuild
 		push(@pgportfiles, 'pg_crc32c_sse42_choose.c');
 		push(@pgportfiles, 'pg_crc32c_sse42.c');
 		push(@pgportfiles, 'pg_crc32c_sb8.c');
+		push(@pgportfiles, 'pg_utf8_sse42_choose.c');
+		push(@pgportfiles, 'pg_utf8_sse42.c');
+		push(@pgportfiles, 'pg_utf8_fallback.c');
 	}
 	else
 	{
 		push(@pgportfiles, 'pg_crc32c_sb8.c');
+		push(@pgportfiles, 'pg_utf8_fallback.c');
 	}
 
 	our @pgcommonallfiles = qw(
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 165a93987a..ce04643151 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -491,6 +491,7 @@ sub GenerateFiles
 		USE_ASSERT_CHECKING => $self->{options}->{asserts} ? 1 : undef,
 		USE_BONJOUR         => undef,
 		USE_BSD_AUTH        => undef,
+		USE_FALLBACK_UTF8          => undef,
 		USE_ICU => $self->{options}->{icu} ? 1 : undef,
 		USE_LIBXML                 => undef,
 		USE_LIBXSLT                => undef,
@@ -503,6 +504,8 @@ sub GenerateFiles
 		USE_SLICING_BY_8_CRC32C    => undef,
 		USE_SSE42_CRC32C           => undef,
 		USE_SSE42_CRC32C_WITH_RUNTIME_CHECK => 1,
+		USE_SSE42_UTF8             => undef,
+		USE_SSE42_UTF8_WITH_RUNTIME_CHECK => 1,
 		USE_SYSTEMD                         => undef,
 		USE_SYSV_SEMAPHORES                 => undef,
 		USE_SYSV_SHARED_MEMORY              => undef,
-- 
2.31.1
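
For anyone who wants to poke at the padding tests interactively, here is a
minimal standalone sketch (not part of the patch). It assumes the test_conv()
helper defined earlier in conversion.sql, and checks that an invalid sequence
reports the same error whether or not 64 bytes of ASCII precede it, i.e.
whether it is reached by the fast path or by the byte-at-a-time tail:

-- Sketch only; relies on the test_conv() helper from conversion.sql.
-- The incomplete 3-byte sequence \xe9af (plus 3 trailing ASCII bytes so
-- the reported bytes match in both cases) should yield the same error
-- with and without 64 leading ASCII bytes.
select
  (test_conv('\xe9af'::bytea || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
    as orig_error,
  (test_conv(repeat('.', 64)::bytea || '\xe9af'::bytea || repeat('.', 3)::bytea, 'utf8', 'utf8')).error
    as error_after_padding;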
