Re: [POC] verifying UTF-8 using SIMD instructions

John Naylor Thu, 22 Jul 2021 04:39:21 -0700

On Wed, Jul 21, 2021 at 8:08 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
>
> On Thu, Jul 22, 2021 at 6:16 AM John Naylor


> One question is whether this "one size fits all" approach will be
> extensible to wider SIMD.

Sure, it'll just take a little more work and complexity. For one, 16-byte
SIMD can operate on 32-byte chunks with a bit of repetition:

-       __m128i         input;
+       __m128i         input1;
+       __m128i         input2;

-#define SIMD_STRIDE_LENGTH (sizeof(__m128i))
+#define SIMD_STRIDE_LENGTH 32

        while (len >= SIMD_STRIDE_LENGTH)
        {
-               input = vload(s);
+               input1 = vload(s);
+               input2 = vload(s + sizeof(input1));

-               check_for_zeros(input, &error);
+               check_for_zeros(input1, &error);
+               check_for_zeros(input2, &error);

                /*
                 * If the chunk is all ASCII, we can skip the full UTF-8 check, but we
@@ -460,17 +463,18 @@ pg_validate_utf8_sse42(const unsigned char *s, int len)
                 * sequences at the end. We only update prev_incomplete if the chunk
                 * contains non-ASCII, since the error is cumulative.
                 */
-               if (is_highbit_set(input))
+               if (is_highbit_set(bitwise_or(input1, input2)))
                {
-                       check_utf8_bytes(prev, input, &error);
-                       prev_incomplete = is_incomplete(input);
+                       check_utf8_bytes(prev, input1, &error);
+                       check_utf8_bytes(input1, input2, &error);
+                       prev_incomplete = is_incomplete(input2);
                }
                else
                {
                        error = bitwise_or(error, prev_incomplete);
                }

-               prev = input;
+               prev = input2;
                s += SIMD_STRIDE_LENGTH;
                len -= SIMD_STRIDE_LENGTH;
        }

So with a few #ifdefs, we can accommodate two sizes if we like.
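
To illustrate the "OR the two halves together" step in isolation, here is a
minimal standalone sketch, not code from the patch. The names CHUNK_LEN,
chunk_is_ascii() and ascii_prefix_chunks() are made up for this example, and
the non-SSE2 branch is only there so the sketch builds on other platforms:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical name: two 16-byte vectors are handled per iteration. */
#define CHUNK_LEN 32

/* Return 1 if none of the CHUNK_LEN bytes at s has its high bit set. */
static int
chunk_is_ascii(const unsigned char *s)
{
#if defined(__SSE2__)
	__m128i		input1 = _mm_loadu_si128((const __m128i *) s);
	__m128i		input2 = _mm_loadu_si128((const __m128i *) (s + 16));

	/* OR the halves so that one movemask covers all 32 bytes. */
	__m128i		both = _mm_or_si128(input1, input2);

	return _mm_movemask_epi8(both) == 0;
#else
	/* Portable fallback with the same idea, OR-ing 8-byte words. */
	uint64_t	acc = 0;

	for (size_t i = 0; i < CHUNK_LEN; i += sizeof(uint64_t))
	{
		uint64_t	w;

		memcpy(&w, s + i, sizeof(w));
		acc |= w;
	}
	return (acc & UINT64_C(0x8080808080808080)) == 0;
#endif
}

/* Number of leading bytes of s that lie in whole all-ASCII chunks. */
static size_t
ascii_prefix_chunks(const unsigned char *s, size_t len)
{
	size_t		done = 0;

	while (len - done >= CHUNK_LEN && chunk_is_ascii(s + done))
		done += CHUNK_LEN;
	return done;
}
```

The point is only that widening the stride costs one extra load and one OR on
the all-ASCII fast path; the full UTF-8 check is still done per 16-byte vector
as in the diff above.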

For another, the prevN() functions would need to change, at least on x86 --
that would require replacing _mm_alignr_epi8() with _mm256_alignr_epi8()
plus _mm256_permute2x128_si256(). Also, we might have to do something with
the vector typedef.
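
The reason for the extra permute is that _mm256_alignr_epi8() shifts within
each 128-bit lane separately, so the bytes that need to cross the lane boundary
have to be put in place first. A rough sketch of what a 32-byte "previous byte"
helper could look like follows. This is not from the patch: prev1_avx2() and
prev1_bytes() are hypothetical names, the argument order is a guess, and the
target attribute assumes GCC or Clang on x86:

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/*
 * Hypothetical 32-byte analogue of a prev1() helper: output byte i is
 * input[i - 1], and output byte 0 is the last byte of prev.
 */
__attribute__((target("avx2")))
static __m256i
prev1_avx2(__m256i prev, __m256i input)
{
	/* low lane = high lane of prev, high lane = low lane of input */
	__m256i		shifted = _mm256_permute2x128_si256(prev, input, 0x21);

	/* per lane: take the last byte of 'shifted', then 15 bytes of input */
	return _mm256_alignr_epi8(input, shifted, 15);
}

/* Byte-array wrapper so the result is easy to inspect. */
__attribute__((target("avx2")))
static void
prev1_bytes(const uint8_t *prev, const uint8_t *input, uint8_t *out)
{
	__m256i		p = _mm256_loadu_si256((const __m256i *) prev);
	__m256i		in = _mm256_loadu_si256((const __m256i *) input);

	_mm256_storeu_si256((__m256i *) out, prev1_avx2(p, in));
}
#endif
```

Callers would need to check for AVX2 at runtime (or build with -mavx2) before
using it, which is part of the extra complexity mentioned above.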

That said, I think we can punt on that until we have an application that's
much more compute-intensive. As it is with SSE4, COPY FROM WHERE <selective
predicate> already pushes the utf8 validation way down in profiles.

> FWIW here are some performance results from my humble RPI4:
>
> master:
>
>  chinese | mixed | ascii
> ---------+-------+-------
>     4172 |  2763 |  1823
> (1 row)
>
> Your v15 patch:
>
>  chinese | mixed | ascii
> ---------+-------+-------
>     2267 |  1248 |   399
> (1 row)
>
> Your v15 patch set + the NEON patch, configured with USE_UTF8_SIMD=1:
>
>  chinese | mixed | ascii
> ---------+-------+-------
>      909 |   620 |   318
> (1 row)
>
> It's so good I wonder if it's producing incorrect results :-)

Nice! If it passes regression tests, it *should* be fine, but stress
testing would be welcome on any platform.

> I also tried to do a quick and dirty AltiVec patch to see if it could
> fit into the same code "shape", with less immediate success: it works
> out slower than the fallback code on the POWER7 machine I scrounged an
> account on.  I'm not sure what's wrong there, but maybe it's a useful
> start (I'm probably confused about endianness, or the encoding of
> boolean vectors which may be different (is true 0x01 or 0xff, does it
> matter?), or something else, and it's falling back on errors all the
> time?).

Hmm, I have access to a power8 machine to play with, but I also don't mind
having some type of server-class hardware that relies on the recent nifty
DFA fallback, which performs even better on powerpc64le than v15.

--
John Naylor
EDB: http://www.enterprisedb.com
