I've decided I'm not quite comfortable with the additional complexity in
the build system introduced by the SIMD portion of the previous patches.
That complexity would be easier to justify if the pure C portion were
unchanged, but with the shift-based DFA plus the bitwise ASCII check, we
already have a portable implementation that's a substantial improvement
over the current validator. In v24, I've included only that much, and the
diff is only about 1/3 as many lines as before. If future improvements to
COPY FROM put additional pressure on this path, we can always add SIMD
support later.
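
For readers unfamiliar with the two techniques named above, here is a
minimal sketch of how a shift-based DFA and a bitwise ASCII check can fit
together. It is not the code from the attached patch: the state names, the
6-bit state spacing in a 64-bit word, the runtime table initialization, and
the function names are all illustrative assumptions. It also treats a zero
byte as ordinary ASCII, which may not match what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical state encoding: each state is a bit offset (a multiple of 6)
 * into a 64-bit transition word. ERR is 0, so an all-zero word rejects.
 */
enum
{
	ERR = 0,					/* invalid sequence (sticky) */
	BGN = 6,					/* between characters; the accepting state */
	CS1 = 12,					/* expect 1 more continuation byte */
	CS2 = 18,					/* expect 2 more continuation bytes */
	CS3 = 24,					/* expect 3 more continuation bytes */
	P3A = 30,					/* after E0: next byte must be A0..BF */
	P3B = 36,					/* after ED: next byte must be 80..9F */
	P4A = 42,					/* after F0: next byte must be 90..BF */
	P4B = 48					/* after F4: next byte must be 80..8F */
};

/* Store "from state `from`, go to state `to`" in a transition word. */
#define TRANS(from, to) ((uint64_t) (to) << (from))
/* Transitions shared by every continuation byte 80..BF. */
#define CONT (TRANS(CS1, BGN) | TRANS(CS2, CS1) | TRANS(CS3, CS2))

static uint64_t dfa_table[256];

static void
init_dfa_table(void)
{
	for (int b = 0; b < 256; b++)
	{
		uint64_t	t = 0;		/* C0, C1, F5..FF stay zero: always ERR */

		if (b <= 0x7F) t = TRANS(BGN, BGN);
		else if (b <= 0x8F) t = CONT | TRANS(P3B, CS1) | TRANS(P4B, CS2);
		else if (b <= 0x9F) t = CONT | TRANS(P3B, CS1) | TRANS(P4A, CS2);
		else if (b <= 0xBF) t = CONT | TRANS(P3A, CS1) | TRANS(P4A, CS2);
		else if (b >= 0xC2 && b <= 0xDF) t = TRANS(BGN, CS1);
		else if (b == 0xE0) t = TRANS(BGN, P3A);
		else if (b == 0xED) t = TRANS(BGN, P3B);
		else if (b >= 0xE1 && b <= 0xEF) t = TRANS(BGN, CS2);
		else if (b == 0xF0) t = TRANS(BGN, P4A);
		else if (b >= 0xF1 && b <= 0xF3) t = TRANS(BGN, CS3);
		else if (b == 0xF4) t = TRANS(BGN, P4B);
		dfa_table[b] = t;
	}
}

/* One DFA step: shift the byte's transition word by the current state. */
static inline uint64_t
dfa_step(uint64_t state, unsigned char byte)
{
	return (dfa_table[byte] >> state) & 63;
}

static bool
utf8_is_valid(const unsigned char *s, size_t len)
{
	uint64_t	state = BGN;

	assert(dfa_table['a'] != 0);	/* init_dfa_table() must run first */

	while (len >= 8)
	{
		uint64_t	chunk;

		memcpy(&chunk, s, sizeof(chunk));

		/*
		 * Bitwise ASCII check: if we are between characters and no byte in
		 * the chunk has its high bit set, skip the DFA for these 8 bytes.
		 */
		if (state != BGN || (chunk & UINT64_C(0x8080808080808080)) != 0)
		{
			for (int i = 0; i < 8; i++)
				state = dfa_step(state, s[i]);
		}
		s += 8;
		len -= 8;
	}

	/* Remaining tail bytes always go through the DFA. */
	while (len-- > 0)
		state = dfa_step(state, *s++);

	return state == BGN;
}
```

A caller would run init_dfa_table() once and then call utf8_is_valid() on
each buffer. The point of the shift-based form is that each step is a table
load, a shift, and a mask, with no branch on the byte's value.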

One thing not in this patch is a possible improvement to
pg_utf8_verifychar() that Heikki and I worked on upthread as part of
earlier attempts to rewrite pg_utf8_verifystr(). That's worth looking into
separately.

On Thu, Aug 26, 2021 at 12:09 PM Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:
>
> > Attached is v23 incorporating the 32-bit transition table, with the
> > necessary comment adjustments
>
> 32bit table is nice.

Thanks for taking a look!

> Would you please replace
> https://github.com/BobSteagall/utf_utils/blob/master/src/utf_utils.cpp URL
> with
> https://github.com/BobSteagall/utf_utils/blob/6b7a465265de2f5fa6133d653df0c9bdd73bbcf8/src/utf_utils.cpp
> in the header of src/port/pg_utf8_fallback.c?
>
> It would make the URL more stable in case the file gets renamed.
>
> Vladimir
>

Makes sense, so done that way.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment: v24-0001-Add-fast-path-for-validating-UTF-8-text.patch