On Thu, Jun 3, 2021 at 3:22 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:
>
> On 03/06/2021 22:16, Heikki Linnakangas wrote:
> > On 03/06/2021 22:10, John Naylor wrote:
> >> On Thu, Jun 3, 2021 at 3:08 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> >>   >                 x1 = half1 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
> >>   >                 x2 = half2 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
> >>   >
> >>   >                 /* then check that the high bit is set in each byte. */
> >>   >                 x = (x1 | x2);
> >>   >                 x &= UINT64CONST(0x8080808080808080);
> >>   >                 if (x != UINT64CONST(0x8080808080808080))
> >>   >                         return 0;

> If you replace (x1 | x2) with (x1 & x2) above, I think it's correct.

After looking at it again with fresh eyes, I agree this is correct. I
modified the regression tests to pad the input bytes with ASCII so that the
code path that works on 16 bytes at a time is tested. I use both UTF-8
input tables for some of the additional tests. There is a de facto
requirement that the descriptions be unique across both input tables.
That could be done more elegantly, but I wanted to keep things simple
for now.
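
To illustrate why the AND matters, here is a minimal standalone sketch of
the zero-byte check on a 16-byte chunk. The helper and macro names are
hypothetical and not taken from the patch, and the sketch assumes the
caller has already established that no byte has its high bit set (so the
per-byte additions cannot carry into a neighboring byte). With OR, a zero
byte in one half is masked whenever the byte at the same position in the
other half is nonzero; with AND, every byte in both halves must be nonzero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
#define HIGH_BITS  UINT64_C(0x8080808080808080)
#define LOW_SEVENS UINT64_C(0x7f7f7f7f7f7f7f7f)

/*
 * Both helpers assume every input byte is <= 0x7f. Adding 0x7f to such a
 * byte sets its high bit if and only if the byte is nonzero, and the sum
 * is at most 0xfe, so nothing carries into the next byte.
 */

/* Corrected form: returns 1 only if all 16 bytes are nonzero. */
static int
no_zero_bytes_and(const unsigned char *s)
{
	uint64_t	half1, half2, x1, x2;

	memcpy(&half1, s, sizeof(half1));
	memcpy(&half2, s + sizeof(half1), sizeof(half2));

	x1 = half1 + LOW_SEVENS;
	x2 = half2 + LOW_SEVENS;

	/* The high bit must be set in each byte of BOTH halves. */
	return ((x1 & x2) & HIGH_BITS) == HIGH_BITS;
}

/* Original form with OR: can miss a zero byte. */
static int
no_zero_bytes_or(const unsigned char *s)
{
	uint64_t	half1, half2, x1, x2;

	memcpy(&half1, s, sizeof(half1));
	memcpy(&half2, s + sizeof(half1), sizeof(half2));

	x1 = half1 + LOW_SEVENS;
	x2 = half2 + LOW_SEVENS;

	/* The high bit only needs to be set in EITHER half per position. */
	return ((x1 | x2) & HIGH_BITS) == HIGH_BITS;
}
```

For example, given the 16 bytes "abcdefghijk\0mnop", the zero byte sits at
position 3 of the second half while position 3 of the first half holds
'd', so the OR variant reports no zero bytes and the AND variant correctly
rejects the chunk.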

v11-0001 is an improvement over v10:

clang 12.0.5 / macOS:

master:

 chinese | mixed | ascii
---------+-------+-------
     975 |   686 |   369

v10-0001:

 chinese | mixed | ascii
---------+-------+-------
     930 |   549 |   109

v11-0001:

 chinese | mixed | ascii
---------+-------+-------
     687 |   440 |    64


gcc 4.8.5 / Linux (older machine):

master:

 chinese | mixed | ascii
---------+-------+-------
    2559 |  1495 |   825

v10-0001:

 chinese | mixed | ascii
---------+-------+-------
    2966 |  1034 |   156

v11-0001:

 chinese | mixed | ascii
---------+-------+-------
    2242 |   824 |   140

Previous testing on POWER8 and Arm64 leads me to expect similar results
there as well.

I also looked again at 0002 and decided I wasn't quite happy with the test
coverage. Previously, the code padded out a short input with ASCII so that
the 16-bytes-at-a-time code path was always exercised. However, that
required some finicky complexity and still didn't give adequate coverage.
For v11, I ripped that out and put the responsibility on the regression
tests to make sure the various code paths are exercised.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment: v11-0001-Rewrite-pg_utf8_verifystr-for-speed.patch
Description: Binary data

Attachment: v11-0002-Use-SSE-instructions-for-pg_utf8_verifystr-where.patch
Description: Binary data
