On Sat, 17 Jul 2021 at 04:48, John Naylor <john.nay...@enterprisedb.com> wrote:
> v17-0001 is the same as v14. 0002 is a stripped-down implementation of Amit's
> chunk idea for multibyte, and it's pretty good on x86. On Power8, not so
> much. 0003 and 0004 are shot-in-the-dark guesses to improve it on Power8,
> with some success, but end up making x86 weirdly slow, so I'm afraid that
> could happen on other platforms as well.
Thanks for trying the chunk approach. I tested your v17 versions on Arm64.

For the Chinese characters, v17-0002 gave some improvement over v14. But for all the other character sets, there was around 10% degradation w.r.t. v14. I thought maybe the hton64 call and the memcpy() for each multibyte character might be the culprit, so I tried iterating over all the characters in the chunk within the same pg_utf8_verify_one() function by left-shifting the bits. But that worsened the figures, so I gave up on that idea.

Here are the numbers on Arm64:

HEAD:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
    1781 |  1095 |   628 |     944 |   1151

v14:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     852 |   484 |   144 |     584 |    971

v17-0001+2:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     731 |   520 |   152 |     645 |   1118

Haven't looked at your v18 patch set yet.
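A rough sketch of the "left-shifting" variant described above, written only to illustrate the idea and NOT the code that was actually benchmarked: instead of a memcpy() plus byte swap per multibyte character, it does one load per 8-byte chunk and then walks the characters by shifting consumed bytes out of the top of the word. The names load_be64(), utf8_seq_len() and verify_chunk_by_shifting() are hypothetical, __builtin_bswap64 assumes GCC or Clang, and the check covers only lead/continuation-byte structure (no overlong or surrogate rejection), so it is far weaker than pg_utf8_verify_one().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: one memcpy() of 8 bytes, byte-swapped on
 * little-endian hosts so that s[0] lands in the most significant byte
 * (an hton64-style conversion).  __builtin_bswap64 assumes GCC/Clang. */
static uint64_t
load_be64(const unsigned char *s)
{
    uint64_t    v;
    const uint16_t probe = 1;

    memcpy(&v, s, sizeof(v));
    if (*(const unsigned char *) &probe == 1)   /* little-endian host */
        v = __builtin_bswap64(v);
    return v;
}

/* Sequence length implied by a lead byte, or 0 if it cannot start one. */
static int
utf8_seq_len(unsigned char b)
{
    if (b < 0x80)
        return 1;
    if ((b & 0xE0) == 0xC0)
        return 2;
    if ((b & 0xF0) == 0xE0)
        return 3;
    if ((b & 0xF8) == 0xF0)
        return 4;
    return 0;
}

/*
 * Hypothetical: load the 8-byte chunk once, then consume characters by
 * left-shifting the bytes already checked out of the word.  Returns the
 * number of leading bytes forming complete, structurally valid sequences,
 * or -1 on a bad lead or continuation byte.  A character that straddles
 * the chunk boundary is left for the next chunk.  Overlong forms and
 * surrogates are NOT rejected.
 */
static int
verify_chunk_by_shifting(const unsigned char *s)
{
    uint64_t    word = load_be64(s);
    int         used = 0;

    while (used < 8)
    {
        unsigned char lead = (unsigned char) (word >> 56);
        int         len = utf8_seq_len(lead);

        if (len == 0)
            return -1;
        if (used + len > 8)
            break;              /* character straddles the chunk boundary */
        for (int i = 1; i < len; i++)
        {
            unsigned char cont = (unsigned char) (word >> (56 - 8 * i));

            if ((cont & 0xC0) != 0x80)
                return -1;
        }
        word <<= 8 * len;       /* shift the consumed bytes out of the top */
        used += len;
    }
    return used;
}
```

For example, on the 9-byte input "abc" followed by the UTF-8 encoding of two Chinese characters, only the first 8 bytes are examined: the function accepts "abc" plus the first character (6 bytes) and stops before the second one, which crosses the chunk boundary. Whether this beats a per-character load is exactly what the numbers above call into question, so treat it as an illustration of the mechanics only.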