On Thu, Jun 3, 2021 at 9:16 AM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> Some ideas:
>
> 1. Better to check if any high bits are set first. We care more about
> the speed of that than of detecting zero bytes, because input with high
> bits is valid but zeros are an error.
>
> 2. Since we check that there are no high bits, we can do the zero-checks
> with fewer instructions like this:

Both ideas make sense, and I like the shortcut we can take with the zero
check. I think Greg is right that the zero check needs “half1 & half2”, so
I tested with that (updated patches attached).

> What test set have you been using for performance testing this? I'd like

The microbenchmark is the same one you attached to [1], which I extended
with a 95% multibyte case. With the new zero check:

clang 12.0.5 / MacOS:

master:

 chinese | mixed | ascii
---------+-------+-------
     981 |   688 |   371

0001:

 chinese | mixed | ascii
---------+-------+-------
     932 |   548 |   110

plus optimized zero check:

 chinese | mixed | ascii
---------+-------+-------
     689 |   573 |    59

It makes sense that the Chinese text case is faster since the zero check is
skipped.

gcc 4.8.5 / Linux:

master:

 chinese | mixed | ascii
---------+-------+-------
    2561 |  1493 |   825

0001:

 chinese | mixed | ascii
---------+-------+-------
    2968 |  1035 |   158

plus optimized zero check:

 chinese | mixed | ascii
---------+-------+-------
    2413 |  1078 |   137

The second machine is a bit older and has an old compiler, but there is
still a small speed increase. In fact, without Heikki's tweaks, 0001
regresses on multibyte.

(Note: I'm not seeing the 7x improvement I claimed for 0001 here, but that
was from memory and I think that was a different machine and newer gcc. We
can report a range of results as we proceed.)

[1] https://www.postgresql.org/message-id/06d45421-61b8-86dd-e765-f1ce527a5...@iki.fi

--
John Naylor
EDB: http://www.enterprisedb.com
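
For anyone following along without the patches open, here is a minimal sketch of the two checks as I understand them from the discussion above. It is not the code from the attached patches: the function name check_ascii16 and the fixed 16-byte chunk split into two 8-byte halves are assumptions made for illustration. The high-bit test runs first; once it passes, every byte is in the range 0x00..0x7f, so adding 0x7f to each byte cannot carry into its neighbour and sets that byte's high bit exactly when the byte is nonzero. The two halves are combined with AND, so a zero byte in either half clears a high bit in the result.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): returns 1 if the 16 bytes at s
 * are all nonzero ASCII, and 0 if any byte is zero or has its high bit set.
 */
static int
check_ascii16(const unsigned char *s)
{
    const uint64_t high_mask = UINT64_C(0x8080808080808080);
    const uint64_t add_mask = UINT64_C(0x7f7f7f7f7f7f7f7f);
    uint64_t    half1;
    uint64_t    half2;
    uint64_t    x;

    /* memcpy avoids unaligned loads; compilers turn it into plain loads */
    memcpy(&half1, s, sizeof(half1));
    memcpy(&half2, s + sizeof(half1), sizeof(half2));

    /* Idea 1: check for high bits first, since multibyte input is valid */
    if ((half1 | half2) & high_mask)
        return 0;

    /*
     * Idea 2: with no high bits set, byte + 0x7f has its high bit set iff
     * the byte is nonzero, and no carry crosses a byte boundary.  AND the
     * halves together so a zero byte in either one is detected.
     */
    x = (half1 + add_mask) & (half2 + add_mask);

    return (x & high_mask) == high_mask;
}
```

Combining the two sums with OR instead of AND would miss a zero byte that appears in only one half, since the other half would still supply the high bit at that position.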