Re: speed up unicode normalization quick check

2020-10-20 Thread Michael Paquier
On Tue, Oct 20, 2020 at 05:33:43AM -0400, John Naylor wrote: > This is cleaner, so I'm good with this. Thanks. Applied this way, then. -- Michael

Re: speed up unicode normalization quick check

2020-10-20 Thread John Naylor
On Mon, Oct 19, 2020 at 9:49 PM Michael Paquier wrote: > > The aligned numbers have the advantage to make the checks of the code > generated easier, for the contents and the format produced. So using > a right padding as you are suggesting here rather than a new exception > in .gitattributes sou

Re: speed up unicode normalization quick check

2020-10-19 Thread Michael Paquier
On Mon, Oct 19, 2020 at 12:12:00PM -0400, John Naylor wrote: > I see, I should have looked for that when Michael mentioned it. We could > left-justify instead, as in the attached. If it were up to me, though, I'd > just format it like pgindent expects, even if not nice looking. It's just a > bunch

Re: speed up unicode normalization quick check

2020-10-19 Thread John Naylor
On Mon, Oct 19, 2020 at 10:38 AM Tom Lane wrote: > John Naylor writes: > > On Mon, Oct 19, 2020 at 2:16 AM Peter Eisentraut < > > peter.eisentr...@2ndquadrant.com> wrote: > >> Could you adjust the generation script so that the resulting header file > >> passes the git whitespace check? Check th

Re: speed up unicode normalization quick check

2020-10-19 Thread Tom Lane
John Naylor writes: > On Mon, Oct 19, 2020 at 2:16 AM Peter Eisentraut < > peter.eisentr...@2ndquadrant.com> wrote: >> Could you adjust the generation script so that the resulting header file >> passes the git whitespace check? Check the output of >> git show --check 80f8eb79e24d9b7963eaf17ce8466

Re: speed up unicode normalization quick check

2020-10-19 Thread John Naylor
On Mon, Oct 19, 2020 at 2:16 AM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2020-10-12 13:36, Michael Paquier wrote: > > On Mon, Oct 12, 2020 at 03:39:51PM +0900, Masahiko Sawada wrote: > >> Yes, this patch resolves the problem. > > > > Okay, applied then. > > Could you adjust

Re: speed up unicode normalization quick check

2020-10-18 Thread Michael Paquier
On Mon, Oct 19, 2020 at 08:15:56AM +0200, Peter Eisentraut wrote: > On 2020-10-12 13:36, Michael Paquier wrote: > > On Mon, Oct 12, 2020 at 03:39:51PM +0900, Masahiko Sawada wrote: > > > Yes, this patch resolves the problem. > > > > Okay, applied then. > > Could you adjust the generation script s

Re: speed up unicode normalization quick check

2020-10-18 Thread Peter Eisentraut
On 2020-10-12 13:36, Michael Paquier wrote: > On Mon, Oct 12, 2020 at 03:39:51PM +0900, Masahiko Sawada wrote: >> Yes, this patch resolves the problem. > > Okay, applied then. Could you adjust the generation script so that the resulting header file passes the git whitespace check? Check the output

Re: speed up unicode normalization quick check

2020-10-12 Thread Michael Paquier
On Mon, Oct 12, 2020 at 03:39:51PM +0900, Masahiko Sawada wrote: > Yes, this patch resolves the problem. Okay, applied then. -- Michael

Re: speed up unicode normalization quick check

2020-10-12 Thread Michael Paquier
On Mon, Oct 12, 2020 at 05:46:16AM -0400, John Naylor wrote: > Hmm, I hadn't actually, but now that you mention it, that looks worth > optimizing that as well, since there are multiple callers that search that > table -- thanks for the reminder. The attached patch was easy to whip up, > being simil

Re: speed up unicode normalization quick check

2020-10-11 Thread Masahiko Sawada
On Mon, 12 Oct 2020 at 15:27, Michael Paquier wrote: > > On Mon, Oct 12, 2020 at 02:43:06PM +0900, Masahiko Sawada wrote: > > The following warning recently started to be shown in my > > environment (FreeBSD clang 8.0.1). Maybe it is relevant with this > > commit: > > > > unicode_norm.c:478:12: war

Re: speed up unicode normalization quick check

2020-10-11 Thread Michael Paquier
On Mon, Oct 12, 2020 at 02:43:06PM +0900, Masahiko Sawada wrote: > The following warning recently started to be shown in my > environment (FreeBSD clang 8.0.1). Maybe it is relevant with this > commit: > > unicode_norm.c:478:12: warning: implicit declaration of function > 'htonl' is invalid in C99

Re: speed up unicode normalization quick check

2020-10-11 Thread Masahiko Sawada
On Sun, 11 Oct 2020 at 19:27, Michael Paquier wrote: > > On Thu, Oct 08, 2020 at 06:22:39PM -0400, John Naylor wrote: > > Okay, thanks. > > And applied. The following warning recently started to be shown in my environment (FreeBSD clang 8.0.1). Maybe it is relevant with this commit: unicode_norm.

Re: speed up unicode normalization quick check

2020-10-11 Thread Michael Paquier
On Thu, Oct 08, 2020 at 06:22:39PM -0400, John Naylor wrote: > Okay, thanks. And applied. I did some more micro benchmarking with the quick checks, and the numbers are cool, close to what you mentioned for the quick checks of both NFC and NFKC. Just wondering about something in the same area, di

Re: speed up unicode normalization quick check

2020-10-08 Thread John Naylor
On Thu, Oct 8, 2020 at 8:29 AM Michael Paquier wrote: > On Thu, Oct 08, 2020 at 04:52:18AM -0400, John Naylor wrote: > > Looks fine overall, but one minor nit: I'm curious why you made a > separate > > section in the pgindent exclusions. The style in that file seems to be > one > > comment per ca

Re: speed up unicode normalization quick check

2020-10-08 Thread Michael Paquier
On Thu, Oct 08, 2020 at 04:52:18AM -0400, John Naylor wrote: > Looks fine overall, but one minor nit: I'm curious why you made a separate > section in the pgindent exclusions. The style in that file seems to be one > comment per category. Both parts indeed use PerfectHash.pm, but are generated by

Re: speed up unicode normalization quick check

2020-10-08 Thread John Naylor
On Thu, Oct 8, 2020 at 2:48 AM Michael Paquier wrote: > On Wed, Oct 07, 2020 at 03:18:44PM +0900, Michael Paquier wrote: > I looked at this one again today, and applied it. I looked at what > MSVC compiler was able to do in terms of optimizations with > shift-and-add for multipliers, and it is by

Re: speed up unicode normalization quick check

2020-10-07 Thread Michael Paquier
On Wed, Oct 07, 2020 at 03:18:44PM +0900, Michael Paquier wrote: > About 0001, the new set of multipliers looks fine to me. Even if this > adds an extra item from 901 to 902 because this can be divided by 17 > in kwlist_d.h, I also don't think that this is really much bothering > and. As mentione

Re: speed up unicode normalization quick check

2020-10-06 Thread Michael Paquier
On Sat, Sep 19, 2020 at 04:09:27PM -0700, Mark Dilger wrote: > I am marking this ready for committer. I didn't object to the > whitespace weirdness in your patch (about which `git apply` > grumbles) since you seem to have done that intentionally. I have no > further comments on the performance is

Re: speed up unicode normalization quick check

2020-09-19 Thread Mark Dilger
> On Sep 19, 2020, at 3:58 PM, John Naylor wrote: > > On Sat, Sep 19, 2020 at 1:46 PM Mark Dilger > wrote: > >> 0002 and 0003 look good to me. I like the way you cleaned up a bit with the >> unicode_norm_props struct, which makes the code a bit more tidy, and on my >> compiler under -O2 i

Re: speed up unicode normalization quick check

2020-09-19 Thread John Naylor
On Sat, Sep 19, 2020 at 1:46 PM Mark Dilger wrote: > 0002 and 0003 look good to me. I like the way you cleaned up a bit with the > unicode_norm_props struct, which makes the code a bit more tidy, and on my > compiler under -O2 it does not generate any extra runtime dereferences, as > the comp

Re: speed up unicode normalization quick check

2020-09-19 Thread Mark Dilger
> On Sep 18, 2020, at 9:41 AM, John Naylor wrote: > > Attached is version 4, which excludes the output file from pgindent, > to match recent commit 74d4608f5. Since it won't be indented again, I > also tweaked the generator script to match pgindent for the typedef, > since we don't want to los

Re: speed up unicode normalization quick check

2020-05-29 Thread John Naylor
On Sat, May 30, 2020 at 12:13 AM Mark Dilger wrote: > > > I forgot in my first round of code review to mention, "thanks for the patch". > I generally like what you are doing here, and am trying to review it so it > gets committed. And I forgot to say thanks for taking a look! > The reason I g

Re: speed up unicode normalization quick check

2020-05-29 Thread Mark Dilger
> On May 28, 2020, at 8:54 PM, John Naylor wrote: > > On Fri, May 29, 2020 at 5:59 AM Mark Dilger > wrote: >> >>> On May 21, 2020, at 12:12 AM, John Naylor >>> wrote: > >>> very picky in general. As a test, it also successfully finds a >>> function for the OS "words" file, the "D" sets of

Re: speed up unicode normalization quick check

2020-05-28 Thread John Naylor
On Fri, May 29, 2020 at 5:59 AM Mark Dilger wrote: > > > On May 21, 2020, at 12:12 AM, John Naylor > > wrote: > > very picky in general. As a test, it also successfully finds a > > function for the OS "words" file, the "D" sets of codepoints, and for > > sets of the first n built-in OIDs, where

Re: speed up unicode normalization quick check

2020-05-28 Thread Mark Dilger
> On May 21, 2020, at 12:12 AM, John Naylor wrote: > > Hi, > > Attached is a patch to use perfect hashing to speed up Unicode > normalization quick check. > > 0001 changes the set of multipliers attempted when generating the hash > function. The set in HEAD works

speed up unicode normalization quick check

2020-05-21 Thread John Naylor
Hi, Attached is a patch to use perfect hashing to speed up Unicode normalization quick check. 0001 changes the set of multipliers attempted when generating the hash function. The set in HEAD works for the current set of NFC codepoints, but not for the other types. Also, the updated multipliers