Re: speed up unicode normalization quick check

John Naylor Thu, 28 May 2020 20:55:25 -0700

On Fri, May 29, 2020 at 5:59 AM Mark Dilger
<[email protected]> wrote:
>
> > On May 21, 2020, at 12:12 AM, John Naylor <[email protected]> wrote:


> > very picky in general. As a test, it also successfully finds a
> > function for the OS "words" file, the "D" sets of codepoints, and for
> > sets of the first n built-in OIDs, where n > 5.
>
> Prior to this patch, src/tools/gen_keywordlist.pl is the only script that 
> uses PerfectHash.  Your patch adds a second.  I'm not convinced that 
> modifying the PerfectHash code directly each time a new caller needs 
> different multipliers is the right way to go.

Calling it "each time" with a sample size of two is a bit of a
stretch. The first implementation made a reasonable attempt to suit
future uses, and I simply made it a bit more robust. In the text quoted
above you can see I tested some scenarios beyond the current use
cases, with key set sizes as low as 6 and as high as 250k.

> Could you instead make them arguments such that gen_keywordlist.pl, 
> generate-unicode_combining_table.pl, and future callers can pass in the 
> numbers they want?  Or is there some advantage to having it this way?

That is an implementation detail that callers have no business knowing about.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
