John Naylor <jcnay...@gmail.com> writes: > On 1/3/19, Joerg Sonnenberger <jo...@bec.de> wrote: >> I was pointed at your patch on IRC and decided to look into adding my >> own pieces. What I can provide you is a fast perfect hash function >> generator. I've attached a sample hash function based on the current >> main keyword list. hash() essentially gives you the number of the only >> possible match, a final strcmp/memcmp is still necessary to verify that >> it is an actual keyword though. The |0x20 can be dropped if all cases >> have pre-lower-cased the input already. This would replace the binary >> search in the lookup functions. Returning offsets directly would be easy >> as well. That allows writing a single string where each entry is prefixed >> with a type mask, the token id, the length of the keyword and the actual >> keyword text. Does that sound useful to you?
> Judging by previous responses, there is still interest in using > perfect hash functions, so thanks for this. I'm not knowledgeable > enough to judge its implementation, so I'll leave that for others. We haven't actually seen the implementation, so it's hard to judge ;-). The sample hash function certainly looks great. I'm not terribly on board with using |0x20 as a substitute for lower-casing, but that's a minor detail. The foremost questions in my mind are: * What's the generator written in? (if the answer's not "Perl", wedging it into our build processes might be painful) * What license is it under? * Does it always suceed in producing a single-level lookup table? regards, tom lane