On Wed, Mar 20, 2019 at 9:24 PM Tom Lane wrote:
> Joel Jacobson writes:
> > I've seen a performance trick in other hash functions [1]
> > to instead read multiple bytes in each iteration,
> > and then handle the remaining bytes after the loop.
> [1] https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h#L29
Joel Jacobson writes:
> I've seen a performance trick in other hash functions [1]
> to instead read multiple bytes in each iteration,
> and then handle the remaining bytes after the loop.
> [1] https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h#L29
I can't get very excited about this, se
Many thanks for working on this, amazing work, really nice you made it a
separate reusable Perl-module.
The generated hash functions reads one character at a time.
I've seen a performance trick in other hash functions [1]
to instead read multiple bytes in each iteration,
and then handle the remaining bytes after the loop.
[1] https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h#L29
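The trick being described, in rough form: consume the key several bytes at a time and fold the leftover tail in afterwards. A minimal sketch in C, assuming nothing about wyhash or the Perl generator (the multiplier here is just the 64-bit FNV prime, picked only for illustration):

#include <stdint.h>
#include <string.h>

static uint64_t
hash_wide(const char *key, size_t len, uint64_t seed)
{
    uint64_t    h = seed;

    /* consume the key eight bytes per iteration */
    while (len >= 8)
    {
        uint64_t    chunk;

        memcpy(&chunk, key, 8);     /* memcpy keeps the load alignment-safe */
        h = (h ^ chunk) * UINT64_C(0x100000001b3);
        key += 8;
        len -= 8;
    }

    /* handle the remaining 0..7 bytes after the loop */
    while (len-- > 0)
        h = (h ^ (unsigned char) *key++) * UINT64_C(0x100000001b3);

    return h;
}

The memcpy is the portable way to express an unaligned 8-byte load; compilers turn it into a single move on platforms that allow it.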
John Naylor writes:
> On Wed, Jan 9, 2019 at 5:33 PM Tom Lane wrote:
>> really need here. We could go with "[no-]case-insensitive", perhaps.
>> Or "[no-]case-fold", which is at least a little shorter and less
>> double-negative-y.
> I'd be in favor of --[no-]case-fold.
Yeah, I like that better
On Wed, Jan 9, 2019 at 5:33 PM Tom Lane wrote:
> really need here. We could go with "[no-]case-insensitive", perhaps.
> Or "[no-]case-fold", which is at least a little shorter and less
> double-negative-y.
I'd be in favor of --[no-]case-fold.
On Tue, Jan 8, 2019 at 5:53 PM Tom Lane wrote:
> I
John Naylor writes:
> On Tue, Jan 8, 2019 at 5:31 PM Tom Lane wrote:
>> The length macro was readily available there so I used it. AFAIR
>> that wasn't true elsewhere, though I might've missed something.
>> It's pretty much just belt-and-suspenders coding anyway, since all
>> those arrays are ma
John Naylor writes:
> On Wed, Jan 9, 2019 at 2:04 PM Tom Lane wrote:
>> Now the impedance mismatch about case sensitivity is handled with
>> my $f = PerfectHash::generate_hash_function(\@keywords, $funcname,
>> case_insensitive => !$case_sensitive);
>> which is at least a little clea
John Naylor writes:
> On Wed, Jan 9, 2019 at 2:44 PM Tom Lane wrote:
>> [patch to shrink oid index]
> It would help maintain its newfound svelteness if we warned when a
> higher OID was assigned, as in the attached. I used 6200 as a soft
> limit, but that could be anything similar.
I think the
On Wed, Jan 9, 2019 at 2:44 PM Tom Lane wrote:
> [patch to shrink oid index]
It would help maintain its newfound svelteness if we warned when a
higher OID was assigned, as in the attached. I used 6200 as a soft
limit, but that could be anything similar.
--
John Naylor  https://ww
On Tue, Jan 8, 2019 at 5:31 PM Tom Lane wrote:
>
> John Naylor writes:
> > In the committed keyword patch, I noticed that in common/keywords.c,
> > the array length is defined with
> > ScanKeywordCategories[SCANKEYWORDS_NUM_KEYWORDS]
> > but other keyword arrays just have ...[]. Is there a reason
On Wed, Jan 9, 2019 at 2:04 PM Tom Lane wrote:
>
> I wrote:
> > John Naylor writes:
> >> -There is a bit of a cognitive clash between $case_sensitive in
> >> gen_keywordlist.pl and $case_insensitive in PerfectHash.pm. They each
> >> make sense in their own file, but might it be worth using one or
On Wed, Jan 09, 2019 at 02:04:15PM -0500, Tom Lane wrote:
> Also, in view of finding that the original multiplier choices failed
> on the fmgr oid problem, I spent a little effort making the code
> able to try more combinations of hash multipliers and seeds. It'd
> be nice to have some theory rath
Andres Freund writes:
> On 2019-01-09 15:03:35 -0500, Tom Lane wrote:
>> (Speaking of which, I've been wondering for awhile if libpq ought not
>> obtain the OIDs of lo_create and friends by #including fmgroids.h
>> instead of doing a runtime query on every connection. If we did that,
>> we'd be f
Hi,
On 2019-01-09 15:03:35 -0500, Tom Lane wrote:
> Alvaro Herrera writes:
> > On 2019-Jan-09, Tom Lane wrote:
> >> We could make the index table still smaller if we wanted to reassign
> >> a couple dozen high-numbered functions down to lower OIDs, but I dunno
> >> if it's worth the trouble. It
Alvaro Herrera writes:
> On 2019-Jan-09, Tom Lane wrote:
>> We could make the index table still smaller if we wanted to reassign
>> a couple dozen high-numbered functions down to lower OIDs, but I dunno
>> if it's worth the trouble. It certainly isn't from a performance
>> standpoint, because tho
Andres Freund writes:
> On 2019-01-09 14:44:24 -0500, Tom Lane wrote:
>> /* fast lookup only possible if original oid still assigned */
>> -if (id >= FirstGenbkiObjectId)
>> +if (id > fmgr_last_builtin_oid)
>> return NULL;
> An extern reference here will make the code a bit l
Hi,
On 2019-01-09 14:44:24 -0500, Tom Lane wrote:
> I wrote:
> > Also, I fail to understand why fmgr_builtin_oid_index has 10000 entries
> > anyway. We could easily have fmgrtab.c expose the last actually assigned
> > builtin function OID (presently 6121) and make the index array only
> > that big, which just about eliminates the space advantage completely
On 2019-Jan-09, Tom Lane wrote:
> We could make the index table still smaller if we wanted to reassign
> a couple dozen high-numbered functions down to lower OIDs, but I dunno
> if it's worth the trouble. It certainly isn't from a performance
> standpoint, because those unused entry ranges will n
I wrote:
> Also, I fail to understand why fmgr_builtin_oid_index has 10000 entries
> anyway. We could easily have fmgrtab.c expose the last actually assigned
> builtin function OID (presently 6121) and make the index array only
> that big, which just about eliminates the space advantage completely
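The change being discussed, as a hedged sketch: size the index array by the last assigned built-in OID rather than FirstGenbkiObjectId, and bail out early for anything above it. Identifiers follow the thread, but InvalidOidBuiltinMapping and the struct layout are assumptions here, not the exact committed code; the extern tables stand for whatever the fmgrtab generator emits.

#include <stddef.h>

typedef unsigned int Oid;       /* PostgreSQL typedefs repeated so the sketch stands alone */
typedef unsigned short uint16;

typedef struct
{
    Oid         foid;           /* function OID */
    const char *funcName;
    /* ... other fields elided ... */
} FmgrBuiltin;

extern const FmgrBuiltin fmgr_builtins[];
extern const uint16 fmgr_builtin_oid_index[];   /* one slot per OID up to the last built-in */
extern const Oid fmgr_last_builtin_oid;         /* exposed by fmgrtab.c, per the proposal */

#define InvalidOidBuiltinMapping 0xffff         /* assumed "no built-in at this OID" marker */

static const FmgrBuiltin *
fmgr_isbuiltin(Oid id)
{
    uint16      index;

    /* fast lookup only possible for OIDs covered by the dense index */
    if (id > fmgr_last_builtin_oid)
        return NULL;

    index = fmgr_builtin_oid_index[id];
    if (index == InvalidOidBuiltinMapping)
        return NULL;            /* gap in the OID range: not a built-in */

    return &fmgr_builtins[index];
}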
I wrote:
> John Naylor writes:
>> -There is a bit of a cognitive clash between $case_sensitive in
>> gen_keywordlist.pl and $case_insensitive in PerfectHash.pm. They each
>> make sense in their own file, but might it be worth using one or the
>> other?
> Yeah, dunno. It seems to make sense for t
On Tue, Jan 08, 2019 at 05:53:25PM -0500, Tom Lane wrote:
> John Naylor writes:
> > -As for the graph algorithm, I'd have to play with it to understand
> > how it works.
>
> I improved the comment about how come the hash table entry assignment
> works. One thing I'm not clear about myself is
>
John Naylor writes:
> -As for the graph algorithm, I'd have to play with it to understand
> how it works.
I improved the comment about how come the hash table entry assignment
works. One thing I'm not clear about myself is
# A cycle-free graph is either empty or has some vertex of degree 1.
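For orientation, the shape the generator is aiming for: each keyword's two hash values are vertices, the keyword itself is an edge between them, and if that graph has no cycle the peeling order lets the table be filled so that the sum of the two entries recovers the keyword's index. A hedged sketch of the resulting lookup (sizes, seeds, and multipliers are placeholders, not what PerfectHash.pm emits):

#include <stdint.h>
#include <stddef.h>

#define NUM_KEYWORDS 440                    /* illustrative */
#define NUM_VERTICES (NUM_KEYWORDS * 2 + 1) /* illustrative table size */

/* placeholder contents; the generator's peeling pass computes the real table */
static const uint16_t g[NUM_VERTICES] = {0};

static int
keyword_index(const char *key, size_t len)
{
    uint32_t    a = 0;          /* two independent multiplicative hashes */
    uint32_t    b = 17;         /* with different seeds and multipliers */

    while (len-- > 0)
    {
        unsigned char c = (unsigned char) *key++;

        a = a * 257 + c;
        b = b * 31 + c;
    }

    /* the keyword is the edge (a mod V, b mod V); the assignment step
     * filled g[] so this sum is exactly the keyword's index */
    return (int) ((g[a % NUM_VERTICES] + g[b % NUM_VERTICES]) % NUM_KEYWORDS);
}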
John Naylor writes:
> Just a couple comments about the module:
> -If you qualify the function's module name as you did
> (PerfectHash::generate_hash_function), you don't have to export the
> function into the callers namespace, so you can skip the @EXPORT_OK
> setting. Most of our modules don't e
On Tue, Jan 8, 2019 at 3:04 PM Tom Lane wrote:
> > I'll take a crack at separating into a module. I'll wait a bit in
> > case there are any stylistic suggestions on the patch as it stands.
>
> I had a go at that myself. I'm sure there's plenty to criticize in
> the result, but at least it passes
Andres Freund writes:
> On 2019-01-08 13:41:16 -0500, John Naylor wrote:
>> Do you mean the fmgr table?
> Not the entire fmgr table, but just the builtin oid index, generated by
> the following section:
> ...
> The generated fmgr_builtin_oid_index is pretty sparse, and a more dense
> hashtable mi
John Naylor writes:
> On Tue, Jan 8, 2019 at 12:06 PM Andrew Dunstan
> wrote:
>> If he doesn't I will.
> I'll take a crack at separating into a module. I'll wait a bit in
> case there are any stylistic suggestions on the patch as it stands.
I had a go at that myself. I'm sure there's plenty t
Hi,
On 2019-01-08 13:41:16 -0500, John Naylor wrote:
> On Tue, Jan 8, 2019 at 12:06 PM Andrew Dunstan
> wrote:
> > On 1/7/19 7:52 PM, Andres Freund wrote:
> > > Builtin functions for one, which we'd swatted down last time round due
> > to gperf's deficiencies.
>
> Do you mean the fmgr table?
On Tue, Jan 8, 2019 at 12:06 PM Andrew Dunstan
wrote:
> On 1/7/19 7:52 PM, Andres Freund wrote:
> > Builtin functions for one, which we'd swatted down last time round due
> > to gperf's deficiencies.
Do you mean the fmgr table?
> >> But in any case, that sounds like a task for someone with
> >>
On 1/7/19 7:52 PM, Andres Freund wrote:
> Hi,
>
> On 2019-01-07 19:37:51 -0500, Tom Lane wrote:
>> Andres Freund writes:
>>> Hm, shouldn't we extract the perfect hash generation into a perl module
>>> or such? It seems that there's plenty other possible uses for it.
>> Such as?
> Builtin functions for one, which we'd swatted down last time round due
> to gperf's deficiencies.
Hi,
On 2019-01-07 19:37:51 -0500, Tom Lane wrote:
> Andres Freund writes:
> > Hm, shouldn't we extract the perfect hash generation into a perl module
> > or such? It seems that there's plenty other possible uses for it.
>
> Such as?
Builtin functions for one, which we'd swatted down last time round due
to gperf's deficiencies.
Andres Freund writes:
> Hm, shouldn't we extract the perfect hash generation into a perl module
> or such? It seems that there's plenty other possible uses for it.
Such as? But in any case, that sounds like a task for someone with
more sense of Perl style than I have.
re
I wrote:
> Probably there's a lot to be criticized about the Perl style below;
> anybody feel a need to rewrite it?
Here's a somewhat better version. I realized that I was being too
slavishly tied to the data structures used in the C version; in Perl
it's easier to manage the lists of edges as ha
Hi,
On 2019-01-07 16:11:04 -0500, Tom Lane wrote:
> I wrote:
> > I took a quick look through the NetBSD nbperf sources at
> > http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.bin/nbperf/
> > and I concur with your judgment that we could manage translating
> > that into Perl, especially if we only imple
I wrote:
> I took a quick look through the NetBSD nbperf sources at
> http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.bin/nbperf/
> and I concur with your judgment that we could manage translating
> that into Perl, especially if we only implement the parts we need.
Here's an implementation of that, us
On 1/6/19, Tom Lane wrote:
> I've pushed that version (v8 + max_kw_len); if the buildfarm doesn't
> fall over, we can move on with looking at hashing.
Thank you. The API adjustment looks good, and I'm glad that splitting
out the aux info led to even further cleanups.
-John Naylor
Joerg Sonnenberger writes:
> On Sun, Jan 06, 2019 at 02:29:05PM -0500, Tom Lane wrote:
>> * We should extend the ScanKeywordList representation to include a
>> field holding the longest keyword length in the table, which
>> gen_keywordlist.pl would have no trouble providing. Then we could
>> skip
Joerg Sonnenberger writes:
> On Sun, Jan 06, 2019 at 02:29:05PM -0500, Tom Lane wrote:
>> So we probably can't have inlined hashing code --- I imagine the
>> hash generator needs the flexibility to pick different values of
>> those multipliers.
> Right now, only the initial values are randomized.
On Sun, Jan 06, 2019 at 02:29:05PM -0500, Tom Lane wrote:
> * It's too bad that the hash function doesn't have a return convention
> that allows distinguishing "couldn't possibly match any keyword" from
> "might match keyword 0". I imagine a lot of the zero entries in its
> hashtable could be inte
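The usual way around that ambiguity is to let the hash return any slot and settle the question with a single string comparison afterwards, since a perfect hash can only promise "if this word is a keyword at all, it is this one". A self-contained sketch (the three-keyword list is illustrative, and candidate_index() is a stand-in, not a real perfect hash):

#include <string.h>

#define NUM_KEYWORDS 3
static const char *const keywords[NUM_KEYWORDS] = {"abort", "begin", "commit"};

/* stand-in for the generated hash: always yields something in
 * 0..NUM_KEYWORDS-1, even for words that are not keywords at all */
static int
candidate_index(const char *word, size_t len)
{
    unsigned int h = 0;

    while (len-- > 0)
        h = h * 31 + (unsigned char) *word++;
    return (int) (h % NUM_KEYWORDS);
}

static int
keyword_lookup(const char *word, size_t len)
{
    int         idx = candidate_index(word, len);

    /* the hash only says "might match keyword idx"; one comparison
     * turns that into a definite yes or no */
    if (strlen(keywords[idx]) == len &&
        memcmp(word, keywords[idx], len) == 0)
        return idx;

    return -1;                  /* definitely not a keyword */
}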
Joerg Sonnenberger writes:
> On Mon, Jan 07, 2019 at 03:11:55AM +1300, David Rowley wrote:
>> What I'm most interested in is how long it took to generate the hash
>> function in hash2.c?
> It's within the noise floor of time(1) on my laptop, e.g. ~1ms.
I decided to do some simple performance mea
On Mon, Jan 07, 2019 at 03:11:55AM +1300, David Rowley wrote:
> What I'm most interested in is how long it took to generate the hash
> function in hash2.c?
It's within the noise floor of time(1) on my laptop, e.g. ~1ms.
Joerg
On Sat, 5 Jan 2019 at 09:20, John Naylor wrote:
>
> On 1/3/19, Joerg Sonnenberger wrote:
> > Hello John,
> > I was pointed at your patch on IRC and decided to look into adding my
> > own pieces. What I can provide you is a fast perfect hash function
> > generator. I've attached a sample hash fun
I wrote:
> I spent some time hacking on this today, and I think it's committable
> now, but I'm putting it back up in case anyone wants to have another
> look (and also so the cfbot can check it on Windows).
... and indeed, the cfbot doesn't like it. Here's v8, with the
missing addition to Mkvcbu
John Naylor writes:
> [ v6-0001-Use-offset-based-keyword-lookup.patch ]
I spent some time hacking on this today, and I think it's committable
now, but I'm putting it back up in case anyone wants to have another
look (and also so the cfbot can check it on Windows).
Given the discussion about poss
On Fri, Jan 04, 2019 at 02:36:15PM -0800, Andres Freund wrote:
> Hi,
>
> On 2019-01-04 16:43:39 -0500, Tom Lane wrote:
> > Joerg Sonnenberger writes:
> > >> * What's the generator written in? (if the answer's not "Perl", wedging
> > >> it into our build processes might be painful)
> >
> > > Plai
Hi,
On 2019-01-04 16:43:39 -0500, Tom Lane wrote:
> Joerg Sonnenberger writes:
> >> * What's the generator written in? (if the answer's not "Perl", wedging
> >> it into our build processes might be painful)
>
> > Plain C, nothing really fancy in it.
>
> That's actually a bigger problem than you
Joerg Sonnenberger writes:
> On Fri, Jan 04, 2019 at 03:31:11PM -0500, Tom Lane wrote:
>> The sample hash function certainly looks great. I'm not terribly on board
>> with using |0x20 as a substitute for lower-casing, but that's a minor
>> detail.
> Yeah, I've included that part more because I d
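For reference, the trick works because ASCII upper- and lower-case letters differ only in bit 0x20, so OR-ing that bit in maps 'A'..'Z' onto 'a'..'z' and leaves 'a'..'z' alone. Applied unconditionally it also perturbs some non-letter bytes ('@' becomes '`', '_' becomes DEL), which is harmless for hashing as long as the generator folds the stored keywords the same way and the final match is verified case-insensitively. A small sketch:

/* cheap ASCII-only fold used inside a hash loop; non-letters may change,
 * so callers must verify the match afterwards */
static unsigned char
ascii_fold(unsigned char ch)
{
    return (unsigned char) (ch | 0x20);
}

/* range-checked variant: one extra comparison, but it never alters
 * anything except 'A'..'Z' */
static unsigned char
ascii_tolower(unsigned char ch)
{
    return (ch >= 'A' && ch <= 'Z') ? (unsigned char) (ch | 0x20) : ch;
}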
On Fri, Jan 04, 2019 at 03:31:11PM -0500, Tom Lane wrote:
> John Naylor writes:
> > On 1/3/19, Joerg Sonnenberger wrote:
> >> I was pointed at your patch on IRC and decided to look into adding my
> >> own pieces. What I can provide you is a fast perfect hash function
> >> generator. I've attache
On 12/27/18, Tom Lane wrote:
> If you really are hot about saving that other 440 bytes, the way to
> do it would be to drop the struct entirely and use two parallel
> arrays, an int16[] for value and a char[] (or better uint8[]) for
> category. Those would be filled by reading kwlist.h twice with
Andres Freund writes:
> On 2019-01-04 12:26:18 -0500, Tom Lane wrote:
>> I'm wondering where we want to go with this. Is anyone excited
>> about pursuing the perfect-hash-function idea? (Joerg's example
>> function looked pretty neat to me.) If we are going to do that,
>> does it make sense to
John Naylor writes:
> On 1/3/19, Joerg Sonnenberger wrote:
>> I was pointed at your patch on IRC and decided to look into adding my
>> own pieces. What I can provide you is a fast perfect hash function
>> generator. I've attached a sample hash function based on the current
>> main keyword list.
On 2019-01-04 12:26:18 -0500, Tom Lane wrote:
> I wrote:
> > Andres Freund writes:
> >> On 2018-12-29 16:59:52 -0500, John Naylor wrote:
> >>> I think 0001 with complete keyword lookup replacement is in decent
> >>> enough shape to post. Make check-world passes. A few notes and
> >>> caveats:
>
>
On 1/3/19, Joerg Sonnenberger wrote:
> Hello John,
> I was pointed at your patch on IRC and decided to look into adding my
> own pieces. What I can provide you is a fast perfect hash function
> generator. I've attached a sample hash function based on the current
> main keyword list. hash() essent
On 12/30/18, Andres Freund wrote:
> I tried to take this for a spin, and for me the build fails because various
> frontend programs don't have KeywordOffsets/Strings defined, but reference it
> through various functions exposed to the frontend (like fmtId()). That I see
> that error but you do
I wrote:
> Andres Freund writes:
>> On 2018-12-29 16:59:52 -0500, John Naylor wrote:
>>> I think 0001 with complete keyword lookup replacement is in decent
>>> enough shape to post. Make check-world passes. A few notes and
>>> caveats:
>> I tried to take this for a spin, and for me the build fails
On Sun, Dec 16, 2018 at 11:50:15AM -0500, John Naylor wrote:
> A few months ago I was looking into faster search algorithms for
> ScanKeywordLookup(), so this is interesting to me. While an optimal
> full replacement would be a lot of work, the above ideas are much less
> invasive and would still have some benefit.
Andres Freund writes:
> On 2018-12-29 16:59:52 -0500, John Naylor wrote:
>> I think 0001 with complete keyword lookup replacement is in decent
>> enough shape to post. Make check-world passes. A few notes and
>> caveats:
> I tried to take this for a spin, and for me the build fails because various
Hi,
On 2018-12-29 16:59:52 -0500, John Naylor wrote:
> I think 0001 with complete keyword lookup replacement is in decent
> enough shape to post. Make check-world passes. A few notes and
> caveats:
I tried to take this for a spin, and for me the build fails because various
frontend programs don't
I think 0001 with complete keyword lookup replacement is in decent
enough shape to post. Make check-world passes. A few notes and
caveats:
-I added an --extern option to the script for the core keyword
headers. This also capitalizes variables.
-ECPG keyword lookup is a bit different in that the ec
On 2018-12-27 14:22:11 -0500, Tom Lane wrote:
> Andrew Dunstan writes:
> > A smaller project might be to see if we can replace the binary keyword
> > search in ScanKeyword with a perfect hashing function generated by
> > gperf, or something similar. I had a quick look at that, too.
>
> Yeah, we'v
On Thu, Dec 27, 2018 at 07:04:41PM -0300, Alvaro Herrera wrote:
> On 2018-Dec-27, Tom Lane wrote:
>
> > I poked around a little on my own machines, and I can confirm that
> > Getopt::Long is present in a default Perl install-from-source at
> > least as far back as perl 5.6.1. It's barely conceiva
On 2018-Dec-27, Tom Lane wrote:
> I poked around a little on my own machines, and I can confirm that
> Getopt::Long is present in a default Perl install-from-source at
> least as far back as perl 5.6.1. It's barely conceivable that some
> packager might omit it from their minimal package, but Red
Andrew Dunstan writes:
> On 12/27/18 3:34 PM, Tom Lane wrote:
>> ... that's a pretty thin argument, and if Getopt::Long is present even
>> in the most minimal Perl installations then it's certainly moot.
> It's bundled separately, but on both systems I looked at it's needed by
> the base perl pac
On 12/27/18 3:34 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 12/27/18 3:00 PM, John Naylor wrote:
>>> This style was cargo-culted from the catalog scripts. I can settle on
>>> just the first form if you like.
>> I would rather we used the standard perl module Getopt::Long, as
>> numerous
Andrew Dunstan writes:
> On 12/27/18 3:00 PM, John Naylor wrote:
>> This style was cargo-culted from the catalog scripts. I can settle on
>> just the first form if you like.
> I would rather we used the standard perl module Getopt::Long, as
> numerous programs we have already do.
Hmm ... greppin
On 12/27/18 3:00 PM, John Naylor wrote:
> On 12/27/18, Tom Lane wrote:
>> diff --git a/src/tools/gen_keywords.pl b/src/tools/gen_keywords.pl
>> +elsif ($arg =~ /^-o/)
>> +{
>> +$output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
>> +}
>>
>> My perl-fu is not
John Naylor writes:
> On 12/27/18, Tom Lane wrote:
>> +$kw_input_file =~ /((\w*)kwlist)\.h/;
>> +my $base_filename = $1;
>> +$prefix = $2 if !defined $prefix;
>>
>> Hmm, what happens if the input filename does not end with "kwlist.h"?
> If that's a maintainability hazard, I can force every invo
On 12/27/18, Tom Lane wrote:
> diff --git a/src/tools/gen_keywords.pl b/src/tools/gen_keywords.pl
> + elsif ($arg =~ /^-o/)
> + {
> + $output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
> + }
>
> My perl-fu is not great, but it looks like this will accept argum
Andrew Dunstan writes:
> RD parsers are not terribly hard to write.
Sure, as long as they are for grammars that are (a) small, (b) static,
and (c) LL(1), which is strictly weaker than the LALR(1) grammar class
that bison can handle. We already have a whole lot of constructs that
are at the edges
John Naylor writes:
> Barring additional bikeshedding on 0001, I'll plan on implementing
> offset-based lookup for the other keyword types and retire the old
> ScanKeyword. Once that's done, we can benchmark and compare with the
> optimizations in 0002.
Sounds like a plan.
Assorted minor bikeshe
On 12/27/18 12:12 PM, Tom Lane wrote:
>> I don't buy that we're unable to write a descent parser that way.
> I do not think that we could write one for the current state of the
> PG grammar without an investment of effort so large that it's not
> going to happen. Even if such a parser were to s
Andres Freund writes:
> On 2018-12-26 14:03:57 -0500, Tom Lane wrote:
>> It's impossible to write correct RD parsers by hand for any but the most
>> trivial, conflict-free languages, and what we have got to deal with
>> is certainly neither of those; moreover, it's a constantly moving target.
>> W
On 12/26/18, John Naylor wrote:
> On 12/26/18, Robert Haas wrote:
>> I wonder if we could do something really simple like a lookup based on
>> the first character of the scan keyword. It looks to me like there are
>> 440 keywords right now, and the most common starting letter is 'c',
>> which is
Hi,
On 2018-12-26 14:03:57 -0500, Tom Lane wrote:
> Andres Freund writes:
> > My bet is, and has been for quite a while, that we'll have to go for a
> > hand-written recursive descent type parser.
>
> I will state right up front that that will happen over my dead body.
>
> It's impossible to wr
On 12/26/18, Robert Haas wrote:
> I wonder if we could do something really simple like a lookup based on
> the first character of the scan keyword. It looks to me like there are
> 440 keywords right now, and the most common starting letter is 'c',
> which is the first letter of 51 keywords. So dis
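A hedged sketch of that kind of dispatch (names and the range table are illustrative): since the keyword array is already sorted, a 26-entry table of [start, end) ranges confines the binary search to words sharing the first letter, so the worst case drops from about log2(440) ≈ 9 probes to log2(51) ≈ 6 for the 'c' bucket.

#include <string.h>

typedef struct
{
    int         start;          /* first index in the sorted keyword array */
    int         end;            /* one past the last index */
} LetterRange;

/* 'word' is assumed to be lower-cased already, as the existing lookup requires */
static int
keyword_lookup_by_letter(const char *word,
                         const char *const *keywords,
                         const LetterRange ranges[26])
{
    unsigned char c = (unsigned char) word[0];
    int         lo,
                hi;

    if (c < 'a' || c > 'z')
        return -1;              /* every keyword starts with a letter */

    lo = ranges[c - 'a'].start;
    hi = ranges[c - 'a'].end - 1;
    while (lo <= hi)
    {
        int         mid = (lo + hi) / 2;
        int         cmp = strcmp(word, keywords[mid]);

        if (cmp == 0)
            return mid;
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;
}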
Andres Freund writes:
> On 2018-12-26 10:45:11 -0500, Robert Haas wrote:
>> I'm not sure that I understand quite what you have in mind for a
>> serialized non-perfect hashtable. Are you thinking that we'd just
>> construct a simplehash and serialize it?
> I was basically thinking that we'd have
Andres Freund writes:
> My bet is, and has been for quite a while, that we'll have to go for a
> hand-written recursive descent type parser.
I will state right up front that that will happen over my dead body.
It's impossible to write correct RD parsers by hand for any but the most
trivial, conflict-free languages, and what we have got to deal with
is certainly neither of those; moreover, it's a constantly moving target.
Hi,
On 2018-12-26 10:45:11 -0500, Robert Haas wrote:
> I'm not sure that I understand quite what you have in mind for a
> serialized non-perfect hashtable. Are you thinking that we'd just
> construct a simplehash and serialize it?
I was basically thinking that we'd have the perl script implement
Hi,
On 2018-12-26 11:50:18 -0500, Robert Haas wrote:
> On Wed, Dec 26, 2018 at 11:22 AM Tom Lane wrote:
> > I think there's a lot of goalpost-moving going on here. The original
> > idea was to trim the physical size of the data structure, as stated
> > in the thread subject, and just reap whatev
Robert Haas writes:
> I'm kinda surprised that you haven't seen ScanKeywordLookup() in
> there, but I agree with you that the size of the main parser tables is
> a real issue, and that there's no easy solution. At various times
> there has been discussion of using some other parser generator, and
David Fetter writes:
> On Wed, Dec 26, 2018 at 11:22:39AM -0500, Tom Lane wrote:
>> In my hands, the only part of the low-level parsing code that
>> commonly shows up as interesting in profiles is the Bison engine.
> Should we be considering others?
We've looked around before, IIRC, and not real
On Wed, Dec 26, 2018 at 11:22 AM Tom Lane wrote:
> I think there's a lot of goalpost-moving going on here. The original
> idea was to trim the physical size of the data structure, as stated
> in the thread subject, and just reap whatever cache benefits we got
> along the way from that. I am dubi
On Wed, Dec 26, 2018 at 11:22:39AM -0500, Tom Lane wrote:
>
> In my hands, the only part of the low-level parsing code that
> commonly shows up as interesting in profiles is the Bison engine.
Should we be considering others? As I understand it, steps have been
made in this field since yacc was or
Robert Haas writes:
> On Wed, Dec 19, 2018 at 8:01 PM Andres Freund wrote:
>> The last time I looked into perfect hash functions, it wasn't easy to
>> find a generator that competed with a decent normal hashtable (in
>> particular gperf's are very unconvincing). The added tooling is a
>> concern
On Wed, Dec 19, 2018 at 8:01 PM Andres Freund wrote:
> The last time I looked into perfect hash functions, it wasn't easy to
> find a generator that competed with a decent normal hashtable (in
> particular gperf's are very unconvincing). The added tooling is a
> concern imo. OTOH, we're comparing
On 12/22/18, Tom Lane wrote:
> John Naylor writes:
>> Using a single file also gave me another idea: Take value and category
>> out of ScanKeyword, and replace them with an index into another array
>> containing those, which will only be accessed in the event of a hit.
>> That would shrink ScanKe
Andres Freund writes:
> On 2018-12-22 12:20:00 -0500, Tom Lane wrote:
>> I like that idea a *lot*, actually, because it offers the opportunity
>> to decouple this mechanism from all assumptions about what the
>> auxiliary data for a keyword is.
> OTOH, it doubles or triples the number of cachelin
Hi,
On 2018-12-22 12:20:00 -0500, Tom Lane wrote:
> John Naylor writes:
> > Using a single file also gave me another idea: Take value and category
> > out of ScanKeyword, and replace them with an index into another array
> > containing those, which will only be accessed in the event of a hit.
> >
John Naylor writes:
> Using a single file also gave me another idea: Take value and category
> out of ScanKeyword, and replace them with an index into another array
> containing those, which will only be accessed in the event of a hit.
> That would shrink ScanKeyword to 4 bytes (offset, index), fu
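The layout being proposed, in miniature (names and values are illustrative, not the committed structures): all keyword text lives in one NUL-separated string constant, the per-keyword entry shrinks to a 16-bit offset into it, and token value and category move to parallel arrays that are read only after a hit.

#include <stdint.h>

/* one big string constant, no per-keyword pointers, hence no relocations */
static const char kw_string[] =
    "abort\0"
    "begin\0"
    "commit";                   /* NUL-separated, one entry per keyword */

static const uint16_t kw_offsets[]    = {0, 6, 12};
static const int16_t  kw_values[]     = {258, 259, 260}; /* token numbers, illustrative */
static const uint8_t  kw_categories[] = {0, 0, 0};       /* e.g. UNRESERVED_KEYWORD */

#define NUM_KEYWORDS 3

/* text of keyword n; values/categories are consulted only on a match */
#define KEYWORD_TEXT(n) (kw_string + kw_offsets[(n)])

Because none of these arrays contains a pointer, the whole structure can sit in read-only data with no startup relocations, which is where the size and cache benefits discussed in the thread come from.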
On 12/20/18, Tom Lane wrote:
> I'd be inclined to put the script in src/tools, I think. IMO src/common
> is for code that actually gets built into our executables.
Done.
>> which takes
>> pl_unreserved_kwlist.h as input and outputs
>> pl_unreserved_kwlist_offset.h and pl_unreserved_kwlist_strin
John Naylor writes:
> On 12/18/18, Tom Lane wrote:
>> I'd be kind of inclined to convert all uses of ScanKeyword to the new way,
>> if only for consistency's sake. On the other hand, I'm not the one
>> volunteering to do the work.
> That's reasonable, as long as the design is nailed down first.
On 12/19/18, Andrew Gierth wrote:
> Is there any particular reason not to go further and use a perfect hash
> function for the lookup, rather than binary search?
When I was investigating faster algorithms, I ruled out gperf based on
discussions in the archives. The approach here has modest goals
Andrew Gierth writes:
> Is there any particular reason not to go further and use a perfect hash
> function for the lookup, rather than binary search?
Tooling? I seem to recall having looked at gperf and deciding that it
pretty much sucked, so it's not real clear to me what we would use.
Hi,
On 2018-12-20 00:54:39 +, Andrew Gierth wrote:
> > "John" == John Naylor writes:
>
> > On 12/18/18, Tom Lane wrote:
> >> I'd be kind of inclined to convert all uses of ScanKeyword to the
> >> new way, if only for consistency's sake. On the other hand, I'm not
> >> the one volunt
> "John" == John Naylor writes:
> On 12/18/18, Tom Lane wrote:
>> I'd be kind of inclined to convert all uses of ScanKeyword to the
>> new way, if only for consistency's sake. On the other hand, I'm not
>> the one volunteering to do the work.
John> That's reasonable, as long as the des
On 12/18/18, Tom Lane wrote:
> I'd be kind of inclined to convert all uses of ScanKeyword to the new way,
> if only for consistency's sake. On the other hand, I'm not the one
> volunteering to do the work.
That's reasonable, as long as the design is nailed down first. Along
those lines, attached
John Naylor writes:
> On 12/17/18, Tom Lane wrote:
>> Also, wouldn't we also adopt this technology for its unreserved keywords,
>> too?
> We wouldn't be forced to, but there might be other reasons to do so.
> Were you thinking of code consistency (within pl_scanner.c or
> globally)? Or something
On 12/17/18, Tom Lane wrote:
> John Naylor writes:
>> Since PL/pgSQL uses the core scanner, we'd need to use offsets in its
>> reserved_keywords[], too. Those don't change much, so we can probably
>> get away with hard-coding the offsets and the giant string in that
>> case. (If that's not accept
John Naylor writes:
> A few months ago I was looking into faster search algorithms for
> ScanKeywordLookup(), so this is interesting to me. While an optimal
> full replacement would be a lot of work, the above ideas are much less
> invasive and would still have some benefit. Unless anyone intends
On 10/15/18, Tom Lane wrote:
> Andres Freund writes:
>> On 2018-10-15 16:36:26 -0400, Tom Lane wrote:
>>> We could possibly fix these by changing the data structure so that
>>> what's in a ScanKeywords entry is an offset into some giant string
>>> constant somewhere. No idea how that would affec