Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-03-21 Thread Joel Jacobson
On Wed, Mar 20, 2019 at 9:24 PM Tom Lane wrote: > Joel Jacobson writes: > > I've seen a performance trick in other hash functions [1] > > to instead read multiple bytes in each iteration, > > and then handle the remaining bytes after the loop. > > [1] https://github.com/wangyi-fudan/wyhash/blob/

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-03-20 Thread Tom Lane
Joel Jacobson writes: > I've seen a performance trick in other hash functions [1] > to instead read multiple bytes in each iteration, > and then handle the remaining bytes after the loop. > [1] https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h#L29 I can't get very excited about this, se

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-03-20 Thread Joel Jacobson
Many thanks for working on this, amazing work, really nice you made it a separate reusable Perl-module. The generated hash functions reads one character at a time. I've seen a performance trick in other hash functions [1] to instead read multiple bytes in each iteration, and then handle the remain

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
John Naylor writes: > On Wed, Jan 9, 2019 at 5:33 PM Tom Lane wrote: >> really need here. We could go with "[no-]case-insensitive", perhaps. >> Or "[no-]case-fold", which is at least a little shorter and less >> double-negative-y. > I'd be in favor of --[no-]case-fold. Yeah, I like that better

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread John Naylor
On Wed, Jan 9, 2019 at 5:33 PM Tom Lane wrote: > really need here. We could go with "[no-]case-insensitive", perhaps. > Or "[no-]case-fold", which is at least a little shorter and less > double-negative-y. I'd be in favor of --[no-]case-fold. On Tue, Jan 8, 2019 at 5:53 PM Tom Lane wrote: > I

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
John Naylor writes: > On Tue, Jan 8, 2019 at 5:31 PM Tom Lane wrote: >> The length macro was readily available there so I used it. AFAIR >> that wasn't true elsewhere, though I might've missed something. >> It's pretty much just belt-and-suspenders coding anyway, since all >> those arrays are ma

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
John Naylor writes: > On Wed, Jan 9, 2019 at 2:04 PM Tom Lane wrote: >> Now the impedance mismatch about case sensitivity is handled with >> my $f = PerfectHash::generate_hash_function(\@keywords, $funcname, >> case_insensitive => !$case_sensitive); >> which is at least a little clea

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
John Naylor writes: > On Wed, Jan 9, 2019 at 2:44 PM Tom Lane wrote: >> [patch to shrink oid index] > It would help maintaining its newfound sveltness if we warned if a > higher oid was assigned, as in the attached. I used 6200 as a soft > limit, but that could be anything similiar. I think the

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread John Naylor
On Wed, Jan 9, 2019 at 2:44 PM Tom Lane wrote: > [patch to shrink oid index] It would help maintaining its newfound sveltness if we warned if a higher oid was assigned, as in the attached. I used 6200 as a soft limit, but that could be anything similiar. -- John Naylorhttps://ww

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread John Naylor
On Tue, Jan 8, 2019 at 5:31 PM Tom Lane wrote: > > John Naylor writes: > > In the committed keyword patch, I noticed that in common/keywords.c, > > the array length is defined with > > ScanKeywordCategories[SCANKEYWORDS_NUM_KEYWORDS] > > but other keyword arrays just have ...[]. Is there a reason

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread John Naylor
On Wed, Jan 9, 2019 at 2:04 PM Tom Lane wrote: > > I wrote: > > John Naylor writes: > >> -There is a bit of a cognitive clash between $case_sensitive in > >> gen_keywordlist.pl and $case_insensitive in PerfectHash.pm. They each > >> make sense in their own file, but might it be worth using one or

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Joerg Sonnenberger
On Wed, Jan 09, 2019 at 02:04:15PM -0500, Tom Lane wrote: > Also, in view of finding that the original multiplier choices failed > on the fmgr oid problem, I spent a little effort making the code > able to try more combinations of hash multipliers and seeds. It'd > be nice to have some theory rath

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
Andres Freund writes: > On 2019-01-09 15:03:35 -0500, Tom Lane wrote: >> (Speaking of which, I've been wondering for awhile if libpq ought not >> obtain the OIDs of lo_create and friends by #including fmgroids.h >> instead of doing a runtime query on every connection. If we did that, >> we'd be f

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Andres Freund
Hi, On 2019-01-09 15:03:35 -0500, Tom Lane wrote: > Alvaro Herrera writes: > > On 2019-Jan-09, Tom Lane wrote: > >> We could make the index table still smaller if we wanted to reassign > >> a couple dozen high-numbered functions down to lower OIDs, but I dunno > >> if it's worth the trouble. It

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
Alvaro Herrera writes: > On 2019-Jan-09, Tom Lane wrote: >> We could make the index table still smaller if we wanted to reassign >> a couple dozen high-numbered functions down to lower OIDs, but I dunno >> if it's worth the trouble. It certainly isn't from a performance >> standpoint, because tho

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
Andres Freund writes: > On 2019-01-09 14:44:24 -0500, Tom Lane wrote: >> /* fast lookup only possible if original oid still assigned */ >> -if (id >= FirstGenbkiObjectId) >> +if (id > fmgr_last_builtin_oid) >> return NULL; > An extern reference here will make the code a bit l

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Andres Freund
Hi, On 2019-01-09 14:44:24 -0500, Tom Lane wrote: > I wrote: > > Also, I fail to understand why fmgr_builtin_oid_index has 1 entries > > anyway. We could easily have fmgrtab.c expose the last actually assigned > > builtin function OID (presently 6121) and make the index array only > > that bi

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Alvaro Herrera
On 2019-Jan-09, Tom Lane wrote: > We could make the index table still smaller if we wanted to reassign > a couple dozen high-numbered functions down to lower OIDs, but I dunno > if it's worth the trouble. It certainly isn't from a performance > standpoint, because those unused entry ranges will n

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
I wrote: > Also, I fail to understand why fmgr_builtin_oid_index has 1 entries > anyway. We could easily have fmgrtab.c expose the last actually assigned > builtin function OID (presently 6121) and make the index array only > that big, which just about eliminates the space advantage completely

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
I wrote: > John Naylor writes: >> -There is a bit of a cognitive clash between $case_sensitive in >> gen_keywordlist.pl and $case_insensitive in PerfectHash.pm. They each >> make sense in their own file, but might it be worth using one or the >> other? > Yeah, dunno. It seems to make sense for t

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-08 Thread Joerg Sonnenberger
On Tue, Jan 08, 2019 at 05:53:25PM -0500, Tom Lane wrote: > John Naylor writes: > > -As for the graph algorithm, I'd have to play with it to understand > > how it works. > > I improved the comment about how come the hash table entry assignment > works. One thing I'm not clear about myself is >

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-08 Thread Tom Lane
John Naylor writes: > -As for the graph algorithm, I'd have to play with it to understand > how it works. I improved the comment about how come the hash table entry assignment works. One thing I'm not clear about myself is # A cycle-free graph is either empty or has some vertex of degre

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-08 Thread Tom Lane
John Naylor writes: > Just a couple comments about the module: > -If you qualify the function's module name as you did > (PerfectHash::generate_hash_function), you don't have to export the > function into the callers namespace, so you can skip the @EXPORT_OK > setting. Most of our modules don't e

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-08 Thread John Naylor
On Tue, Jan 8, 2019 at 3:04 PM Tom Lane wrote: > > I'll take a crack at separating into a module. I'll wait a bit in > > case there are any stylistic suggestions on the patch as it stands. > > I had a go at that myself. I'm sure there's plenty to criticize in > the result, but at least it passes

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-08 Thread Tom Lane
Andres Freund writes: > On 2019-01-08 13:41:16 -0500, John Naylor wrote: >> Do you mean the fmgr table? > Not the entire fmgr table, but just the builtin oid index, generated by > the following section: > ... > The generated fmgr_builtin_oid_index is pretty sparse, and a more dense > hashtable mi

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-08 Thread Tom Lane
John Naylor writes: > On Tue, Jan 8, 2019 at 12:06 PM Andrew Dunstan > wrote: >> If he doesn't I will. > I'll take a crack at separating into a module. I'll wait a bit in > case there are any stylistic suggestions on the patch as it stands. I had a go at that myself. I'm sure there's plenty t

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-08 Thread Andres Freund
Hi, On 2019-01-08 13:41:16 -0500, John Naylor wrote: > On Tue, Jan 8, 2019 at 12:06 PM Andrew Dunstan > wrote: > > On 1/7/19 7:52 PM, Andres Freund wrote: > > > Builtin functions for one, which we'd swatted down last time round due > > > to gperfs defficiencies. > > Do you mean the fmgr table?

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-08 Thread John Naylor
On Tue, Jan 8, 2019 at 12:06 PM Andrew Dunstan wrote: > On 1/7/19 7:52 PM, Andres Freund wrote: > > Builtin functions for one, which we'd swatted down last time round due > > to gperfs defficiencies. Do you mean the fmgr table? > >> But in any case, that sounds like a task for someone with > >>

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-08 Thread Andrew Dunstan
On 1/7/19 7:52 PM, Andres Freund wrote: > Hi, > > On 2019-01-07 19:37:51 -0500, Tom Lane wrote: >> Andres Freund writes: >>> Hm, shouldn't we extract the perfect hash generation into a perl module >>> or such? It seems that there's plenty other possible uses for it. >> Such as? > Builtin functio

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-07 Thread Andres Freund
Hi, On 2019-01-07 19:37:51 -0500, Tom Lane wrote: > Andres Freund writes: > > Hm, shouldn't we extract the perfect hash generation into a perl module > > or such? It seems that there's plenty other possible uses for it. > > Such as? Builtin functions for one, which we'd swatted down last time ro

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-07 Thread Tom Lane
Andres Freund writes: > Hm, shouldn't we extract the perfect hash generation into a perl module > or such? It seems that there's plenty other possible uses for it. Such as? But in any case, that sounds like a task for someone with more sense of Perl style than I have. re

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-07 Thread Tom Lane
I wrote: > Probably there's a lot to be criticized about the Perl style below; > anybody feel a need to rewrite it? Here's a somewhat better version. I realized that I was being too slavishly tied to the data structures used in the C version; in Perl it's easier to manage the lists of edges as ha

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-07 Thread Andres Freund
Hi, On 2019-01-07 16:11:04 -0500, Tom Lane wrote: > I wrote: > > I took a quick look through the NetBSD nbperf sources at > > http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.bin/nbperf/ > > and I concur with your judgment that we could manage translating > > that into Perl, especially if we only imple

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-07 Thread Tom Lane
I wrote: > I took a quick look through the NetBSD nbperf sources at > http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.bin/nbperf/ > and I concur with your judgment that we could manage translating > that into Perl, especially if we only implement the parts we need. Here's an implementation of that, us

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-07 Thread John Naylor
On 1/6/19, Tom Lane wrote: > I've pushed that version (v8 + max_kw_len); if the buildfarm doesn't > fall over, we can move on with looking at hashing. Thank you. The API adjustment looks good, and I'm glad that splitting out the aux info led to even further cleanups. -John Naylor

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-06 Thread Tom Lane
Joerg Sonnenberger writes: > On Sun, Jan 06, 2019 at 02:29:05PM -0500, Tom Lane wrote: >> * We should extend the ScanKeywordList representation to include a >> field holding the longest keyword length in the table, which >> gen_keywordlist.pl would have no trouble providing. Then we could >> skip

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-06 Thread Tom Lane
Joerg Sonnenberger writes: > On Sun, Jan 06, 2019 at 02:29:05PM -0500, Tom Lane wrote: >> So we probably can't have inlined hashing code --- I imagine the >> hash generator needs the flexibility to pick different values of >> those multipliers. > Right now, only the initial values are randomized.

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-06 Thread Joerg Sonnenberger
On Sun, Jan 06, 2019 at 02:29:05PM -0500, Tom Lane wrote: > * It's too bad that the hash function doesn't have a return convention > that allows distinguishing "couldn't possibly match any keyword" from > "might match keyword 0". I imagine a lot of the zero entries in its > hashtable could be inte

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-06 Thread Tom Lane
Joerg Sonnenberger writes: > On Mon, Jan 07, 2019 at 03:11:55AM +1300, David Rowley wrote: >> What I'm most interested in is how long it took to generate the hash >> function in hash2.c? > It's within the noise floor of time(1) on my laptop, e.g. ~1ms. I decided to do some simple performance mea

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-06 Thread Joerg Sonnenberger
On Mon, Jan 07, 2019 at 03:11:55AM +1300, David Rowley wrote: > What I'm most interested in is how long it took to generate the hash > function in hash2.c? It's within the noise floor of time(1) on my laptop, e.g. ~1ms. Joerg

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-06 Thread David Rowley
On Sat, 5 Jan 2019 at 09:20, John Naylor wrote: > > On 1/3/19, Joerg Sonnenberger wrote: > > Hello John, > > I was pointed at your patch on IRC and decided to look into adding my > > own pieces. What I can provide you is a fast perfect hash function > > generator. I've attached a sample hash fun

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-05 Thread Tom Lane
I wrote: > I spent some time hacking on this today, and I think it's committable > now, but I'm putting it back up in case anyone wants to have another > look (and also so the cfbot can check it on Windows). ... and indeed, the cfbot doesn't like it. Here's v8, with the missing addition to Mkvcbu

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-05 Thread Tom Lane
John Naylor writes: > [ v6-0001-Use-offset-based-keyword-lookup.patch ] I spent some time hacking on this today, and I think it's committable now, but I'm putting it back up in case anyone wants to have another look (and also so the cfbot can check it on Windows). Given the discussion about poss

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-04 Thread Joerg Sonnenberger
On Fri, Jan 04, 2019 at 02:36:15PM -0800, Andres Freund wrote: > Hi, > > On 2019-01-04 16:43:39 -0500, Tom Lane wrote: > > Joerg Sonnenberger writes: > > >> * What's the generator written in? (if the answer's not "Perl", wedging > > >> it into our build processes might be painful) > > > > > Plai

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-04 Thread Andres Freund
Hi, On 2019-01-04 16:43:39 -0500, Tom Lane wrote: > Joerg Sonnenberger writes: > >> * What's the generator written in? (if the answer's not "Perl", wedging > >> it into our build processes might be painful) > > > Plain C, nothing really fancy in it. > > That's actually a bigger problem than you

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-04 Thread Tom Lane
Joerg Sonnenberger writes: > On Fri, Jan 04, 2019 at 03:31:11PM -0500, Tom Lane wrote: >> The sample hash function certainly looks great. I'm not terribly on board >> with using |0x20 as a substitute for lower-casing, but that's a minor >> detail. > Yeah, I've included that part more because I d

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-04 Thread Joerg Sonnenberger
On Fri, Jan 04, 2019 at 03:31:11PM -0500, Tom Lane wrote: > John Naylor writes: > > On 1/3/19, Joerg Sonnenberger wrote: > >> I was pointed at your patch on IRC and decided to look into adding my > >> own pieces. What I can provide you is a fast perfect hash function > >> generator. I've attache

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-04 Thread John Naylor
On 12/27/18, Tom Lane wrote: > If you really are hot about saving that other 440 bytes, the way to > do it would be to drop the struct entirely and use two parallel > arrays, an int16[] for value and a char[] (or better uint8[]) for > category. Those would be filled by reading kwlist.h twice with

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-04 Thread Tom Lane
Andres Freund writes: > On 2019-01-04 12:26:18 -0500, Tom Lane wrote: >> I'm wondering where we want to go with this. Is anyone excited >> about pursuing the perfect-hash-function idea? (Joerg's example >> function looked pretty neat to me.) If we are going to do that, >> does it make sense to

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-04 Thread Tom Lane
John Naylor writes: > On 1/3/19, Joerg Sonnenberger wrote: >> I was pointed at your patch on IRC and decided to look into adding my >> own pieces. What I can provide you is a fast perfect hash function >> generator. I've attached a sample hash function based on the current >> main keyword list.

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-04 Thread Andres Freund
On 2019-01-04 12:26:18 -0500, Tom Lane wrote: > I wrote: > > Andres Freund writes: > >> On 2018-12-29 16:59:52 -0500, John Naylor wrote: > >>> I think 0001 with complete keyword lookup replacement is in decent > >>> enough shape to post. Make check-world passes. A few notes and > >>> caveats: > >

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-04 Thread John Naylor
On 1/3/19, Joerg Sonnenberger wrote: > Hello John, > I was pointed at your patch on IRC and decided to look into adding my > own pieces. What I can provide you is a fast perfect hash function > generator. I've attached a sample hash function based on the current > main keyword list. hash() essent

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-04 Thread John Naylor
On 12/30/18, Andres Freund wrote: > I tried to take this for a spin, an for me the build fails because various > frontend programs don't have KeywordOffsets/Strings defined, but reference > it > through various functions exposed to the frontend (like fmtId()). That I > see > that error but you do

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-04 Thread Tom Lane
I wrote: > Andres Freund writes: >> On 2018-12-29 16:59:52 -0500, John Naylor wrote: >>> I think 0001 with complete keyword lookup replacement is in decent >>> enough shape to post. Make check-world passes. A few notes and >>> caveats: >> I tried to take this for a spin, an for me the build fails

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-03 Thread Joerg Sonnenberger
On Sun, Dec 16, 2018 at 11:50:15AM -0500, John Naylor wrote: > A few months ago I was looking into faster search algorithms for > ScanKeywordLookup(), so this is interesting to me. While an optimal > full replacement would be a lot of work, the above ideas are much less > invasive and would still h

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-30 Thread Tom Lane
Andres Freund writes: > On 2018-12-29 16:59:52 -0500, John Naylor wrote: >> I think 0001 with complete keyword lookup replacement is in decent >> enough shape to post. Make check-world passes. A few notes and >> caveats: > I tried to take this for a spin, an for me the build fails because various

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-30 Thread Andres Freund
Hi, On 2018-12-29 16:59:52 -0500, John Naylor wrote: > I think 0001 with complete keyword lookup replacement is in decent > enough shape to post. Make check-world passes. A few notes and > caveats: I tried to take this for a spin, an for me the build fails because various frontend programs don't

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-29 Thread John Naylor
I think 0001 with complete keyword lookup replacement is in decent enough shape to post. Make check-world passes. A few notes and caveats: -I added an --extern option to the script for the core keyword headers. This also capitalizes variables. -ECPG keyword lookup is a bit different in that the ec

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-28 Thread Andres Freund
On 2018-12-27 14:22:11 -0500, Tom Lane wrote: > Andrew Dunstan writes: > > A smaller project might be to see if we can replace the binary keyword > > search in ScanKeyword with a perfect hashing function generated by > > gperf, or something similar. I had a quick look at that, too. > > Yeah, we'v

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread David Fetter
On Thu, Dec 27, 2018 at 07:04:41PM -0300, Alvaro Herrera wrote: > On 2018-Dec-27, Tom Lane wrote: > > > I poked around a little on my own machines, and I can confirm that > > Getopt::Long is present in a default Perl install-from-source at > > least as far back as perl 5.6.1. It's barely conceiva

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread Alvaro Herrera
On 2018-Dec-27, Tom Lane wrote: > I poked around a little on my own machines, and I can confirm that > Getopt::Long is present in a default Perl install-from-source at > least as far back as perl 5.6.1. It's barely conceivable that some > packager might omit it from their minimal package, but Red

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread Tom Lane
Andrew Dunstan writes: > On 12/27/18 3:34 PM, Tom Lane wrote: >> ... that's a pretty thin argument, and if Getopt::Long is present even >> in the most minimal Perl installations then it's certainly moot. > It's bundled separately, but on both systems I looked at it's needed by > the base perl pac

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread Andrew Dunstan
On 12/27/18 3:34 PM, Tom Lane wrote: > Andrew Dunstan writes: >> On 12/27/18 3:00 PM, John Naylor wrote: >>> This style was cargo-culted from the catalog scripts. I can settle on >>> just the first form if you like. >> I would rather we used the standard perl module Getopt::Long, as >> numerous

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread Tom Lane
Andrew Dunstan writes: > On 12/27/18 3:00 PM, John Naylor wrote: >> This style was cargo-culted from the catalog scripts. I can settle on >> just the first form if you like. > I would rather we used the standard perl module Getopt::Long, as > numerous programs we have already do. Hmm ... greppin

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread Andrew Dunstan
On 12/27/18 3:00 PM, John Naylor wrote: > On 12/27/18, Tom Lane wrote: >> diff --git a/src/tools/gen_keywords.pl b/src/tools/gen_keywords.pl >> +elsif ($arg =~ /^-o/) >> +{ >> +$output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV; >> +} >> >> My perl-fu is not

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread Tom Lane
John Naylor writes: > On 12/27/18, Tom Lane wrote: >> +$kw_input_file =~ /((\w*)kwlist)\.h/; >> +my $base_filename = $1; >> +$prefix = $2 if !defined $prefix; >> >> Hmm, what happens if the input filename does not end with "kwlist.h"? > If that's a maintainability hazard, I can force every invo

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread John Naylor
On 12/27/18, Tom Lane wrote: > diff --git a/src/tools/gen_keywords.pl b/src/tools/gen_keywords.pl > + elsif ($arg =~ /^-o/) > + { > + $output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV; > + } > > My perl-fu is not great, but it looks like this will accept argum

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread Tom Lane
Andrew Dunstan writes: > RD parsers are not terribly hard to write. Sure, as long as they are for grammars that are (a) small, (b) static, and (c) LL(1), which is strictly weaker than the LALR(1) grammar class that bison can handle. We already have a whole lot of constructs that are at the edges

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread Tom Lane
John Naylor writes: > Barring additional bikeshedding on 0001, I'll plan on implementing > offset-based lookup for the other keyword types and retire the old > ScanKeyword. Once that's done, we can benchmark and compare with the > optimizations in 0002. Sounds like a plan. Assorted minor bikeshe

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread Andrew Dunstan
On 12/27/18 12:12 PM, Tom Lane wrote: >> I don't buy that we're inable to write a descent parser that way. > I do not think that we could write one for the current state of the > PG grammar without an investment of effort so large that it's not > going to happen. Even if such a parser were to s

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread Tom Lane
Andres Freund writes: > On 2018-12-26 14:03:57 -0500, Tom Lane wrote: >> It's impossible to write correct RD parsers by hand for any but the most >> trivial, conflict-free languages, and what we have got to deal with >> is certainly neither of those; moreover, it's a constantly moving target. >> W

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread John Naylor
On 12/26/18, John Naylor wrote: > On 12/26/18, Robert Haas wrote: >> I wonder if we could do something really simple like a lookup based on >> the first character of the scan keyword. It looks to me like there are >> 440 keywords right now, and the most common starting letter is 'c', >> which is

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-27 Thread Andres Freund
Hi, On 2018-12-26 14:03:57 -0500, Tom Lane wrote: > Andres Freund writes: > > My bet is, and has been for quite a while, that we'll have to go for a > > hand-written recursive descent type parser. > > I will state right up front that that will happen over my dead body. > > It's impossible to wr

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread John Naylor
On 12/26/18, Robert Haas wrote: > I wonder if we could do something really simple like a lookup based on > the first character of the scan keyword. It looks to me like there are > 440 keywords right now, and the most common starting letter is 'c', > which is the first letter of 51 keywords. So dis

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
Andres Freund writes: > On 2018-12-26 10:45:11 -0500, Robert Haas wrote: >> I'm not sure that I understand quite what you have in mind for a >> serialized non-perfect hashtable. Are you thinking that we'd just >> construct a simplehash and serialize it? > I was basically thinking that we'd have

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
Andres Freund writes: > My bet is, and has been for quite a while, that we'll have to go for a > hand-written recursive descent type parser. I will state right up front that that will happen over my dead body. It's impossible to write correct RD parsers by hand for any but the most trivial, conf

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Andres Freund
Hi, On 2018-12-26 10:45:11 -0500, Robert Haas wrote: > I'm not sure that I understand quite what you have in mind for a > serialized non-perfect hashtable. Are you thinking that we'd just > construct a simplehash and serialize it? I was basically thinking that we'd have the perl script implement

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Andres Freund
Hi, On 2018-12-26 11:50:18 -0500, Robert Haas wrote: > On Wed, Dec 26, 2018 at 11:22 AM Tom Lane wrote: > > I think there's a lot of goalpost-moving going on here. The original > > idea was to trim the physical size of the data structure, as stated > > in the thread subject, and just reap whatev

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
Robert Haas writes: > I'm kinda surprised that you haven't seen ScanKeywordLookup() in > there, but I agree with you that the size of the main parser tables is > a real issue, and that there's no easy solution. At various times > there has been discussion of using some other parser generator, and

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
David Fetter writes: > On Wed, Dec 26, 2018 at 11:22:39AM -0500, Tom Lane wrote: >> In my hands, the only part of the low-level parsing code that >> commonly shows up as interesting in profiles is the Bison engine. > Should we be considering others? We've looked around before, IIRC, and not real

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Robert Haas
On Wed, Dec 26, 2018 at 11:22 AM Tom Lane wrote: > I think there's a lot of goalpost-moving going on here. The original > idea was to trim the physical size of the data structure, as stated > in the thread subject, and just reap whatever cache benefits we got > along the way from that. I am dubi

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread David Fetter
On Wed, Dec 26, 2018 at 11:22:39AM -0500, Tom Lane wrote: > > In my hands, the only part of the low-level parsing code that > commonly shows up as interesting in profiles is the Bison engine. Should we be considering others? As I understand it, steps have been made in this field since yacc was or

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Tom Lane
Robert Haas writes: > On Wed, Dec 19, 2018 at 8:01 PM Andres Freund wrote: >> The last time I looked into perfect hash functions, it wasn't easy to >> find a generator that competed with a decent normal hashtable (in >> particular gperf's are very unconvincing). The added tooling is a >> concern

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-26 Thread Robert Haas
On Wed, Dec 19, 2018 at 8:01 PM Andres Freund wrote: > The last time I looked into perfect hash functions, it wasn't easy to > find a generator that competed with a decent normal hashtable (in > particular gperf's are very unconvincing). The added tooling is a > concern imo. OTOH, we're comparing

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-23 Thread John Naylor
On 12/22/18, Tom Lane wrote: > John Naylor writes: >> Using a single file also gave me another idea: Take value and category >> out of ScanKeyword, and replace them with an index into another array >> containing those, which will only be accessed in the event of a hit. >> That would shrink ScanKe

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-22 Thread Tom Lane
Andres Freund writes: > On 2018-12-22 12:20:00 -0500, Tom Lane wrote: >> I like that idea a *lot*, actually, because it offers the opportunity >> to decouple this mechanism from all assumptions about what the >> auxiliary data for a keyword is. > OTOH, it doubles or triples the number of cachelin

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-22 Thread Andres Freund
Hi, On 2018-12-22 12:20:00 -0500, Tom Lane wrote: > John Naylor writes: > > Using a single file also gave me another idea: Take value and category > > out of ScanKeyword, and replace them with an index into another array > > containing those, which will only be accessed in the event of a hit. > >

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-22 Thread Tom Lane
John Naylor writes: > Using a single file also gave me another idea: Take value and category > out of ScanKeyword, and replace them with an index into another array > containing those, which will only be accessed in the event of a hit. > That would shrink ScanKeyword to 4 bytes (offset, index), fu

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-21 Thread John Naylor
On 12/20/18, Tom Lane wrote: > I'd be inclined to put the script in src/tools, I think. IMO src/common > is for code that actually gets built into our executables. Done. >> which takes >> pl_unreserved_kwlist.h as input and outputs >> pl_unreserved_kwlist_offset.h and pl_unreserved_kwlist_strin

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-20 Thread Tom Lane
John Naylor writes: > On 12/18/18, Tom Lane wrote: >> I'd be kind of inclined to convert all uses of ScanKeyword to the new way, >> if only for consistency's sake. On the other hand, I'm not the one >> volunteering to do the work. > That's reasonable, as long as the design is nailed down first.

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-19 Thread John Naylor
On 12/19/18, Andrew Gierth wrote: > Is there any particular reason not to go further and use a perfect hash > function for the lookup, rather than binary search? When I was investigating faster algorithms, I ruled out gperf based on discussions in the archives. The approach here has modest goals

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-19 Thread Tom Lane
Andrew Gierth writes: > Is there any particular reason not to go further and use a perfect hash > function for the lookup, rather than binary search? Tooling? I seem to recall having looked at gperf and deciding that it pretty much sucked, so it's not real clear to me what we would use.

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-19 Thread Andres Freund
Hi, On 2018-12-20 00:54:39 +, Andrew Gierth wrote: > > "John" == John Naylor writes: > > > On 12/18/18, Tom Lane wrote: > >> I'd be kind of inclined to convert all uses of ScanKeyword to the > >> new way, if only for consistency's sake. On the other hand, I'm not > >> the one volunt

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-19 Thread Andrew Gierth
> "John" == John Naylor writes: > On 12/18/18, Tom Lane wrote: >> I'd be kind of inclined to convert all uses of ScanKeyword to the >> new way, if only for consistency's sake. On the other hand, I'm not >> the one volunteering to do the work. John> That's reasonable, as long as the des

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-19 Thread John Naylor
On 12/18/18, Tom Lane wrote: > I'd be kind of inclined to convert all uses of ScanKeyword to the new way, > if only for consistency's sake. On the other hand, I'm not the one > volunteering to do the work. That's reasonable, as long as the design is nailed down first. Along those lines, attached

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-18 Thread Tom Lane
John Naylor writes: > On 12/17/18, Tom Lane wrote: >> Also, wouldn't we also adopt this technology for its unreserved keywords, >> too? > We wouldn't be forced to, but there might be other reasons to do so. > Were you thinking of code consistency (within pl_scanner.c or > globally)? Or something

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-18 Thread John Naylor
On 12/17/18, Tom Lane wrote: > John Naylor writes: >> Since PL/pgSQL uses the core scanner, we'd need to use offsets in its >> reserved_keywords[], too. Those don't change much, so we can probably >> get away with hard-coding the offsets and the giant string in that >> case. (If that's not accept

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-17 Thread Tom Lane
John Naylor writes: > A few months ago I was looking into faster search algorithms for > ScanKeywordLookup(), so this is interesting to me. While an optimal > full replacement would be a lot of work, the above ideas are much less > invasive and would still have some benefit. Unless anyone intends

reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-16 Thread John Naylor
On 10/15/18, Tom Lane wrote: > Andres Freund writes: >> On 2018-10-15 16:36:26 -0400, Tom Lane wrote: >>> We could possibly fix these by changing the data structure so that >>> what's in a ScanKeywords entry is an offset into some giant string >>> constant somewhere. No idea how that would affec