On 10/15/18, Tom Lane <t...@sss.pgh.pa.us> wrote: > Andres Freund <and...@anarazel.de> writes: >> On 2018-10-15 16:36:26 -0400, Tom Lane wrote: >>> We could possibly fix these by changing the data structure so that >>> what's in a ScanKeywords entry is an offset into some giant string >>> constant somewhere. No idea how that would affect performance, but >>> I do notice that we could reduce the sizeof(ScanKeyword), which can't >>> hurt. > >> Yea, that might even help performancewise. Alternatively we could change >> ScanKeyword to store the keyword name inline, but that'd be a measurable >> size increase... > > Yeah. It also seems like doing it this way would improve locality of > access: the pieces of the giant string would presumably be in the same > order as the ScanKeywords entries, whereas with the current setup, > who knows where the compiler has put 'em or in what order. > > We'd need some tooling to generate the constants that way, though; > I can't see how to make it directly from kwlist.h.
A few months ago I was looking into faster search algorithms for ScanKeywordLookup(), so this is interesting to me. While an optimal full replacement would be a lot of work, the above ideas are much less invasive and would still have some benefit. Unless anyone intends to work on this, I'd like to flesh out the offset-into-giant-string approach a bit further: Since there are several callers of the current approach that don't use the core keyword list, we'd have to keep the existing struct and lookup function, to keep the complexity manageable. Once we have an offset-based struct and function, it makes sense to use it for all searches of core keywords. This includes not only the core scanner, but also adt/rule_utils.c, fe_utils/string_utils.c, and ecpg/preproc/keywords.c. There would need to be a header with offsets replacing name strings, generated from parser/kwlist.h, maybe kwlist_offset.h. It'd probably be convenient if it was emitted into the common/ dir. The giant string would likely need its own header (kwlist_string.h?). Since PL/pgSQL uses the core scanner, we'd need to use offsets in its reserved_keywords[], too. Those don't change much, so we can probably get away with hard-coding the offsets and the giant string in that case. (If that's not acceptable, we could separate that out to pl_reserved_kwlist.h and reuse the above tooling to generate pl_reserved_kwlist_{offset,string}.h, but that's more complex.) The rest should be just a SMOP. Any issues I left out? -John Naylor