On Wed, Dec 26, 2018 at 11:22 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > I think there's a lot of goalpost-moving going on here. The original > idea was to trim the physical size of the data structure, as stated > in the thread subject, and just reap whatever cache benefits we got > along the way from that. I am dubious that we actually have any > performance problem in this code that needs a big dollop of added > complexity to fix.
I have seen ScanKeywordLookup show up in profiles quite often and fairly prominently -- like several percent of total runtime. I'm not trying to impose requirements on John's patch, and I agree that reducing the physical size of the structure is a good step whether anything else is done or not. However, I don't see that as a reason to shut down further discussion of other possible improvements. If his patch makes this disappear from profiles, cool, but if it doesn't, then sooner or later somebody's going to want to do more. FWIW, my bet is this helps but isn't enough to get rid of the problem completely. A 9-step binary search has got to be slower than a really well-optimized hash table lookup. In a perfect world the latter touches the cache line containing the keyword -- which presumably is already in cache since we just scanned it -- then computes a hash value without touching any other cache lines -- and then goes straight to the right entry. So it touches ONE new cache line. That might a level of optimization that's hard to achieve in practice, but I don't think it's crazy to want to get there. > In my hands, the only part of the low-level parsing code that commonly > shows up as interesting in profiles is the Bison engine. That's probably > because the grammar tables are circa half a megabyte and blow out cache > pretty badly :-(. I don't know of any way to make that better, > unfortunately. I suspect that it's just going to get worse, because > people keep submitting additions to the grammar. I'm kinda surprised that you haven't seen ScanKeywordLookup() in there, but I agree with you that the size of the main parser tables is a real issue, and that there's no easy solution. At various times there has been discussion of using some other parser generator, and I've also toyed with the idea of writing one specifically for PostgreSQL. Unfortunately, it seems like bison is all but unmaintained; the alternatives are immature and have limited adoption and limited community; and writing something from scratch is a ton of work. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company