John Naylor <john.nay...@2ndquadrant.com> writes:
> Traditionally, when Bison generates the header file, it starts
> numbering named tokens at 258, so that numbers below 256 can be used
> for character literals. Then, during parsing, the external token
> number or character literal is mapped to Bison's internal token number
> via the yytranslate[] array.

> The newly-released Bison 3.5 has the option "%define api.token.raw",
> which causes Bison to write out the same ("raw") token numbers it
> would use internally, and thus skip building the yytranslate[] array
> as well as the code to access it. To make use of this, there cannot be
> any single character literals in the grammar, otherwise Bison will
> refuse to build.

> Attached is a draft patch to make the core grammar forward-compatible
> with this option by using FULL_NAMES for all valid single character
> tokens. Benchmarking raw parsing with "%define api.token.raw" enabled
> shows ~1.7-1.9% improvement compared with not setting the option. Not
> much, but doing one less array access per token reduces cache
> pollution and saves a few kB of binary size as well.

TBH, I'm having a hard time getting excited about this.  It seems like
you've just moved the mapping from point A to point B, that is, in
place of a lookup in the grammar you have to have the lexer translate
ASCII characters to something else.  I'm not sure that's an improvement
at all.  And I'm really unexcited about applying a patch that's this
invasive in order to chase a very small improvement ... especially a
very small improvement that we can't even have anytime soon.

> It'll be years before Bison 3.5 is common in the wild,

It'll be *decades* before we'd consider requiring it, really, unless
there are truly striking improvements unrelated to this point.

			regards, tom lane
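
For anyone following along, here is a rough sketch in C of the two places the mapping can live. It is illustrative only: the token names, the symbol numbers, and both helper functions are made up for this example and are not taken from gram.y or from actual Bison output. The first function mimics a yytranslate[]-style lookup inside the parser. The second mimics what the lexer would have to do under "%define api.token.raw", namely hand back a named token for each single-character symbol.

```c
/* Hypothetical numbering, for illustration only. */

/* External numbers in the traditional scheme: character literals keep
 * their character code, named tokens start at 258. */
enum external_token { EXT_IDENT = 258, EXT_ICONST = 259 };

/* Dense internal symbol numbers, as the parser tables would use them. */
enum internal_symbol {
    SYM_EOF = 0,
    SYM_UNDEF = 2,
    SYM_PLUS = 3,
    SYM_COMMA = 4,
    SYM_IDENT = 5,
    SYM_ICONST = 6
};

#define EXT_TABLE_SIZE 260

/* Stand-in for yytranslate[]: indexed by external token number.
 * Entries not listed here default to 0 and are treated as undefined. */
const unsigned char translate_table[EXT_TABLE_SIZE] = {
    ['+'] = SYM_PLUS,
    [','] = SYM_COMMA,
    [EXT_IDENT] = SYM_IDENT,
    [EXT_ICONST] = SYM_ICONST,
};

/* Point A: the parser translates on every token it receives. */
int translate_external(int ext)
{
    if (ext == 0)
        return SYM_EOF;
    if (ext < 0 || ext >= EXT_TABLE_SIZE)
        return SYM_UNDEF;
    int sym = translate_table[ext];
    return sym != 0 ? sym : SYM_UNDEF;
}

/* Point B: with raw token numbers the parser does no lookup, so the
 * lexer must return the internal number for each character itself. */
int lexer_raw_token_for_char(int c)
{
    switch (c) {
        case '+': return SYM_PLUS;
        case ',': return SYM_COMMA;
        default:  return SYM_UNDEF;
    }
}
```

Both paths end up at the same internal symbol; the difference is only whether the per-token array access happens in the parser or the per-character decision happens in the lexer.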