John Naylor <john.nay...@2ndquadrant.com> writes: > 0001 is a small patch to remove some unneeded generality from the > current rules. This lowers the number of elements in the yy_transition > array from 37045 to 36201.
I don't particularly like 0001. The two bits like this -whitespace ({space}+|{comment}) +whitespace ({space}|{comment}) seem likely to create performance problems for runs of whitespace, in that the lexer will now have to execute the associated action once per space character not just once for the whole run. Those actions are empty, but I don't think flex optimizes for that, and it's really flex's per-action overhead that I'm worried about. Note the comment in the "Performance" section of the flex manual: Another area where the user can increase a scanner's performance (and one that's easier to implement) arises from the fact that the longer the tokens matched, the faster the scanner will run. This is because with long tokens the processing of most input characters takes place in the (short) inner scanning loop, and does not often have to go through the additional work of setting up the scanning environment (e.g., `yytext') for the action. There are a bunch of higher-order productions that use "{whitespace}*", which is surely a bit redundant given the contents of {whitespace}. But maybe we could address that by replacing "{whitespace}*" with "{opt_whitespace}" defined as opt_whitespace ({space}*|{comment}) Not sure what impact if any that'd have on table size, but I'm quite sure that {whitespace} was defined with an eye to avoiding unnecessary lexer action cycles. As for the other two bits that are like -<xe>. { - /* This is only needed for \ just before EOF */ +<xe>\\ { my recollection is that those productions are defined that way to avoid a flex warning about not all possible input characters being accounted for in the <xe> (resp. <xdolq>) state. Maybe that warning is flex-version-dependent, or maybe this was just a worry and not something that actually produced a warning ... but I'm hesitant to change it. If we ever did get to flex's default action, that action is to echo the current input character to stdout, which would be Very Bad. As far as I can see, the point of 0002 is to have just one set of flex rules for the various variants of quotecontinue processing. That sounds OK, though I'm a bit surprised it makes this much difference in the table size. I would suggest that "state_before" needs a less generic name (maybe "state_before_xqs"?) and more than no comment. Possibly more to the point, it's not okay to have static state variables in the core scanner, so that variable needs to be kept in yyextra. (Don't remember offhand whether it's any more acceptable in the other scanners.) regards, tom lane