Christian, Levin, Akim and all,
Keep in mind that most of the CPU time is usually spent on lexer side. So it
would make sense to optimize that first. However that's one of the issues
where you quickly realize that a split parser/lexer design is in the way. If
they were tied together, you could make much more aggressive optimizations by
having knowledge of both grammar and RegEx patterns.
For instance right now, whenever you have some input, the lexer must always
assume that any of the patterns may match. Whereas a tied grammar/lexer would
have the knowledge that for a certain parser state, only a small set of
patterns may match, which in turn would allow much more compact and faster
lexer branches for those states to be generated.
The point that Akim is making is that this would yield a combinatorial
expansion (explosion?) of the state space. Whether done in tables or
code, you would lose enough locality that cache and TLB misses would
likely become an issue -- at least on realistically large grammars.
Then consider that lexers don't just recognize an identifier, they
usually construct an object to hold it (or look one up in a hash
table), storing the pointer, e.g., into yylval. How many copies
of this Flex "action code" do you want? Hint: you had better
make it be a function with carefully-defined calling convention, or
the code bloat will be fierce. Better yet, compose a separate
function for each lexer action automagically. Flex naturally gives
you just one copy of this code.
Also remember that the tables used by many lexers perform multi-way
branches on each character, consolidating many potential pipeline
flushes into one. The choice of whether to code these as
if-then-else trees or switch-statement branch tables is a complex
trade off between code size and pipeline flushes. And there is
the question of whether branch-prediction logic can maintain a low
misprediction rate in the face of such a large flow graph containing
a huge number of cyclic paths, few of which offer any extended
number of iterations (e.g., when parsing the initializers for
Bison parse tables! :^). As Akim said, it really isn't clear how
well this approach would pay off on modern architectures.
This could also really hurt the total code size on systems like
the one I work on that have 150+ grammars.
The good news is that GCC offers pretty much all of the tools
you might need to automatically render a Bison state machine
into "inline" C code:
- Define a "struct" containing all of the data structures
(state/value/location stack bases, pointer and sizes,
yylval, yylloc, error recovery stuff, etc.). This can be
passed to various functions with a single pointer argument.
The one instance lives on the stack frame of yyparse().
- Make a _yylex(ptr) wrapper function taking this one pointer,
that calls yylex(), and properly adapts its calling convention.
(For example, use the re-entrant yylex() conventions.)
- Using && to take the address of labels and "goto *pointer;"
could really help. Define one label to "shift" into a state,
a second label to "goto" that state after a reduction.
Only the second kind of label need appear on the state
stack. Both of these need multi-way dispatch: the "shift"
portion on tokens, the "goto" portion on non-terminals.
- Code the "shift" and "goto" dispatch of all states as
switch statements, letting GCC choose between branch tree
jump table, or combination. The "default:" for the
"shift" switch must always be to error recovery. The
"default:" for the "goto" switch could be the most
frequent destination state, whose separate cases are
then omitted. In states where the only action is a
single reduce, the "shift" switch could be replaced with
a single reduce action.
- Encapsulate the "action code" for each rule in a separate
function, since many of these would be called from many
places. This would be an excellent place to pop the
stack pointers, since each rule knows how many symbols
appear on its own right-hand-side (thereby eliminating
the yyr2 table). These functions should also return the
right-hand-side symbol reduced, becoming input when
entering the "goto" label of the popped state
(eliminating yyr1 table).
- Note that the numbering scheme for symbols returned
by these action functions are used only to perform the
proper "goto" function for the popped state. It is
entirely possible that a different numbering of the
various non-terminals might produce more contiguous
jump tables in each state's "goto" code. (Interesting
combinatorial optimization problem / open research
problem.)
- Some code (e.g., to reallocate the stacks when about to
overflow) should be coded as functions that can easily
be called from many places (every shift and goto).
I would start with just a Bison-level state machine before
trying to combine parser and lexer with such techniques.
Interesting to note that this might actually be much
easier to try out using very old Bison (without all the
m4 boilerplate) than on modern Bison.
Somebody should try this out and let us know what happens.
Cheers,
David Warme
On 11/6/21 9:13 AM, Christian Schoenebeck wrote:
On Samstag, 6. November 2021 09:44:55 CET Akim Demaille wrote:
Hi Levin,
Le 28 oct. 2021 à 08:24, elite liu<eligere....@gmail.com> a écrit :
Dear Akim,
I am writing a utility that convert a table-driven dfa into
condition/if-else form, for example:
[...]
I have always wondered if code won't be more efficient than tables.
Since compiler and cpu have learned a lot of tricks to make code
faster, I wouldn't be surprised to have some speedup.
If above codes are writen in tabled-driven implementation, it should be
more concise.
Sure. And compactness is also cache friendliness. So I agree
it is very much unclear which approach will beat the other one.
On today's machines.
Cheers.
The intended approach would still be a character by character procedure which
is much slower than allowing a lexer to perform string comparisons as much as
possible:
https://lemire.me/blog/2020/04/30/for-case-insensitive-string-comparisons-avoid-char-by-char-functions/
Keep in mind that most of the CPU time is usually spent on lexer side. So it
would make sense to optimize that first. However that's one of the issues
where you quickly realize that a split parser/lexer design is in the way. If
they were tied together, you could make much more aggressive optimizations by
having knowledge of both grammar and RegEx patterns.
For instance right now, whenever you have some input, the lexer must always
assume that any of the patterns may match. Whereas a tied grammar/lexer would
have the knowledge that for a certain parser state, only a small set of
patterns may match, which in turn would allow much more compact and faster
lexer branches for those states to be generated.
Best regards,
Christian Schoenebeck