On Thu, 31 May 2007, Andi Kleen wrote:

> "Frank Schaefer" <[EMAIL PROTECTED]> writes:
> >
> > Is there any interest in using such an engine in the GCC toolset?
>
> Right now gcc doesn't use flex so it would be probably non trivial
> to implement support. You would need to rewrite c-lex.c
All the relevant code is in cpplib, not c-lex.c. Zack had some ideas a few years ago (I don't think they were ever posted to a public list) about how to speed up _cpp_clean_line in particular, and some or all of translation phases 1 to 3 in general.

The idea is that you have several Mealy machines (state machines where all the work happens in transitions). Each edge applies to a given set of input characters in a given state and describes the actions to be taken in that case. Actions include both passing output to another machine and emitting diagnostics. So you start with converting character sets to UTF-8, then strip trailing whitespace and canonicalise newlines, then convert trigraphs, then remove backslash-newline pairs, then strip comments, then split the file into preprocessing tokens.

A state machine that does all these things at once is too complicated to write and verify by hand; the idea involves descriptions of the state machine for each subphase, which would be automatically combined into a single machine and optimized. For example, inside comments the state machine doesn't need to convert character sets for most well-behaved character sets (those where the bytes of an ASCII comment end always represent a comment end in the character set and vice versa - for example all those where extended characters are made up entirely of bytes with the high bit set and other bytes have their ASCII values). Nor does it need to convert trigraphs or remove backslash-newlines there, except in places where ??/ or \ followed by newline might split a */ across lines.

Furthermore, the generation process should generate a separate state machine for each mode in which the compiler can operate that affects lexing. Machines with trigraphs enabled should be separate from those with trigraphs disabled, so there should be no checks of cpplib configuration flags in inner loops; each step looks only at the state and the next character.
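To make the shape of this concrete, here is a minimal Python sketch, not cpplib code. It models only two of the subphases (trigraph conversion and backslash-newline removal) as Mealy transition functions and chains them at run time, so it shows the structure but not the automatic merging into one optimized machine. The names trigraph_step, splice_step, build_pipeline and run are hypothetical, and the diagnostic text is made up for illustration.

```python
# Trigraph replacements as defined by the C standard.
TRIGRAPHS = {"=": "#", "(": "[", "/": "\\", ")": "]", "'": "^",
             "<": "{", "!": "|", ">": "}", "-": "~"}


def trigraph_step(state, ch):
    """Mealy transition; state is the number of pending '?' (0, 1 or 2).

    Returns (new_state, output_text, diagnostics)."""
    if state == 0:
        return (1, "", []) if ch == "?" else (0, ch, [])
    if state == 1:
        return (2, "", []) if ch == "?" else (0, "?" + ch, [])
    if ch == "?":
        return 2, "?", []  # "???": emit the oldest '?', keep two pending
    if ch in TRIGRAPHS:
        return 0, TRIGRAPHS[ch], ["trigraph ??%s converted" % ch]
    return 0, "??" + ch, []


def splice_step(state, ch):
    """Mealy transition; state 1 means a backslash is pending."""
    if state == 0:
        return (1, "", []) if ch == "\\" else (0, ch, [])
    if ch == "\n":
        return 0, "", []  # backslash-newline pair removed
    if ch == "\\":
        return 1, "\\", []  # emit the earlier backslash, keep this one
    return 0, "\\" + ch, []


def build_pipeline(trigraphs):
    """Pick the stages once per mode, so run() never consults the option.

    Each stage is (step, pending); pending(state) gives the text the
    stage is still holding back when the input ends."""
    stages = [(splice_step, lambda s: "\\" * s)]
    if trigraphs:
        stages.insert(0, (trigraph_step, lambda s: "?" * s))
    return stages


def run(stages, text):
    """Drive the chained machines; the inner loop sees only state and char."""
    states = [0] * len(stages)
    diags = []

    def push(chunk, first):
        # Pass chunk through stages first, first+1, ... in order.
        for i in range(first, len(stages)):
            step = stages[i][0]
            out = []
            for ch in chunk:
                states[i], emitted, d = step(states[i], ch)
                out.append(emitted)
                diags.extend(d)
            chunk = "".join(out)
        return chunk

    result = [push(ch, 0) for ch in text]
    # End of input: release whatever each stage is still holding back.
    for i, (_, pending) in enumerate(stages):
        held = pending(states[i])
        states[i] = 0
        result.append(push(held, i + 1))
    return "".join(result), diags
```

With this sketch, run(build_pipeline(True), "a ??/\nb ??= c\n") yields the text "a b # c\n" plus two diagnostics, while build_pipeline(False) leaves the same input untouched. The mode is decided when the pipeline is built, which is the property wanted from the generated machines.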
(If it turns out we have too many option combinations, we could have fast machines for the most common combinations and slower ones that check flags at runtime for the less common ones.)

-- 
Joseph S. Myers
[EMAIL PROTECTED]