On Thu, 31 May 2007, Andi Kleen wrote:

> "Frank Schaefer" <[EMAIL PROTECTED]> writes:
> > 
> > Is there any interest in using such an engine in the GCC toolset? 
> 
> Right now gcc doesn't use flex so it would be probably non trivial
> to implement support. You would need to rewrite c-lex.c

All the relevant code is in cpplib, not c-lex.c.

Zack had some ideas a few years ago (I don't think they were ever posted 
to a public list) about how to speed up _cpp_clean_line in particular, and 
some or all of translation phases 1 to 3 in general.  The idea is that you 
have several Mealy machines (state machines where all the work happens in 
transitions), where edges apply for a given set of input characters in a 
given state and describe the actions to be taken in that case.  Actions 
include both passing output to another machine, and emitting diagnostics.  
So you start with converting character sets to UTF-8, then strip trailing 
whitespace and canonicalise newlines, then convert trigraphs, then remove 
backslash-newline pairs, then strip comments, then split the file into 
preprocessing tokens.

A state machine that does all these things at once is too complicated to 
write and verify by hand; the idea involves descriptions of the state 
machine for each subphase, which would be automatically combined into a 
single machine and optimized.  For example, inside comments the state 
machine doesn't need to convert character sets for most well-behaved 
character sets (those where an ASCII comment end is always one in the 
character set and vice versa - for example all those where extended 
characters are made up entirely of bytes with the high bit set and other 
bytes have their ASCII values), and doesn't need to convert trigraphs or 
backslash-newlines except in places where ??/ or \ followed by newline 
might split a multi-line */.

Furthermore, the generation process should generate a separate state 
machine for each mode in which the compiler can operate that affects 
lexing - machines with trigraphs enabled should be separate from those 
with trigraphs disabled, so there should be no checks of cpplib 
configuration flags in inner loops; just the state and next character.  
(If it turns out we have too many option combinations, we could have fast 
machines for the most common combinations and slower ones that check flags 
at runtime for the less common ones.)

-- 
Joseph S. Myers
[EMAIL PROTECTED]

Reply via email to