On Sat, Jan 21, 2012 at 09:34:26AM -0800, Carl Worth wrote:
> The trick here is that flex always chooses the rule that matches the most
> text. So with a input text of "two:" which we want to be lexed as an
> IDENTIFIER token "two" followed by an OTHER token ":" the previous OTHER
> rule would match longer as a single token of "two:" which we don't want.
>
> We prevent this by forcing the OTHER pattern to never match a string which
> has a first character that could be an identifier (that is _, a-z, or A-Z).
> This way the ambiguity is eliminated and this case is lexed correctly.
> ---
>  src/glsl/glcpp/glcpp-lex.l |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/src/glsl/glcpp/glcpp-lex.l b/src/glsl/glcpp/glcpp-lex.l
> index 8661887..20d7fc1 100644
> --- a/src/glsl/glcpp/glcpp-lex.l
> +++ b/src/glsl/glcpp/glcpp-lex.l
> @@ -70,7 +70,7 @@ HSPACE          [ \t]
>  HASH            ^{HSPACE}*#{HSPACE}*
>  IDENTIFIER      [_a-zA-Z][_a-zA-Z0-9]*
>  PUNCTUATION     [][(){}.&*~!/%<>^|;,=+-]
> -OTHER           [^][(){}.&*~!/%<>^|;,=#[:space:]+-]+
> +OTHER           [^][_a-zA-Z(){}.&*~!/%<>^|;,=#[:space:]+-][^][(){}.&*~!/%<>^|;,=#[:space:]+-]*
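
The longest-match behaviour described above is easy to reproduce outside
flex. What follows is only a rough Python model, not the real lexer: it
knows just the three rules IDENTIFIER, PUNCTUATION and OTHER, uses \s in
place of [:space:], and the lex() helper is a made-up name for
illustration. Like flex, it picks the longest match and breaks ties in
favour of the earlier rule.

```python
import re

# Characters from the PUNCTUATION definition, plus '#', are excluded from OTHER.
PUNCT_CHARS = "][(){}.&*~!/%<>^|;,=+-"
EXCLUDED = re.escape(PUNCT_CHARS + "#") + r"\s"   # \s stands in for [:space:]

IDENTIFIER = r"[_a-zA-Z][_a-zA-Z0-9]*"
PUNCTUATION = "[" + re.escape(PUNCT_CHARS) + "]"
OLD_OTHER = "[^" + EXCLUDED + "]+"
NEW_OTHER = "[^" + EXCLUDED + "_a-zA-Z][^" + EXCLUDED + "]*"


def lex(text, other_pattern):
    """Hypothetical mini-lexer: longest match wins, earlier rule wins ties."""
    rules = [
        ("IDENTIFIER", re.compile(IDENTIFIER)),
        ("PUNCTUATION", re.compile(PUNCTUATION)),
        ("OTHER", re.compile(other_pattern)),
    ]
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1            # whitespace is simply skipped in this model
            continue
        best_name, best_len = None, 0
        for name, rx in rules:
            m = rx.match(text, pos)
            # Strictly greater, so an earlier rule keeps a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError("no rule matches at offset %d" % pos)
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(lex("two:", OLD_OTHER))
print(lex("two:", NEW_OTHER))
```

With the old pattern, OTHER swallows "two:" as one token because it
matches four characters against IDENTIFIER's three. With the new
pattern, OTHER cannot start at "t", so IDENTIFIER takes "two" and OTHER
is left with ":".
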

The fix seems correct, but reading that regex made my head hurt. Is
there a way to refactor the character classes to be more readable?
OTHER is basically
[^{PUNCTUATION}{IDENTIFIER_FIRST_CHAR}#[:space:]][^{PUNCTUATION}#[:space:]]*,
right?

>  DIGITS          [0-9][0-9]*
>  DECIMAL_INTEGER [1-9][0-9]*[uU]?
> -- 
> 1.7.8.3
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
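
To make the refactoring question concrete, here is a small Python sketch
of composing OTHER from named pieces and checking it against a literal
translation of the patched pattern. It is a model of the idea only: as
far as I know flex does not expand {NAME} inside a bracket expression
(the braces are literal there), so the names below (PUNCTUATION_CHARS,
IDENTIFIER_FIRST_CHARS, none_of, agree_on) are illustrative and not
something glcpp-lex.l could use directly.

```python
import re

# Named building blocks; all names here are illustrative.
PUNCTUATION_CHARS = re.escape("][(){}.&*~!/%<>^|;,=+-")
HASH_CHAR = re.escape("#")
IDENTIFIER_FIRST_CHARS = "_a-zA-Z"   # already written as class ranges
SPACE = r"\s"                        # stands in for flex's [:space:]


def none_of(*parts):
    """Build a negated character class from the given class fragments."""
    return "[^" + "".join(parts) + "]"


# First character: not punctuation, '#', whitespace, or an identifier start.
# Remaining characters: not punctuation, '#', or whitespace.
OTHER_COMPOSED = (
    none_of(PUNCTUATION_CHARS, HASH_CHAR, IDENTIFIER_FIRST_CHARS, SPACE)
    + none_of(PUNCTUATION_CHARS, HASH_CHAR, SPACE) + "*"
)

# Hand translation of the patched flex pattern into Python syntax.
OTHER_LITERAL = (
    r"[^\]\[_a-zA-Z(){}.&*~!/%<>^|;,=#\s+\-]"
    r"[^\]\[(){}.&*~!/%<>^|;,=#\s+\-]*"
)


def agree_on(samples):
    """True if both patterns accept exactly the same strings in samples."""
    composed = re.compile(OTHER_COMPOSED)
    literal = re.compile(OTHER_LITERAL)
    return all(
        bool(composed.fullmatch(s)) == bool(literal.fullmatch(s))
        for s in samples
    )


# Every one- and two-character ASCII string.
ASCII = [chr(c) for c in range(128)]
SAMPLES = ASCII + [a + b for a in ASCII for b in ASCII]

print(agree_on(SAMPLES))
```

The check only covers ASCII strings of length one and two, which is
enough to exercise both character classes but is not a proof of
equivalence.
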