OK, that worked really well. In particular, the "lastindex" property of the match object can be used to tell exactly which group matched, without having to search the list of groups sequentially.
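For anyone who hasn't tried it: if each alternative in the pattern is its own capturing group, exactly one group matches and lastindex reports which one. This is plain 're' usage, nothing reflex-specific:

    import re

    # Each alternative is its own capturing group; exactly one matches.
    pat = re.compile( r"(\d+)|([a-zA-Z_]\w*)|(\s+)" )

    print( pat.match( "hello" ).lastindex )   # -> 2 (identifier group)
    print( pat.match( "123" ).lastindex )     # -> 1 (number group)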
In fact, I was able to use your idea to cobble together a poor man's lexer which I am calling "reflex" (Regular Expressions For Lexing). Here's an example of how it's used:

    # Define the states using an enumeration
    State = Enum( 'Default', 'Comment', 'String' )

    # Create a scanner
    scanner = reflex.scanner( State.Default )
    scanner.rule( r"\s+" )
    scanner.rule( r"/\*", reflex.set_state( State.Comment ) )
    scanner.rule( r"[a-zA-Z_][\w_]*", KeywordOrIdent )
    scanner.rule( r"0x[\da-fA-F]+|\d+", token=TokenType.Integer )
    scanner.rule( r"(?:\d+\.\d*|\.\d+)(?:[eE]?[+-]?\d+)|\d+[eE]?[+-]?\d+",
                  token=TokenType.Real )

    # Multi-line comment state
    scanner.state( State.Comment )
    scanner.rule( r"\*/", reflex.set_state( State.Default ) )
    scanner.rule( r"(?:[^*]|\*(?!/))+" )

    # Now, create an instance of the scanner
    token_stream = scanner( input_file_iter )
    for token in token_stream:
        print token

Internally, it creates an array of patterns and actions for each state. When you ask it to create a scanner instance, it combines all of the patterns for a state into one large regular expression, with each rule's pattern wrapped in its own group. Input lines are matched against this regex, and when a match succeeds, the match object's 'lastindex' property is used to look up the action to perform in the array.
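In case it helps anyone else, the dispatch trick boils down to something like the following. This is a minimal sketch only, not the actual reflex internals; the 'rules' table, the scan() helper and the token actions are made up for illustration:

    import re

    # Each entry is (pattern, action). The patterns must not contain
    # capturing groups of their own, or lastindex could point inside them
    # (use (?:...) for any grouping within a rule).
    rules = [
        ( r"\s+",               None ),                      # skip whitespace
        ( r"[a-zA-Z_]\w*",      lambda s: ( 'IDENT', s ) ),
        ( r"0x[\da-fA-F]+|\d+", lambda s: ( 'INT', s ) ),
    ]

    # Wrap each pattern in its own group so lastindex identifies the rule.
    combined = re.compile( "|".join( "(%s)" % pat for pat, act in rules ) )

    def scan( line ):
        pos = 0
        while pos < len( line ):
            m = combined.match( line, pos )
            if not m:
                raise SyntaxError( "bad input at column %d" % pos )
            action = rules[ m.lastindex - 1 ][ 1 ]   # lastindex is 1-based
            if action:
                yield action( m.group( 0 ) )
            pos = m.end()

    for tok in scan( "foo 0x1F 42" ):
        print( tok )

The real thing also has to track the current state and switch between the per-state combined regexes, but the lastindex lookup is the heart of it.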