On Sun, 09 Nov 2008 15:53:01 +0100, Thomas Mlynarczyk wrote:

> Arnaud Delobelle schrieb:
>
>> Adding to John's comments, I wouldn't have source as a member of the
>> Lexer object but as an argument of the tokenise() method (which I
>> would make public). The tokenise method would return what you
>> currently call self.result. So it would be used like this.
>>
>>>>> mylexer = Lexer(tokens)
>>>>> mylexer.tokenise(source)
>>>>> mylexer.tokenise(another_source)
>
> At a later stage, I intend to have the source tokenised not all at
> once, but token by token, "just in time" when the parser (yet to be
> written) accesses the next token:
You don't have to introduce a `next` method to your Lexer class. You
could just transform your `tokenize` method into a generator by
replacing ``self.result.append`` with `yield`. It gives you the
just-in-time part for free while not picking your algorithm apart into
tiny unrelated pieces.

> token = mylexer.next( 'FOO_TOKEN' )
> if not token: raise Exception( 'FOO token expected.' )
> # continue doing something useful with token
>
> Where next() would return the next token (and advance an internal
> pointer) *if* it is a FOO_TOKEN, otherwise it would return False. This
> way, the total number of regex matchings would be reduced: Only that
> which is expected is "tried out".

Python generators recently (2.5) grew a `send` method. You could use
`next` for unconditional tokenization and
``mytokenizer.send("expected token")`` whenever you expect a special
token. See http://www.python.org/dev/peps/pep-0342/ for details.

HTH,

-- 
Robert "Stargaming" Lehmann
-- 
http://mail.python.org/mailman/listinfo/python-list
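To make the generator suggestion concrete, here is a minimal sketch. It
assumes a Lexer built from (name, regex) pairs and a simple first-match
loop; the names and the matching strategy are guesses, not Thomas's
actual code:

import re

class Lexer(object):
    def __init__(self, tokens):
        # assumed: tokens is a list of (name, regex_string) pairs
        self.tokens = [(name, re.compile(rx)) for name, rx in tokens]

    def tokenise(self, source):
        # a generator: yield each token as soon as it matches,
        # instead of appending it to self.result
        pos = 0
        while pos < len(source):
            for name, regex in self.tokens:
                m = regex.match(source, pos)
                if m:
                    yield name, m.group()
                    pos = m.end()
                    break
            else:
                raise SyntaxError('no token matches at position %d' % pos)

mylexer = Lexer([('NUMBER', r'\d+'), ('PLUS', r'\+'), ('SPACE', r'\s+')])
for tok in mylexer.tokenise('1 + 2'):
    print tok

And a rough sketch of the send() idea from the end of the post: next()
gives the next token matched against every pattern, while send('NAME')
tries only that one pattern and yields None (without advancing the
position) when it does not match. Yielding None rather than raising is
just one possible convention, not something the post prescribes:

class SendLexer(Lexer):
    def tokenise(self, source):
        # same Lexer as above, with a coroutine-style twist (PEP 342)
        pos = 0
        expected = None
        while pos < len(source):
            if expected is None:
                candidates = self.tokens
            else:
                candidates = [(n, rx) for n, rx in self.tokens
                              if n == expected]
            for name, regex in candidates:
                m = regex.match(source, pos)
                if m:
                    pos = m.end()
                    # whatever the parser passes to send() arrives here;
                    # a plain next() call makes it None
                    expected = yield name, m.group()
                    break
            else:
                # no match for the expected token: report failure and
                # wait for the parser to try another expectation
                expected = yield None

toks = SendLexer([('NUMBER', r'\d+'), ('PLUS', r'\+'),
                  ('SPACE', r'\s+')]).tokenise('1 + 2')
print toks.next()           # ('NUMBER', '1'), matched unconditionally
print toks.send('SPACE')    # ('SPACE', ' ')
print toks.send('NUMBER')   # None -- the next token is a PLUS, not a NUMBER
print toks.send('PLUS')     # ('PLUS', '+'), position was not advanced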