Steve Holden wrote:
Suppose I use the dict and I want to access the regex associated with
the token named "tokenname" (that is, no iteration, but a single
access). I could simply write tokendict["tokenname"]. But with the list
of tuples, I can't think of an equally easy way to do that. But then, as
a beginner, I might be underestimating Python.
But when do you want to do that? There's no point inventing use cases -
they should be demonstrated needs.
Well, I had been thinking about further reducing the number of regex
matches needed. So I wanted to modify my lexer not to tokenize the
whole input at once, but to grab only the next token from the input
"just in time" / on demand. For that I was thinking of having a next()
method like this:
def next( self, nameOfExpectedToken ):
    # try to match only the one token the parser expects at this point
    regex = self.getRegexByTokenName( nameOfExpectedToken )
    match = regex.match( self.source, self.offset )
    if not match:
        return False
    # remember the line the token started on, then advance the position
    line = self.line
    self.line += match.group(0).count( "\n" )
    self.offset += len( match.group(0) )
    return ( nameOfExpectedToken, match, line )
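A parser would then drive it roughly like this (only a sketch; the lexer
instance, the token name IDENTIFIER and the error handling are made up
for illustration):

    token = lexer.next( "IDENTIFIER" )      # only this one regex is tried
    if not token:
        raise SyntaxError( "identifier expected at offset %d" % lexer.offset )
    name, match, line = token               # same tuple shape as returned above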
I'm not sure if this is a good idea, but it looks like one to me. The
problem is the first line of the method, which retrieves the regex
associated with the given token name. Using a dict, I could simply write

    regex = self.tokendict[nameOfExpectedToken]

But with a list I suppose I wouldn't get away without a loop, which I
assume is more expensive than the dict lookup.
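Just to make the difference concrete, this is what the lookup would look
like in both cases (a minimal sketch, assuming tokendict and tokenlist
hold the same name / compiled-regex pairs):

    # dict: a single hashed lookup
    regex = tokendict[ "tokenname" ]

    # list of tuples: linear scan until the name is found
    regex = None
    for name, rx in tokenlist:
        if name == "tokenname":
            regex = rx
            break

(Of course the list could be turned into a dict once, with
dict( tokenlist ), and then both views would be available.)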
Or simply pass compiled token patterns in to begin with, when they are
necessary ... then the caller has the option of not bothering to
optimize in the first place!
That would be an option. But shouldn't it be the lexer that takes care of
optimizing its own work as much as it can without the caller's
assistance? After all, the caller should not need to know about the
internal workings of the lexer.
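To make that concrete, I would let the lexer build its own lookup
structures when it is constructed, something like this (only a sketch;
tokendefs, tokenlist and tokendict are names I am making up here):

    import re

    class Lexer:
        def __init__( self, tokendefs ):
            # tokendefs: ( name, pattern string ) pairs, in whatever order
            # the caller prefers; the lexer compiles the patterns and builds
            # its own name -> compiled regex mapping, so the caller never
            # has to care how lookups are done internally
            self.tokenlist = [ ( name, re.compile( pattern ) )
                               for name, pattern in tokendefs ]
            self.tokendict = dict( self.tokenlist )
            self.source = ""
            self.offset = 0
            self.line = 1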
[Optimizing performance by putting most frequent tokens first]
With a dict you have no such opportunity, because the ordering is
determined by the implementation and not by your data structure.
True. Still, I should be able to gain even better performance with my
next() approach above, as it would completely eliminate all "useless"
matching (like trying to match FOO where no FOO is allowed).
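For comparison, this is roughly the tokenize-everything loop I want to
get rid of (again only a sketch, using the same hypothetical
self.tokenlist as above); putting frequent tokens first only shortens the
average trip through the inner loop, while next() avoids it entirely:

    def tokenizeAll( self ):
        # try every known token at every position, in list order
        tokens = []
        while self.offset < len( self.source ):
            for name, regex in self.tokenlist:
                match = regex.match( self.source, self.offset )
                if match:
                    tokens.append( ( name, match, self.line ) )
                    self.line += match.group(0).count( "\n" )
                    self.offset += len( match.group(0) )
                    break
            else:
                raise SyntaxError( "no token matches at offset %d"
                                   % self.offset )
        return tokens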
Greetings,
Thomas
--
Just because many people are wrong doesn't mean they are right!
(Coluche)
--
http://mail.python.org/mailman/listinfo/python-list