On Nov 18, 8:24 pm, greg <[EMAIL PROTECTED]> wrote:
> Tor Erik Sønvisen wrote:
> > Comments, speedups, improvements in general, etc are appreciated.
>
> You're doing a lot of repeated indexing of token[0]
> and token[1] in your elif branches. You might gain some
> speed by fetching these into locals before entering the
> elif chain.
>
> Also you could try ordering the branches so that the
> most frequent cases come first. Probably strings and
> numbers first, then the various kinds of bracket.
> This would also give you a chance to avoid pulling out
> token[1] until you need it.
>
> token[1].startswith('u'): It's probably faster to
> use an index to get the first character, if you know
> that the string is not empty.

I tried several of these micro optimizations but there was very little
improvement; eval() remains practically 5 times faster. The major
bottleneck is generate_tokens(); replacing simple_eval() with the
following is still 3 times slower than eval():

    def simple_eval(source):
        for _ in generate_tokens(StringIO(source).readline):
            pass

That's not very surprising since generate_tokens() is quite general
and yields more information than necessary. Clearly if performance is
critical you should write your own simple_generate_tokens(), possibly
as a cut down version of the generic one.

Leaving performance aside, below is a slightly more compact version.
The almost identical code for handling lists and tuples is factored
out in _iter_sequence(). The 'token' parameter here is the actual
token, not the 5-tuple yielded by generate_tokens(). Finally this
version handles negative and long numbers (which the original didn't):

    from string import digits
    from cStringIO import StringIO
    from tokenize import generate_tokens, NL

    _consts = {'None': None, 'False': False, 'True': True}

    def simple_eval(source):
        itertokens = generate_tokens(StringIO(source).readline)
        next = (token[1] for token in itertokens
                if token[0] is not NL).next
        res = atom(next, next())
        if next():
            raise SyntaxError("bogus data after expression")
        return res

    def atom(next, token):
        def _iter_sequence(end):
            token = next()
            while token != end:
                yield atom(next, token)
                token = next()
                if token == ',':
                    token = next()
        firstchar = token[0]
        if token in _consts:
            return _consts[token]
        elif token[-1] == 'L':
            return long(token)
        elif firstchar in digits:
            return float(token) if '.' in token else int(token)
        elif firstchar in '"\'':
            return token[1:-1].decode('string-escape')
        elif firstchar == 'u':
            return token[2:-1].decode('unicode-escape')
        elif token == '-':
            return -atom(next, next())
        elif token == '(':
            return tuple(_iter_sequence(')'))
        elif token == '[':
            return list(_iter_sequence(']'))
        elif token == '{':
            out = {}
            token = next()
            while token != '}':
                key = atom(next, token)
                next()  # Skip key-value delimiter (':')
                token = next()
                out[key] = atom(next, token)
                token = next()
                if token == ',':
                    token = next()
            return out
        raise SyntaxError('malformed expression (%r)' % token)

Regards,
George
-- 
http://mail.python.org/mailman/listinfo/python-list
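
As a rough illustration of the cut-down tokenizer idea mentioned above, here is a minimal sketch of what a simple_generate_tokens() might look like. Everything in it is an assumption rather than code from this thread: the regular expression, the name _token_re, and the choice to yield bare token strings followed by an empty string as an end marker (imitating the empty ENDMARKER string from generate_tokens()). It only recognizes the subset that atom() handles, ignores exponent notation such as 1e5 and triple-quoted strings, and has not been timed, so whether it gets anywhere near eval() would need measuring.

```python
import re

# Hypothetical cut-down tokenizer (not from the original post): it yields
# only token strings, with no token types or positions, and covers just
# the literal subset that atom() understands.
_token_re = re.compile(r"""
    [uU]?"(?:[^"\\\n]|\\.)*"      # double-quoted string, optional u prefix
  | [uU]?'(?:[^'\\\n]|\\.)*'      # single-quoted string, optional u prefix
  | \d+\.\d*                      # float written with a decimal point
  | \d+[lL]?                      # integer, optionally with a long suffix
  | [A-Za-z_]\w*                  # name such as None, True, False
  | [-()\[\]{},:]                 # the punctuation atom() understands
""", re.VERBOSE)


def simple_generate_tokens(source):
    """Yield token strings from source, then '' as an end marker."""
    pos, end = 0, len(source)
    while True:
        # Skip whitespace (including newlines) between tokens.
        while pos < end and source[pos].isspace():
            pos += 1
        if pos == end:
            break
        match = _token_re.match(source, pos)
        if match is None:
            raise SyntaxError("unexpected character %r at offset %d"
                              % (source[pos], pos))
        yield match.group()
        pos = match.end()
    # Assumption: an empty string plays the role of the ENDMARKER token,
    # so a check like "if next(): raise SyntaxError(...)" still works.
    yield ''
```

The sketch avoids version-specific features, so under Python 2 the `next` callable in simple_eval() could presumably be taken from `simple_generate_tokens(source).next` instead of the generator expression over generate_tokens(). Because whitespace and newlines are skipped up front, no NL filtering is needed.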