On Mon, 18 Jan 2010 06:23:44 -0800, Iain King wrote: > On Jan 18, 2:17 pm, Adi Eyal <a...@digitaltrowel.com> wrote: [...] >> Using regular expressions the answer is short (and sweet) >> >> mapping = { >> "foo" : "bar", >> "baz" : "quux", >> "quuux" : "foo" >> >> } >> >> pattern = "(%s)" % "|".join(mapping.keys()) >> repl = lambda x : mapping.get(x.group(1), x.group(1)) >> s = "fooxxxbazyyyquuux" >> re.subn(pattern, repl, s) > > Winner! :)
What are the rules for being declared "Winner"? For the simple case given, calling s.replace three times is much faster: more than twice as fast. But a bigger problem is that the above "winner" may not work correctly if there are conflicts between the target strings (e.g. 'a'->'X', 'aa'->'Y'). The problem is that the result you get depends on the order of the searches, BUT as given, that order is non-deterministic. dict.keys() returns in an arbitrary order, which means the caller can't specify the order except by accident. For example: >>> repl = lambda x : m[x.group(1)] >>> m = {'aa': 'Y', 'a': 'X'} >>> pattern = "(%s)" % "|".join(m.keys()) >>> subn(pattern, repl, 'aaa') # expecting 'YX' ('XXX', 3) The result that you get using this method will be consistent but arbitrary and unpredictable. For those who care, here's my timing code: from timeit import Timer setup = """ mapping = {"foo" : "bar", "baz" : "quux", "quuux" : "foo"} pattern = "(%s)" % "|".join(mapping.keys()) repl = lambda x : mapping.get(x.group(1), x.group(1)) repl = lambda x : mapping[x.group(1)] s = "fooxxxbazyyyquuux" from re import subn """ t1 = Timer("subn(pattern, repl, s)", setup) t2 = Timer( "s.replace('foo', 'bar').replace('baz', 'quux').replace('quuux', 'foo')", "s = 'fooxxxbazyyyquuux'") And the results on my PC: >>> min(t1.repeat(number=100000)) 1.1273870468139648 >>> min(t2.repeat(number=100000)) 0.49491715431213379 -- Steven -- http://mail.python.org/mailman/listinfo/python-list