Wilbert Berendsen wrote:
On Monday 18 January 2010, Adi wrote:
keys = [(len(key), key) for key in mapping.keys()]
keys.sort(reverse=True)
keys = [key for (_, key) in keys]
pattern = "(%s)" % "|".join(keys)
repl = lambda x : mapping[x.group(1)]
s = "fooxxxbazyyyquuux"
re.subn(pattern, repl, s)
I managed to make it even shorter by using the key argument of sorted, not
wrapping the whole regexp in parentheses, and pre-compiling the regular
expression:
import re
mapping = {
"foo" : "bar",
"baz" : "quux",
"quuux" : "foo"
}
# sort the keys, longest first, so 'aa' gets matched before 'a', because
# in Python regexps the first match (going from left to right) in a
# |-separated group is taken
keys = sorted(mapping.keys(), key=len)
For longest first you need:
keys = sorted(mapping.keys(), key=len, reverse=True)
rx = re.compile("|".join(keys))
repl = lambda x: mapping[x.group()]
s = "fooxxxbazyyyquuux"
rx.sub(repl, s)
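To see why the sort order matters, here is a small sketch comparing the two orderings. The mapping with "a" and "aa" and the helper replace_all are made up for illustration; they are not from the code above.

```python
import re

# Hypothetical mapping in which one key is a prefix of another.
mapping = {"a": "1", "aa": "2"}

def replace_all(s, keys):
    # Alternatives are tried left to right; the first one that matches wins.
    rx = re.compile("|".join(keys))
    return rx.sub(lambda m: mapping[m.group()], s)

# Shortest first: the pattern is "a|aa", so "aa" is never reached.
shortest_first = replace_all("aa", sorted(mapping, key=len))

# Longest first: the pattern is "aa|a", so "aa" is matched as a whole.
longest_first = replace_all("aa", sorted(mapping, key=len, reverse=True))

print(shortest_first)  # 11
print(longest_first)   # 2
```

With the shortest key first, "aa" is replaced as two separate "a" matches, which is usually not what you want.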
One thing remaining: if the replacement keys could contain non-alphanumeric
characters, they should be escaped using re.escape:
Strictly speaking, only the characters that have a special meaning in a
regular expression need escaping, not every non-alphanumeric character.
rx = re.compile("|".join(re.escape(key) for key in keys))
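A minimal sketch of the escaped version in action. The keys "a.b" and "c+" are hypothetical, chosen only because they contain regexp special characters.

```python
import re

# Hypothetical keys containing regexp special characters ("." and "+").
mapping = {"a.b": "X", "c+": "Y"}

# Longest key first, each key escaped so it is matched literally.
keys = sorted(mapping, key=len, reverse=True)
rx = re.compile("|".join(re.escape(key) for key in keys))

result = rx.sub(lambda m: mapping[m.group()], "a.b axb c+ cc")
print(result)  # X axb Y cc
```

Without re.escape the pattern would be "a.b|c+", which also matches "axb"; the lookup mapping["axb"] would then raise a KeyError.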