On Wed, Sep 28, 2011 at 3:28 AM, Xah Lee <xah...@gmail.com> wrote:
> curious question.
>
> suppose you have 300 different strings and they need all be replaced
> to say "aaa".
>
> is it faster to replace each one sequentially (i.e. replace first
> string to aaa, then do the 2nd, 3rd,...)
> , or is it faster to use a regex with “or” them all and do replace one
> shot? (i.e. "1ststr|2ndstr|3rdstr|..." -> aaa)
>
> let's say the sourceString this replacement to be done on is 500k
> chars.
>
> Anyone? i suppose the answer will be similar for perl, python, ruby.
>
> btw, the origin of this question is about writing a emacs lisp
> function that replace ~250 html named entities to unicode char.

I haven't timed it at the scale you're talking about, but for Python I
expect a regex will be your best bet:

    # Python 3.2: Supposing the match strings and replacements are
    # in a dict stored as `repls`...

    import re

    pattern = '|'.join(map(re.escape, repls.keys()))
    new_str = re.sub(pattern, lambda m: repls[m.group()], old_str)

The problem with doing 300 str.replace calls is the 300 intermediate
strings that would be created and then collected.

Cheers,
Ian
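
For anyone who wants to check this on their own data, here is a rough, self-contained sketch of both approaches side by side. The names `replace_sequential`, `replace_regex` and `time_both` are made up for illustration and are not from the post above, and the three-entity `repls` dict is just a stand-in for the ~250 HTML entities in the question. Sorting the keys longest-first is an added precaution, not part of the original snippet: regex alternation takes the first alternative that matches, so a key that is a prefix of another key would otherwise shadow it.

```python
import re
import timeit


def replace_sequential(text, repls):
    """One str.replace call per key; each call builds a new string."""
    for old, new in repls.items():
        text = text.replace(old, new)
    return text


def replace_regex(text, repls):
    """Single pass over the text using one alternation pattern."""
    if not repls:
        # An empty pattern would match everywhere, so bail out early.
        return text
    # Longest keys first, so a key that is a prefix of another key
    # cannot win the alternation by accident (assumption: that is the
    # behaviour you want).
    keys = sorted(repls, key=len, reverse=True)
    pattern = re.compile('|'.join(map(re.escape, keys)))
    return pattern.sub(lambda m: repls[m.group()], text)


def time_both(text, repls, number=10):
    """Return (sequential_seconds, regex_seconds) for `number` runs each."""
    t_seq = timeit.timeit(lambda: replace_sequential(text, repls), number=number)
    t_re = timeit.timeit(lambda: replace_regex(text, repls), number=number)
    return t_seq, t_re


# Hypothetical sample data standing in for the HTML entity table.
repls = {'&amp;': '&', '&lt;': '<', '&gt;': '>'}
sample = 'a &lt; b &amp;&amp; c &gt; d'

print(replace_sequential(sample, repls))
print(replace_regex(sample, repls))
```

Apart from speed, the two are not always equivalent. The sequential version can re-replace text produced by an earlier replacement (with the dict above, `'&amp;lt;'` becomes `'&lt;'` and then `'<'`), while the single-pass regex leaves it at `'&lt;'`. Calling `time_both` on a representative 500k-character string is the only way to settle which is faster for a given set of keys; no timings are claimed here.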