> Let me show you a very bad consequence of this...
>
> a=open('file1.txt','rb').read()
> b=re.sub('x',a,'x')
> open('file2.txt','wb').write(b)
>
> Now if file1.txt contains a \n or \" then file2.txt is not the
> same as file1.txt while it should be.
That's functioning as designed: re.sub() processes backslash escapes
in the replacement string, so a backslash followed by "n" in file1.txt
comes out as a real newline. If you want the contents of file1.txt
inserted literally, escape the things you don't want interpreted
before passing them in. Note that re.escape() is meant for text used
as a pattern; on a replacement string it adds backslashes that end up
in the output.

http://docs.python.org/lib/node46.html#l2h-407

You can specially treat newlines (this only covers real newline
characters, not a literal backslash in the file):

    b=re.sub('x', a.replace('\n', '\\n'), 'x')

or just escape the backslashes in the incoming replacement string:

    b=re.sub('x', a.replace('\\', '\\\\'), 'x')

In the help for the RE module's syntax, the backslash handling for
pattern strings is explicitly noted:

http://docs.python.org/lib/re-syntax.html

"""
If you're not using a raw string to express the pattern, remember
that Python also uses the backslash as an escape sequence in string
literals; if the escape sequence isn't recognized by Python's parser,
the backslash and subsequent character are included in the resulting
string. However, if Python would recognize the resulting sequence,
the backslash should be repeated twice. This is complicated and hard
to understand, so it's highly recommended that you use raw strings
for all but the simplest expressions.
"""

The short upshot: "it's highly recommended that you use raw strings
for all but the simplest expressions."

Thus, the string that you pass as your regexp should be a regexp, not
a "python interpretation of a regexp before the regex engine gets to
touch it". The same goes for the replacement: what you pass should
already be in the form re.sub() expects.

-tkc
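
A minimal sketch of the behaviour discussed above, in case it helps to
see it run. The string `a` is a hypothetical stand-in for the contents
of file1.txt (no file is actually read), and the lambda form is an
extra alternative not mentioned in the thread; re.sub() uses a
function's return value as-is, without processing backslashes.

```python
import re

# Hypothetical stand-in for the contents of file1.txt: a literal
# backslash followed by "n", and a literal backslash before a quote.
a = 'first\\nsecond \\" end'

# Naive call: re.sub() interprets backslash escapes in the replacement,
# so the two characters backslash + "n" become one newline character.
naive = re.sub('x', a, 'x')

# Fix 1: double every backslash so re.sub() turns each pair back into one.
escaped = re.sub('x', a.replace('\\', '\\\\'), 'x')

# Fix 2: pass a function; its return value is inserted without
# any escape processing.
literal = re.sub('x', lambda m: a, 'x')

print(naive == a)    # the naive result differs from the input
print(escaped == a)  # both fixes reproduce the input exactly
print(literal == a)
```

The function form avoids having to think about escaping at all, which
may be preferable when the replacement text comes from a file or from
user input.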